Single cell RNA-seq denoising using a deep count autoencoder, bioRxiv, 2018-04-14
Abstract: Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a zero-inflated negative binomial noise model, and nonlinear gene-gene or gene-dispersion interactions are captured. Our method scales linearly with the number of cells and can therefore be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
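The zero-inflated negative binomial (ZINB) noise model named in the abstract can be illustrated with a small likelihood sketch. This is not the DCA implementation. It assumes the common parameterization with mean `mu`, dispersion `theta`, and dropout probability `pi`, and all function names are hypothetical.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial.

    Assumed parameterization: mean mu > 0, dispersion theta > 0
    (variance = mu + mu**2 / theta).
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    pi in [0, 1) is the probability of an extra ("dropout") zero.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # Non-zero counts can only come from the NB component.
    return math.log(1.0 - pi) + nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing one parameter set."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)
```

A denoising autoencoder built on this kind of model would typically predict the parameters per gene and per cell and minimize such a negative log-likelihood. Here the parameters are plain scalars to keep the sketch short.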
biorxiv bioinformatics 200-500-users 2018
Direct RNA Sequencing of the Complete Influenza A Virus Genome, bioRxiv, 2018-04-12
Abstract: For the first time, a complete genome of an RNA virus has been sequenced in its original form. Previously, RNA was sequenced by the chemical degradation of radiolabelled RNA, a difficult method that produced only short sequences. Instead, RNA has usually been sequenced indirectly by copying it into cDNA, which is often amplified to dsDNA by PCR and subsequently analyzed using a variety of DNA sequencing methods. We designed an adapter to the short, highly conserved termini of the influenza virus genome to target the (-) sense RNA into a protein nanopore on the Oxford Nanopore MinION sequencing platform. Using this method and total RNA extracted from the allantoic fluid of infected chicken eggs, we demonstrate successful sequencing of the complete influenza virus genome with 100% nucleotide coverage, 99% consensus identity, and 99% of reads mapped to influenza. With the same methodology, the adapter can be redesigned to expand the targets to include viral mRNA and (+) sense cRNA, which are essential to the viral life cycle. This has the potential to identify and quantify splice variants and base modifications, which are not practically measurable with current methods.
biorxiv genomics 100-200-users 2018
Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats, bioRxiv, 2018-04-12
Abstract: Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length. Beyond long repeats, the P19E3 assembly was further complicated by a shufflon region. Its complex genome could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from Oxford Nanopore Technologies. Another important factor for a full genomic resolution was the choice of assembly algorithm. Importantly, a repeat analysis indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial genomes and a handful of 293 complete archaeal genomes represented this dark matter for de novo genome assembly of prokaryotes. Several of these dark matter genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed employing labor-intensive steps like cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assemblers capable of resolving long, near identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.
biorxiv genomics 0-100-users 2018
Parameterizing neural power spectra, bioRxiv, 2018-04-11
Abstract: Electrophysiological signals across species and recording scales exhibit both periodic and aperiodic features. Periodic oscillations have been widely studied and linked to numerous physiological, cognitive, behavioral, and disease states, while the aperiodic “background” 1/f component of neural power spectra has received far less attention. Most analyses of oscillations are conducted on a priori, canonically-defined frequency bands without consideration of the underlying aperiodic structure, or verification that a periodic signal even exists in addition to the aperiodic signal. This is problematic, as recent evidence shows that the aperiodic signal is dynamic, changing with age, task demands, and cognitive state. It has also been linked to the relative excitation/inhibition of the underlying neuronal population. This means that standard analytic approaches easily conflate changes in the periodic and aperiodic signals with one another because the aperiodic parameters, along with oscillation center frequency, power, and bandwidth, are all dynamic in physiologically meaningful, but likely different, ways. In order to overcome the limitations of traditional narrowband analyses and to reduce the potentially deleterious effects of conflating these features, we introduce a novel algorithm for automatic parameterization of neural power spectral densities (PSDs) as a combination of the aperiodic signal and putative periodic oscillations. Notably, this algorithm requires no a priori specification of band limits and accounts for potentially-overlapping oscillations while minimizing the degree to which they are confounded with one another. This algorithm is amenable to large-scale data exploration and analysis, providing researchers with a tool to quickly and accurately parameterize neural power spectra.
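The idea of separating an aperiodic 1/f component from putative oscillations can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' algorithm. It assumes the aperiodic part is a straight line in log-log space, log10(P) = offset - exponent * log10(f), and treats the largest residual above that line as a candidate oscillation. All function names and the synthetic spectrum are hypothetical.

```python
import math


def synth_spectrum(freqs, offset, exponent, center=10.0, height=0.0, width=1.5):
    """Synthetic PSD: 1/f background plus one optional Gaussian bump in log-power."""
    return [
        10 ** (offset - exponent * math.log10(f)
               + height * math.exp(-((f - center) ** 2) / (2 * width ** 2)))
        for f in freqs
    ]


def fit_aperiodic(freqs, powers):
    """Least-squares line in log-log space; returns (offset, exponent)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, -slope


def residual_peak(freqs, powers, offset, exponent):
    """Frequency whose log-power rises highest above the aperiodic fit."""
    best_f, best_h = None, float("-inf")
    for f, p in zip(freqs, powers):
        h = math.log10(p) - (offset - exponent * math.log10(f))
        if h > best_h:
            best_f, best_h = f, h
    return best_f, best_h


freqs = [float(f) for f in range(1, 41)]
# Spectrum with a bump centered at 10 Hz on top of a 1/f^1.5 background.
powers = synth_spectrum(freqs, offset=1.0, exponent=1.5, height=0.8)
fit_offset, fit_exponent = fit_aperiodic(freqs, powers)
peak_freq, peak_height = residual_peak(freqs, powers, fit_offset, fit_exponent)
```

Fitting the line to the whole spectrum lets the bump bias the aperiodic estimate slightly. A more careful procedure would refit the aperiodic component after removing detected peaks and would also estimate each peak's bandwidth.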
biorxiv neuroscience 200-500-users 2018
A Single-Cell Atlas of Cell Types, States, and Other Transcriptional Patterns from Nine Regions of the Adult Mouse Brain, bioRxiv, 2018-04-10
The mammalian brain is composed of diverse, specialized cell populations, few of which we fully understand. To more systematically ascertain and learn from cellular specializations in the brain, we used Drop-seq to perform single-cell RNA sequencing of 690,000 cells sampled from nine regions of the adult mouse brain: frontal and posterior cortex (156,000 and 99,000 cells, respectively), hippocampus (113,000), thalamus (89,000), cerebellum (26,000), and all of the basal ganglia – the striatum (77,000), globus pallidus externus/nucleus basalis (66,000), entopeduncular/subthalamic nuclei (19,000), and the substantia nigra/ventral tegmental area (44,000). We developed computational approaches to distinguish biological from technical signals in single-cell data, then identified 565 transcriptionally distinct groups of cells, which we annotate and present through interactive online software we developed for visualizing and re-analyzing these data (DropViz, http://dropviz.org). Comparison of cell classes and types across regions revealed features of brain organization. These included a neuronal gene-expression module for synthesizing axonal and presynaptic components; widely shared patterns in the combinatorial co-deployment of voltage-gated ion channels by diverse neuronal populations; functional distinctions among cells of the brain vasculature; and specialization of glutamatergic neurons across cortical regions to a degree not observed in other neuronal or non-neuronal populations. We describe systematic neuronal classifications for two complex, understudied regions of the basal ganglia, the globus pallidus externus and substantia nigra reticulata. In the striatum, where neuron types have been intensely researched, our data reveal a previously undescribed population of striatal spiny projection neurons (SPNs) comprising 4% of SPNs. The adult mouse brain cell atlas can serve as a reference for analyses of development, disease, and evolution.
biorxiv neuroscience 200-500-users 2018
Evaluation of UMAP as an alternative to t-SNE for single-cell data, bioRxiv, 2018-04-10
Abstract: Uniform Manifold Approximation and Projection (UMAP) is a recently published non-linear dimensionality reduction technique. Another such algorithm, t-SNE, has been the default method for this task in recent years. Herein we comment on the usefulness of UMAP for high-dimensional cytometry and single-cell RNA sequencing, notably highlighting faster runtime and consistency, meaningful organization of cell clusters and preservation of continuums in UMAP compared to t-SNE.
biorxiv bioinformatics 100-200-users 2018