Exploring Single-Cell Data with Deep Multitasking Neural Networks, bioRxiv, 2017-12-20

Abstract: Biomedical researchers are generating high-throughput, high-dimensional single-cell data at a staggering rate. As costs of data generation decrease, experimental design is moving towards measurement of many different single-cell samples in the same dataset. These samples can correspond to different patients, conditions, or treatments. While scalability of methods to datasets of these sizes is a challenge on its own, dealing with large-scale experimental design presents a whole new set of problems, including batch effects and sample comparison issues. Currently, there are no computational tools that can both handle large amounts of data in a scalable manner (many cells) and at the same time deal with many samples (many patients or conditions). Moreover, data analysis currently involves the use of different tools that each operate on their own data representation, not guaranteeing a synchronized analysis pipeline. For instance, data visualization methods can be disjoint and mismatched with the clustering method. For this purpose, we present SAUCIE, a deep neural network that leverages the high degree of parallelization and scalability offered by neural networks, as well as the deep representation of data that can be learned by them, to perform many single-cell data analysis tasks, all on a unified representation. A well-known limitation of neural networks is their interpretability. Our key contributions here are newly formulated regularizations (penalties) that render features learned in hidden layers of the neural network interpretable. When large multi-patient datasets are fed into SAUCIE, the various hidden layers contain denoised and batch-corrected data, a low-dimensional visualization, unsupervised clustering, as well as other information that can be used to explore the data. We show this capability by analyzing a newly generated 180-sample dataset consisting of T cells from dengue patients in India, measured with mass cytometry. We show that SAUCIE, for the first time, can batch correct and process this 11-million-cell dataset to identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue on the basis of single-cell measurements.

biorxiv bioinformatics 0-100-users 2017

Long-read sequencing of nascent RNA reveals coupling among RNA processing events, bioRxiv, 2017-12-19

Abstract: Pre-mRNA splicing is accomplished by the spliceosome, a megadalton complex that assembles de novo on each intron. Because spliceosome assembly and catalysis occur co-transcriptionally, we hypothesized that introns are removed in the order of their transcription in genomes dominated by constitutive splicing. Remarkably little is known about splicing order and the regulatory potential of nascent transcript remodeling by splicing, due to the limitations of existing methods that focus on analysis of mature splicing products (mRNAs) rather than substrates and intermediates. Here, we overcome this obstacle through long-read RNA sequencing of nascent, multi-intron transcripts in the fission yeast Schizosaccharomyces pombe. Most multi-intron transcripts were fully spliced, consistent with rapid co-transcriptional splicing. However, an unexpectedly high proportion of transcripts were either fully spliced or fully unspliced, suggesting that splicing of any given intron is dependent on the splicing status of other introns in the transcript. Supporting this, mild inhibition of splicing by a temperature-sensitive mutation in Prp2, the homolog of vertebrate U2AF65, increased the frequency of fully unspliced transcripts. Importantly, fully unspliced transcripts displayed transcriptional read-through at the polyA site and were degraded co-transcriptionally by the nuclear exosome. Finally, we show that cellular mRNA levels were reduced in genes with a high number of unspliced nascent transcripts during caffeine treatment, showing regulatory significance of co-transcriptional splicing. Therefore, overall splicing of individual nascent transcripts, 3’ end formation, and mRNA half-life depend on the splicing status of neighboring introns, suggesting crosstalk among spliceosomes and the polyA cleavage machinery during transcription elongation.

biorxiv molecular-biology 0-100-users 2017

High throughput single cell RNA-seq of developing mouse kidney and human kidney organoids reveals a roadmap for recreating the kidney, bioRxiv, 2017-12-17

Abstract: Recent advances in our capacity to differentiate human pluripotent stem cells to human kidney tissue are moving the field closer to novel approaches for renal replacement. Such protocols have relied upon our current understanding of the molecular basis of mammalian kidney morphogenesis. To date this has depended upon population-based profiling of non-homogeneous cellular compartments. In order to improve our resolution of individual cell transcriptional profiles during kidney morphogenesis, we have performed 10x Chromium single cell RNA-seq on over 6000 cells from the E18.5 developing mouse kidney, as well as more than 7000 cells from human iPSC-derived kidney organoids. We identified 16 clusters of cells representing all major cell lineages in the E18.5 mouse kidney. The differentially expressed genes from individual murine clusters were then used to guide the classification of 16 cell clusters within human kidney organoids, revealing the presence of distinguishable stromal, endothelial, nephron, podocyte and nephron progenitor populations. Despite the congruence between developing mouse and human organoid, our analysis suggested limited nephron maturation and the presence of ‘off target’ populations in human kidney organoids, including unidentified stromal populations and evidence of neural clusters. This may reflect unique human kidney populations, mixed cultures or aberrant differentiation in vitro. Analysis of clusters within the mouse data revealed novel insights into progenitor maintenance and cellular maturation in the major renal lineages and will serve as a roadmap to refine directed differentiation approaches in human iPSC-derived kidney organoids.

biorxiv developmental-biology 0-100-users 2017

 

Created with the audiences framework by Jedidiah Carlson
