Exploring Single-Cell Data with Deep Multitasking Neural Networks, bioRxiv, 2017-12-20
Abstract: Biomedical researchers are generating high-throughput, high-dimensional single-cell data at a staggering rate. As costs of data generation decrease, experimental design is moving towards measurement of many different single-cell samples in the same dataset. These samples can correspond to different patients, conditions, or treatments. While scalability of methods to datasets of these sizes is a challenge on its own, dealing with large-scale experimental design presents a whole new set of problems, including batch effects and sample comparison issues. Currently, there are no computational tools that can both handle large amounts of data in a scalable manner (many cells) and at the same time deal with many samples (many patients or conditions). Moreover, data analysis currently involves the use of different tools that each operate on their own data representation, not guaranteeing a synchronized analysis pipeline. For instance, data visualization methods can be disjoint and mismatched with the clustering method. For this purpose, we present SAUCIE, a deep neural network that leverages the high degree of parallelization and scalability offered by neural networks, as well as the deep representation of data that can be learned by them, to perform many single-cell data analysis tasks, all on a unified representation. A well-known limitation of neural networks is their lack of interpretability. Our key contributions here are newly formulated regularizations (penalties) that render features learned in hidden layers of the neural network interpretable. When large multi-patient datasets are fed into SAUCIE, the various hidden layers contain denoised and batch-corrected data, a low-dimensional visualization, unsupervised clustering, as well as other information that can be used to explore the data. We show this capability by analyzing a newly generated 180-sample dataset consisting of T cells from dengue patients in India, measured with mass cytometry. We show that SAUCIE, for the first time, can batch correct and process this 11-million-cell dataset to identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue on the basis of single-cell measurements.
biorxiv bioinformatics 0-100-users 2017
Searching for the causal effects of BMI in over 300 000 individuals, using Mendelian randomization, bioRxiv, 2017-12-20
Abstract: Mendelian randomization (MR) has been used to estimate the causal effect of body mass index (BMI) on particular traits thought to be affected by BMI. However, BMI may also be a modifiable, causal risk factor for outcomes where there is no prior reason to suggest that a causal effect exists. We perform an MR phenome-wide association study (MR-pheWAS) to search for the causal effects of BMI in UK Biobank (n=334 968), using the PHESANT open-source phenome scan tool. Of the 20 461 tests performed, our MR-pheWAS identified 519 associations below a stringent P value threshold corresponding to a 5% estimated false discovery rate, including many previously identified causal effects. We also identified several novel effects, including protective effects of higher BMI on a set of psychosocial traits, identified initially in our preliminary MR-pheWAS and replicated in an independent subset of UK Biobank. Such associations need replicating in an independent sample.
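As a rough illustration of how a P value threshold can correspond to an estimated false discovery rate, the sketch below uses the Benjamini-Hochberg step-up rule. This is an assumption made for illustration: the abstract does not say which procedure PHESANT or the authors used. The function name `bh_threshold` and the toy P values are hypothetical; only the counts 519 and 20 461 come from the abstract.

```python
def bh_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule (illustrative, not the authors' code).

    Returns (k, threshold), where k is the largest rank whose sorted
    P value satisfies p_(k) <= (k / m) * q, and threshold = (k / m) * q.
    """
    m = len(p_values)
    k_max = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * q:
            k_max = rank
    threshold = k_max / m * q if k_max else 0.0
    return k_max, threshold


# Toy example with five hypothetical P values.
k, thr = bh_threshold([0.001, 0.008, 0.039, 0.041, 0.6], q=0.05)

# If 519 of 20 461 tests pass at a 5% FDR under this rule, the implied
# P value threshold would be (519 / 20461) * 0.05.
implied = 519 / 20461 * 0.05
print(k, thr, implied)
```

Under this assumed rule, the counts reported in the abstract would imply a threshold of roughly 1.3e-3, far stricter than the conventional 0.05 and much less conservative than a Bonferroni cut-off of 0.05 / 20 461.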
biorxiv epidemiology 0-100-users 2017
Social interactions impact on the dopaminergic system and drive individuality, bioRxiv, 2017-12-20
Summary: Individuality is a ubiquitous and well-conserved feature among animal species. The behavioral patterns of individual animals affect their respective role in the ecosystem and their prospects for survival. Even though some of the factors shaping individuality have been identified, the mechanisms underlying individuation are poorly understood and are generally considered to be genetics-based. Here we devised a large environment where mice live continuously, and observed that individuality, measured by both social and individual traits, emerged and settled within the group. Midbrain dopamine neurons underwent neurophysiological adaptations that mirrored this phenotypic divergence in individual behaviors. Strikingly, modifying the social environment resulted in a fast re-adaptation of both the animal’s personality and its dopaminergic signature. These results indicate that individuality can rapidly evolve upon social challenges, and does not just depend on the genetic or epigenetic initial status of the animal.
biorxiv neuroscience 0-100-users 2017
Correction of the Framingham Risk Score Data Reported in SPRINT, bioRxiv, 2017-12-19
This report describes an error in the Framingham Risk Score data presented in the original SPRINT publication.1 The data, presented in Table 1 of the main SPRINT publication in the New England Journal of Medicine and made available to SPRINT Challenge participants, incorrectly calculated the level of baseline cardiovascular risk of the study participants using the Framingham Risk Score. The correct calculation increased the number of participants identified as having >15% 10-year risk from 5737 to 7089, a change from 61% to 76% of the total study population. This information is important for researchers attempting to validate and extend the trial’s findings and is particularly germane because the recently released American Heart Association/American College of Cardiology blood pressure guidelines changed blood pressure targets for pharmacologic therapy only for high-risk individuals.
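The percentages quoted above can be checked with a few lines of arithmetic. The sketch assumes a total study population of 9361, the enrollment figure from the main SPRINT publication; that number is not stated in this abstract, and the helper name `share_percent` is hypothetical.

```python
# Assumption: 9361 is the total SPRINT study population (not given in the abstract).
TOTAL_PARTICIPANTS = 9361


def share_percent(count, total=TOTAL_PARTICIPANTS):
    """Return count as a percentage of total, rounded to the nearest integer."""
    return round(100 * count / total)


original = share_percent(5737)   # participants with >15% risk as first reported
corrected = share_percent(7089)  # participants with >15% risk after correction
reclassified = 7089 - 5737       # participants newly counted as high risk
print(original, corrected, reclassified)
```

With that assumed denominator, the two counts round to the 61% and 76% reported in the abstract, and the correction moves 1352 participants into the high-risk group.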
biorxiv clinical-trials 0-100-users 2017
Long-read sequencing of nascent RNA reveals coupling among RNA processing events, bioRxiv, 2017-12-19
Abstract: Pre-mRNA splicing is accomplished by the spliceosome, a megadalton complex that assembles de novo on each intron. Because spliceosome assembly and catalysis occur co-transcriptionally, we hypothesized that introns are removed in the order of their transcription in genomes dominated by constitutive splicing. Remarkably little is known about splicing order and the regulatory potential of nascent transcript remodeling by splicing, due to the limitations of existing methods that focus on analysis of mature splicing products (mRNAs) rather than substrates and intermediates. Here, we overcome this obstacle through long-read RNA sequencing of nascent, multi-intron transcripts in the fission yeast Schizosaccharomyces pombe. Most multi-intron transcripts were fully spliced, consistent with rapid co-transcriptional splicing. However, an unexpectedly high proportion of transcripts were either fully spliced or fully unspliced, suggesting that splicing of any given intron is dependent on the splicing status of other introns in the transcript. Supporting this, mild inhibition of splicing by a temperature-sensitive mutation in Prp2, the homolog of vertebrate U2AF65, increased the frequency of fully unspliced transcripts. Importantly, fully unspliced transcripts displayed transcriptional read-through at the polyA site and were degraded co-transcriptionally by the nuclear exosome. Finally, we show that cellular mRNA levels were reduced in genes with a high number of unspliced nascent transcripts during caffeine treatment, showing regulatory significance of co-transcriptional splicing. Therefore, overall splicing of individual nascent transcripts, 3’ end formation, and mRNA half-life depend on the splicing status of neighboring introns, suggesting crosstalk among spliceosomes and the polyA cleavage machinery during transcription elongation.
biorxiv molecular-biology 0-100-users 2017
High throughput single cell RNA-seq of developing mouse kidney and human kidney organoids reveals a roadmap for recreating the kidney, bioRxiv, 2017-12-17
Abstract: Recent advances in our capacity to differentiate human pluripotent stem cells to human kidney tissue are moving the field closer to novel approaches for renal replacement. Such protocols have relied upon our current understanding of the molecular basis of mammalian kidney morphogenesis. To date this has depended upon population-based profiling of non-homogeneous cellular compartments. In order to improve our resolution of individual cell transcriptional profiles during kidney morphogenesis, we have performed 10x Chromium single cell RNA-seq on over 6000 cells from the E18.5 developing mouse kidney, as well as more than 7000 cells from human iPSC-derived kidney organoids. We identified 16 clusters of cells representing all major cell lineages in the E18.5 mouse kidney. The differentially expressed genes from individual murine clusters were then used to guide the classification of 16 cell clusters within human kidney organoids, revealing the presence of distinguishable stromal, endothelial, nephron, podocyte and nephron progenitor populations. Despite the congruence between developing mouse and human organoid, our analysis suggested limited nephron maturation and the presence of ‘off target’ populations in human kidney organoids, including unidentified stromal populations and evidence of neural clusters. This may reflect unique human kidney populations, mixed cultures or aberrant differentiation in vitro. Analysis of clusters within the mouse data revealed novel insights into progenitor maintenance and cellular maturation in the major renal lineages and will serve as a roadmap to refine directed differentiation approaches in human iPSC-derived kidney organoids.
biorxiv developmental-biology 0-100-users 2017