Probabilistic gene expression signatures identify cell-types from single cell RNA-seq data, bioRxiv, 2020-01-06
Abstract: Single-cell RNA sequencing (scRNA-seq) quantifies the gene expression of individual cells in a sample, which allows distinct cell-type populations to be identified and characterized. An important step in many scRNA-seq analysis pipelines is the classification of cells into known cell-types. While this can be achieved using experimental techniques, such as fluorescence-activated cell sorting, these approaches are impractical for large numbers of cells. This motivates the development of data-driven cell-type identification methods. We find limitations with current approaches due to the reliance on known marker genes and sensitivity to the quality of reference samples. Here we present a computationally light statistical approach, based on Naive Bayes, that leverages public datasets to combine information across thousands of genes and probabilistically assign cell-type identity. Using datasets ranging across species and tissue types, we demonstrate that our approach is robust to low-quality reference data and produces more accurate cell-type identification than current methods.
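To make the Naive Bayes idea concrete, here is a minimal sketch of a multinomial classifier over gene counts. It is not the authors' implementation: the function names, the pseudocount, the uniform prior over cell types, and the three-gene toy reference are all assumptions chosen for illustration.

```python
import math

def fit_profiles(reference, pseudocount=1.0):
    """Turn per-cell-type gene count totals into log expression probabilities.

    reference: dict mapping cell type -> list of summed counts per gene
    (hypothetical format; a pseudocount avoids log(0) for unseen genes).
    """
    profiles = {}
    for cell_type, counts in reference.items():
        total = sum(counts) + pseudocount * len(counts)
        profiles[cell_type] = [math.log((c + pseudocount) / total) for c in counts]
    return profiles

def classify(cell_counts, profiles):
    """Return posterior probabilities per cell type, assuming a uniform prior."""
    # Naive Bayes: genes are treated as independent, so log-likelihoods add up.
    log_lik = {
        cell_type: sum(n * lp for n, lp in zip(cell_counts, log_p))
        for cell_type, log_p in profiles.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_lik.values())
    weights = {t: math.exp(v - top) for t, v in log_lik.items()}
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}

# Toy reference with three genes and two cell types (made-up numbers).
reference = {
    "T cell": [90, 5, 5],
    "B cell": [5, 90, 5],
}
profiles = fit_profiles(reference)
posterior = classify([8, 1, 0], profiles)
best = max(posterior, key=posterior.get)
```

Because the output is a posterior rather than a hard label, a cell whose best probability is low could be left unassigned instead of being forced into a type.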
biorxiv genomics 0-100-users 2020
The predictive power of the microbiome exceeds that of genome-wide association studies in the discrimination of complex human disease, bioRxiv, 2020-01-02
Abstract: Over the past decade, studies of the human genome and microbiome have deepened our understanding of the connections between human genes, environments, microbes, and disease. For example, the sheer number of indicators of the microbiome and human genetic common variants associated with disease has been immense, but clinical utility has been elusive. Here, we compared the predictive capabilities of the human microbiome versus human genomic common variants across 13 common diseases. We concluded that microbiomic indicators outperform human genetics in predicting host phenotype (overall Microbiome-Association-Study [MAS] area under the curve [AUC] = 0.79 [SE = 0.03], overall Genome-Wide-Association-Study [GWAS] AUC = 0.67 [SE = 0.02]). Our results, while preliminary and focused on a subset of the totality of disease, demonstrate the relative predictive ability of the microbiome, indicating that it may outperform human genetics in discriminating human disease cases and controls. They additionally motivate the need for population-level microbiome sequencing resources, akin to the UK Biobank, to further improve and reproduce metagenomic models of disease.
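For readers unfamiliar with the AUC figures quoted above, the following sketch shows one standard way to compute the statistic: the fraction of case/control pairs in which the case receives the higher predicted score, with ties counted as one half. The function name and the toy scores are illustrative assumptions, not data or code from the study.

```python
def auc(scores_cases, scores_controls):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for case_score in scores_cases:
        for control_score in scores_controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))

# Made-up classifier scores for three cases and three controls.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
value = auc(cases, controls)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported 0.79 and 0.67 should be read.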
biorxiv bioinformatics 200-500-users 2020
Binary and analog variation of synapses between cortical pyramidal neurons, bioRxiv, 2020-01-01
Abstract: Learning from experience depends at least in part on changes in neuronal connections. We present the largest map of connectivity to date between cortical neurons of a defined type (L23 pyramidal cells), which was enabled by automated analysis of serial section electron microscopy images with improved handling of image defects. We used the map to identify constraints on the learning algorithms employed by the cortex. Previous cortical studies modeled a continuum of synapse sizes (Arellano et al., 2007) by a log-normal distribution (Loewenstein, Kuras and Rumpel, 2011; de Vivo et al., 2017; Santuy et al., 2018). A continuum is consistent with most neural network models of learning, in which synaptic strength is a continuously graded analog variable. Here we show that synapse size, when restricted to synapses between L23 pyramidal cells, is well-modeled by the sum of a binary variable and an analog variable drawn from a log-normal distribution. Two synapses sharing the same presynaptic and postsynaptic cells are known to be correlated in size (Sorra and Harris, 1993; Koester and Johnston, 2005; Bartol et al., 2015; Kasthuri et al., 2015; Dvorkin and Ziv, 2016; Bloss et al., 2018; Motta et al., 2019). We show that the binary variables of the two synapses are highly correlated, while the analog variables are not. Binary variation could be the outcome of a Hebbian or other synaptic plasticity rule depending on activity signals that are relatively uniform across neuronal arbors, while analog variation may be dominated by other influences. We discuss the implications for the stability-plasticity dilemma.
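The "binary plus analog" description can be illustrated with a small generative sketch: each synapse size is drawn as a two-state variable times a fixed step, plus a log-normally distributed component. This is only a toy reading of the abstract's sentence; the parameter values, the arbitrary size units, and the function name are assumptions and are not taken from the paper.

```python
import random

def sample_synapse_sizes(n, p_large=0.3, step=1.0, mu=-2.0, sigma=0.5, seed=0):
    """Draw n (binary_state, size) pairs from a binary + log-normal model.

    All parameters are hypothetical: p_large is the chance of the 'large'
    state, step is the size added in that state, and mu/sigma parameterize
    the log-normal analog component.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        binary = 1 if rng.random() < p_large else 0
        analog = rng.lognormvariate(mu, sigma)  # always positive
        samples.append((binary, binary * step + analog))
    return samples

samples = sample_synapse_sizes(2000)
small = [size for state, size in samples if state == 0]
large = [size for state, size in samples if state == 1]
mean_small = sum(small) / len(small)
mean_large = sum(large) / len(large)
```

Under these assumed parameters, a histogram of the sampled sizes shows two separated modes, in contrast to the single mode of a purely log-normal continuum.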
biorxiv neuroscience 100-200-users 2020
Single cell epigenomic atlas of the developing human brain and organoids, bioRxiv, 2020-01-01
Abstract: Dynamic changes in chromatin accessibility coincide with important aspects of neuronal differentiation, such as fate specification and arealization, and confer cell type-specific associations to neurodevelopmental disorders. However, studies of the epigenomic landscape of the developing human brain have yet to be performed at single-cell resolution. Here, we profiled chromatin accessibility of >75,000 cells from eight distinct areas of developing human forebrain using single cell ATAC-seq (scATACseq). We identified thousands of loci that undergo extensive cell type-specific changes in accessibility during corticogenesis. Chromatin state profiling also reveals novel distinctions between neural progenitor cells from different cortical areas not seen in transcriptomic profiles and suggests a role for retinoic acid signaling in cortical arealization. Comparison of the cell type-specific chromatin landscape of cerebral organoids to primary developing cortex found that organoids establish broad cell type-specific enhancer accessibility patterns similar to the developing cortex, but lack many putative regulatory elements identified in homologous primary cell types. Together, our results reveal the important contribution of chromatin state to the emerging patterns of cell type diversity and cell fate specification and provide a blueprint for evaluating the fidelity and robustness of cerebral organoids as a model for cortical development.
biorxiv developmental-biology 100-200-users 2020
Species-specific developmental timing is associated with global differences in protein stability in mouse and human, bioRxiv, 2020-01-01
Abstract: What determines the pace of embryonic development? Although many molecular mechanisms controlling developmental processes are evolutionarily conserved, the speed at which these operate can vary substantially between species. For example, the same genetic programme, comprising sequential changes in transcriptional states, governs the differentiation of motor neurons in mouse and human, but the tempo at which it operates differs between species. Using in vitro directed differentiation of embryonic stem cells to motor neurons, we show that the programme runs twice as fast in mouse as in human. We provide evidence that this is neither due to differences in signalling, nor the genomic sequence of genes or their regulatory elements. Instead, we find an approximately two-fold increase in protein stability and cell cycle duration in human cells compared to mouse. This can account for the slower pace of human development, indicating that global differences in key kinetic parameters play a major role in interspecies differences in developmental tempo.
biorxiv developmental-biology 200-500-users 2020
Twelve Platinum-Standard Reference Genomes Sequences (PSRefSeq) that complete the full range of genetic diversity of Asian rice, bioRxiv, 2020-01-01
Abstract: As the human population grows from 7.8 billion to 10 billion over the next 30 years, breeders must do everything possible to create crops that are highly productive and nutritious, while simultaneously having less of an environmental footprint. Rice will play a critical role in meeting this demand and thus, knowledge of the full repertoire of genetic diversity that exists in germplasm banks across the globe is required. To meet this demand, we describe the generation, validation and preliminary analyses of transposable element and long-range structural variation content of 12 near-gap-free reference genome sequences (RefSeqs) from representatives of 12 of 15 subpopulations of cultivated rice. When combined with 4 existing RefSeqs that represent the 3 remaining rice subpopulations and the largest admixed population, this collection of 16 Platinum Standard RefSeqs (PSRefSeq) can be used as a pan-genome template to map resequencing data to detect virtually all standing natural variation that exists in the pan-cultivated rice genome.
biorxiv genomics 0-100-users 2020