Cardelino: Integrating whole exomes and single-cell transcriptomes to reveal phenotypic impact of somatic variants, bioRxiv, 2018-09-12
Abstract: Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA-sequencing, using either bulk or single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution.
Key findings:
- A novel approach for integrating DNA-seq and single-cell RNA-seq data to reconstruct clonal substructure for single-cell transcriptomes.
- Evidence for non-neutral evolution of clonal populations in human fibroblasts.
- Proliferation and cell cycle pathways are commonly distorted in mutated clonal populations.
biorxiv genomics 100-200-users 2018

The genomic view of diversification, bioRxiv, 2018-09-11
Abstract: Evolutionary relationships between species are traditionally represented in the form of a tree, called the species tree. The reconstruction of the species tree from molecular data is hindered by frequent conflicts between gene genealogies. A standard way of dealing with this issue is to postulate the existence of a unique species tree where disagreements between gene trees are explained by incomplete lineage sorting (ILS) due to random coalescences of gene lineages inside the edges of the species tree. This paradigm, known as the multi-species coalescent (MSC), is constantly violated by the ubiquitous presence of gene flow revealed by empirical studies, leading to topological incongruences of gene trees that cannot be explained by ILS alone. Here we argue that this paradigm should be revised in favor of a vision that acknowledges the importance of gene flow and in which gene histories shape the species tree rather than the opposite. We propose a new, plastic framework for modeling the joint evolution of gene and species lineages that relaxes the hierarchy between the species tree and gene trees. We implement this framework in two mathematical models called the gene-based diversification (GBD) models: 1) GBD-forward, which follows all evolving genomes and is thus computationally very intensive, and 2) GBD-backward, which is based on coalescent theory and is thus more efficient. Each model features four parameters tuning colonization, mutation, gene flow and reproductive isolation. We propose a quick inference method based on the differences between gene trees and use it to evaluate the amount of gene flow in two empirical data-sets. We find that in these data-sets, gene tree distributions are better explained by the best fitting GBD model than by the best fitting MSC model. This work should pave the way for approaches to diversification that use the richer signal contained in genomic evolutionary histories rather than in the mere species tree.
biorxiv evolutionary-biology 100-200-users 2018

Resource: Scalable whole genome sequencing of 40,000 single cells identifies stochastic aneuploidies, genome replication states and clonal repertoires, bioRxiv, 2018-09-07
Summary: Essential features of cancer tissue cellular heterogeneity such as negatively selected genome topologies, sub-clonal mutation patterns and genome replication states can only effectively be studied by sequencing single-cell genomes at scale and high fidelity. Using an amplification-free single-cell genome sequencing approach implemented on commodity hardware (DLP+) coupled with a cloud-based computational platform, we define a resource of 40,000 single-cell genomes characterized by their genome states, across a wide range of tissue types and conditions. We show that shallow sequencing across thousands of genomes permits reconstruction of clonal genomes to single nucleotide resolution through aggregation analysis of cells sharing higher order genome structure. From large-scale population analysis over thousands of cells, we identify rare cells exhibiting mitotic mis-segregation of whole chromosomes. We observe that tissue-derived scWGS libraries exhibit lower rates of whole chromosome aneuploidy than cell lines, and that loss of p53 results in a shift in event type, but not overall prevalence, in breast epithelium. Finally, we demonstrate that the replication states of genomes can be identified, allowing the number and proportion of replicating cells, as well as the chromosomal pattern of replication, to be unambiguously identified in single-cell genome sequencing experiments. The combined annotated resource and approach provide a re-implementable large scale platform for studying lineages and tissue heterogeneity.
biorxiv genomics 100-200-users 2018

The geometry of abstraction in hippocampus and pre-frontal cortex, bioRxiv, 2018-09-07
The curse of dimensionality plagues models of reinforcement learning and decision-making. The process of abstraction solves this by constructing abstract variables describing features shared by different specific instances, reducing dimensionality and enabling generalization in novel situations. Here we characterized neural representations in monkeys performing a task where a hidden variable described the temporal statistics of stimulus-response-outcome mappings. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training. This type of generalization requires a particular geometric format of neural representations. Neural ensembles in dorsolateral pre-frontal cortex, anterior cingulate cortex and hippocampus, and in simulated neural networks, simultaneously represented multiple hidden and explicit variables in a format reflecting abstraction. Task events engaging cognitive operations modulated this format. These findings elucidate how the brain and artificial systems represent abstract variables, variables critical for generalization that in turn confers cognitive flexibility.
biorxiv neuroscience 100-200-users 2018

Complex cell-state changes revealed by single cell RNA sequencing of 76,149 microglia throughout the mouse lifespan and in the injured brain, bioRxiv, 2018-08-31
Microglia, the resident immune cells of the brain, rapidly change states in response to their environment, but we lack molecular and functional signatures of different microglial populations. In this study, we analyzed the RNA expression patterns of more than 76,000 individual microglia during development, old age and after brain injury. Analysis uncovered at least nine transcriptionally distinct microglial states, which expressed unique sets of genes and were localized in the brain using specific markers. The greatest microglial heterogeneity was found at young ages; however, several states - including chemokine-enriched inflammatory microglia - persisted throughout the lifespan or increased in the aged brain. Multiple reactive microglial subtypes were also found following demyelinating injury in mice, at least one of which was also found in human MS lesions. These unique microglia signatures can be used to better understand microglia function and to identify and manipulate specific subpopulations in health and disease.
biorxiv neuroscience 100-200-users 2018

High-throughput single-cell transcriptome profiling of plant cell types, bioRxiv, 2018-08-29
Abstract: Single-cell transcriptome analysis of heterogeneous tissues can provide high-resolution windows into the genomic basis and spatiotemporal dynamics of developmental processes. Here we demonstrate the feasibility of high-throughput single-cell RNA sequencing of plant tissue using the Drop-seq approach. Profiling of >4,000 individual cells from the Arabidopsis root provides transcriptomes and marker genes for a diversity of cell types and illuminates the gene expression changes that occur across endodermis development.
biorxiv plant-biology 100-200-users 2018