Accurate functional classification of thousands of BRCA1 variants with saturation genome editing, bioRxiv, 2018-04-05
Abstract: Variants of uncertain significance (VUS) fundamentally limit the utility of genetic information in a clinical setting. The challenge of VUS is epitomized by BRCA1, a tumor suppressor gene integral to DNA repair and genomic stability. Germline BRCA1 loss-of-function (LOF) variants predispose women to early-onset breast and ovarian cancers. Although BRCA1 has been sequenced in millions of women, the risk associated with most newly observed variants cannot be definitively assigned. Data sharing attenuates this problem but it is unlikely to solve it, as most newly observed variants are exceedingly rare. In lieu of genetic evidence, experimental approaches can be used to functionally characterize VUS. However, to date, functional studies of BRCA1 VUS have been conducted in a post hoc, piecemeal fashion. Here we employ saturation genome editing to assay 96.5% of all possible single nucleotide variants (SNVs) in 13 exons that encode functionally critical domains of BRCA1. Our assay measures cellular fitness in a haploid human cell line whose survival is dependent on intact BRCA1 function. The resulting function scores for nearly 4,000 SNVs are bimodally distributed and almost perfectly concordant with established assessments of pathogenicity. Sequence-function maps enhanced by parallel measurements of variant effects on mRNA levels reveal mechanisms by which loss-of-function SNVs arise. Hundreds of missense SNVs critical for protein function are identified, as well as dozens of exonic and intronic SNVs that compromise BRCA1 function by disrupting splicing or transcript stability. We predict that these function scores will be directly useful for the clinical interpretation of cancer risk based on BRCA1 sequencing. Furthermore, we propose that this paradigm can be extended to overcome the challenge of VUS in other genes in which genetic variation is clinically actionable.
biorxiv genomics 200-500-users 2018

Molecular architecture of the mouse nervous system, bioRxiv, 2018-04-05
Abstract: The mammalian nervous system executes complex behaviors controlled by specialised, precisely positioned and interacting cell types. Here, we used RNA sequencing of half a million single cells to create a detailed census of cell types in the mouse nervous system. We mapped cell types spatially and derived a hierarchical, data-driven taxonomy. Neurons were the most diverse, and were grouped by developmental anatomical units, and by the expression of neurotransmitters and neuropeptides. Neuronal diversity was driven by genes encoding cell identity, synaptic connectivity, neurotransmission and membrane conductance. We discovered several distinct, regionally restricted astrocyte types, which obeyed developmental boundaries and correlated with the spatial distribution of key glutamate and glycine neurotransmitters. In contrast, oligodendrocytes showed a loss of regional identity, followed by a secondary diversification. The resource presented here lays a solid foundation for understanding the molecular architecture of the mammalian nervous system, and enables genetic manipulation of specific cell types.
biorxiv neuroscience 200-500-users 2018

High-throughput mapping of long-range neuronal projection using in situ sequencing, bioRxiv, 2018-04-04
Summary: Understanding neural circuits requires deciphering interactions among myriad cell types defined by spatial organization, connectivity, gene expression, and other properties. Resolving these cell types requires both single-neuron resolution and high throughput, a challenging combination with conventional methods. Here we introduce BARseq, a multiplexed method based on RNA barcoding for mapping projections of thousands of spatially resolved neurons in a single brain, and relating those projections to other properties such as gene or Cre expression. Mapping the projections to 11 areas of 3579 neurons in mouse auditory cortex using BARseq confirmed the laminar organization of the three top classes (IT, PT-like and CT) of projection neurons. In-depth analysis uncovered a novel projection type restricted almost exclusively to transcriptionally defined subtypes of IT neurons. By bridging anatomical and transcriptomic approaches at cellular resolution with high throughput, BARseq can potentially uncover the organizing principles underlying the structure and formation of neural circuits.
biorxiv neuroscience 100-200-users 2018

The organization of intracortical connections by layer and cell class in the mouse brain, bioRxiv, 2018-04-01
Abstract: The mammalian cortex is a laminar structure composed of many cell types densely interconnected in complex ways. Recent systematic efforts to map the mouse mesoscale connectome provide comprehensive projection data on interareal connections, but not at the level of specific cell classes or layers within cortical areas. We present here a significant expansion of the Allen Mouse Brain Connectivity Atlas, with ∼1,000 new axonal projection mapping experiments across nearly all isocortical areas in 49 Cre driver lines. Using 13 lines selective for cortical layer-specific projection neuron classes, we identify the differential contribution of each layer/class to the overall intracortical connectivity patterns. We find layer 5 (L5) projection neurons account for essentially all intracortical outputs. L2/3, L4, and L6 neurons contact a subset of the L5 cortical targets. We also describe the most common axon lamination patterns in cortical targets. Most patterns are consistent with previous anatomical rules used to determine hierarchical position between cortical areas (feedforward, feedback), with notable exceptions. While diverse target lamination patterns arise from every source layer/class, L2/3 and L4 neurons are primarily associated with feedforward type projection patterns and L6 with feedback. L5 has both feedforward and feedback projection patterns. Finally, network analyses revealed a modular organization of the intracortical connectome. By labeling interareal and intermodule connections as feedforward or feedback, we present an integrated view of the intracortical connectome as a hierarchical network.
biorxiv neuroscience 200-500-users 2018

The Genomic Formation of South and Central Asia, bioRxiv, 2018-03-31
Abstract: The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia (consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC), and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identify the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.
One Sentence Summary: Genome-wide ancient DNA from 357 individuals from Central and South Asia sheds new light on the spread of Indo-European languages and parallels between the genetic history of two sub-continents, Europe and South Asia.
biorxiv genomics 500+-users 2018

Bayesian Inference for a Generative Model of Transcriptome Profiles from Single-cell RNA Sequencing, bioRxiv, 2018-03-30
Abstract: Transcriptome profiles of individual cells reflect true and often unexplored biological diversity, but are also affected by noise of biological and technical nature. This raises the need to explicitly model the resulting uncertainty and take it into account in any downstream analysis, such as dimensionality reduction, clustering, and differential expression. Here, we introduce Single-cell Variational Inference (scVI), a scalable framework for probabilistic representation and analysis of gene expression in single cells. Our model uses variational inference and stochastic optimization of deep neural networks to approximate the parameters that govern the distribution of expression values of each gene in every cell, using a non-linear mapping between the observations and a low-dimensional latent space. By doing so, scVI pools information between similar cells or genes while taking nuisance factors of variation such as batch effects and limited sensitivity into account. To evaluate scVI, we conducted a comprehensive comparison with existing methods for distributional modeling and dimensionality reduction, all of which rely on generalized linear models. We first show that scVI scales to over one million cells, whereas competing algorithms can process at most tens of thousands of cells. Next, we show that scVI fits unseen data more closely and can impute missing data more accurately, both indicative of a better generalization capacity. We then utilize scVI to conduct a set of fundamental analysis tasks – including batch correction, visualization, clustering and differential expression – and demonstrate its accuracy in comparison to the state-of-the-art tools in each task. scVI is publicly available, and can be readily used as a principled and inclusive solution for multiple tasks of single-cell RNA sequencing data analysis.
biorxiv bioinformatics 0-100-users 2018