Single-cell epigenomic identification of inherited risk loci in Alzheimer’s and Parkinson’s disease, bioRxiv, 2020-01-07
Abstract: Genome-wide association studies (GWAS) have identified thousands of variants associated with disease phenotypes. However, the majority of these variants do not alter coding sequences, making it difficult to assign their function. To this end, we present a multi-omic epigenetic atlas of the adult human brain through profiling of the chromatin accessibility landscapes and three-dimensional chromatin interactions of seven brain regions across a cohort of 39 cognitively healthy individuals. Single-cell chromatin accessibility profiling of 70,631 cells from six of these brain regions identifies 24 distinct cell clusters and 359,022 cell type-specific regulatory elements, capturing the regulatory diversity of the adult brain. We develop a machine learning classifier to integrate this multi-omic framework and predict dozens of functional single nucleotide polymorphisms (SNPs), nominating gene and cellular targets for previously orphaned GWAS loci. These predictions both inform well-studied disease-relevant genes, such as BIN1 in microglia for Alzheimer’s disease (AD), and reveal novel gene-disease associations, such as STAB1 in microglia and MAL in oligodendrocytes for Parkinson’s disease (PD). Moreover, we dissect the complex inverted haplotype of the MAPT (encoding tau) PD risk locus, identifying ectopic enhancer-gene contacts in neurons that increase MAPT expression and may mediate this disease association. This work greatly expands our understanding of inherited variation in AD and PD and provides a roadmap for the epigenomic dissection of noncoding regulatory variation in disease.
biorxiv genomics 100-200-users 2020
Binary and analog variation of synapses between cortical pyramidal neurons, bioRxiv, 2020-01-01
Abstract: Learning from experience depends at least in part on changes in neuronal connections. We present the largest map of connectivity to date between cortical neurons of a defined type (L23 pyramidal cells), which was enabled by automated analysis of serial section electron microscopy images with improved handling of image defects. We used the map to identify constraints on the learning algorithms employed by the cortex. Previous cortical studies modeled a continuum of synapse sizes (Arellano et al., 2007) by a log-normal distribution (Loewenstein, Kuras and Rumpel, 2011; de Vivo et al., 2017; Santuy et al., 2018). A continuum is consistent with most neural network models of learning, in which synaptic strength is a continuously graded analog variable. Here we show that synapse size, when restricted to synapses between L23 pyramidal cells, is well-modeled by the sum of a binary variable and an analog variable drawn from a log-normal distribution. Two synapses sharing the same presynaptic and postsynaptic cells are known to be correlated in size (Sorra and Harris, 1993; Koester and Johnston, 2005; Bartol et al., 2015; Kasthuri et al., 2015; Dvorkin and Ziv, 2016; Bloss et al., 2018; Motta et al., 2019). We show that the binary variables of the two synapses are highly correlated, while the analog variables are not. Binary variation could be the outcome of a Hebbian or other synaptic plasticity rule depending on activity signals that are relatively uniform across neuronal arbors, while analog variation may be dominated by other influences. We discuss the implications for the stability-plasticity dilemma.
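As a rough illustration of the size model described in this abstract, the sketch below reads it literally: each synapse size is a binary component plus a log-normally distributed analog component. All parameter values (`p_large`, `step`, `mu`, `sigma`) are hypothetical and are not taken from the preprint. As a simplification, the binary state is treated as fully shared by the two synapses of a pair, whereas the abstract only says these variables are highly correlated.

```python
import random


def sample_pair(rng, p_large=0.3, step=1.0, mu=-1.0, sigma=0.5):
    """Sample sizes for two synapses sharing the same pre- and postsynaptic cells.

    All default parameters are hypothetical, chosen for illustration only.
    """
    # Simplifying assumption: both synapses of the pair share one binary state.
    binary = step if rng.random() < p_large else 0.0
    # Analog components are drawn independently from a log-normal distribution.
    analog_1 = rng.lognormvariate(mu, sigma)
    analog_2 = rng.lognormvariate(mu, sigma)
    return binary + analog_1, binary + analog_2


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
pairs = [sample_pair(rng) for _ in range(5000)]
sizes_1 = [a for a, _ in pairs]
sizes_2 = [b for _, b in pairs]
r = pearson(sizes_1, sizes_2)
print(f"size correlation within pairs: {r:.2f}")
```

In this toy model the within-pair size correlation arises entirely from the shared binary component, since the analog parts are independent by construction.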
biorxiv neuroscience 100-200-users 2020
Single cell epigenomic atlas of the developing human brain and organoids, bioRxiv, 2020-01-01
Abstract: Dynamic changes in chromatin accessibility coincide with important aspects of neuronal differentiation, such as fate specification and arealization, and confer cell type-specific associations to neurodevelopmental disorders. However, studies of the epigenomic landscape of the developing human brain have yet to be performed at single-cell resolution. Here, we profiled chromatin accessibility of >75,000 cells from eight distinct areas of developing human forebrain using single cell ATAC-seq (scATACseq). We identified thousands of loci that undergo extensive cell type-specific changes in accessibility during corticogenesis. Chromatin state profiling also reveals novel distinctions between neural progenitor cells from different cortical areas not seen in transcriptomic profiles and suggests a role for retinoic acid signaling in cortical arealization. Comparison of the cell type-specific chromatin landscape of cerebral organoids to primary developing cortex found that organoids establish broad cell type-specific enhancer accessibility patterns similar to the developing cortex, but lack many putative regulatory elements identified in homologous primary cell types. Together, our results reveal the important contribution of chromatin state to the emerging patterns of cell type diversity and cell fate specification and provide a blueprint for evaluating the fidelity and robustness of cerebral organoids as a model for cortical development.
biorxiv developmental-biology 100-200-users 2020
The history of measles from a 1912 genome to an antique origin, bioRxiv, 2019-12-30
Abstract: Many infectious diseases are thought to have emerged in humans after the Neolithic revolution. While it is broadly accepted that this also applies to measles, the exact date of emergence for this disease is controversial. Here, we sequenced the genome of a 1912 measles virus and used selection-aware molecular clock modeling to determine the divergence date of measles virus and rinderpest virus. This divergence date represents the earliest possible date for the establishment of measles in human populations. Our analyses show that the measles virus potentially arose as early as the 4th century BCE, rekindling the recently challenged hypothesis of an antique origin of this disease.
One Sentence Summary: Measles virus diverged from rinderpest virus in the 4th century BCE, which is compatible with an emergence of measles during Antiquity.
biorxiv evolutionary-biology 100-200-users 2019
Genomic evidence for global ocean plankton biogeography shaped by large-scale current systems, bioRxiv, 2019-12-07
Abstract: Biogeographical studies have traditionally focused on readily visible organisms, but recent technological advances are enabling analyses of the large-scale distribution of microscopic organisms, whose biogeographical patterns have long been debated [1,2]. The most prominent global biogeography of marine plankton was derived by Longhurst [3] based on parameters principally associated with photosynthetic plankton. Localized studies of selected plankton taxa or specific organismal sizes [1,4–7] have mapped community structure and begun to assess the roles of environment and ocean current transport in shaping these patterns [2,8]. Here we assess global plankton biogeography and its relation to the biological, chemical and physical context of the ocean (the ‘seascape’) by analyzing 24 terabases of metagenomic sequence data and 739 million metabarcodes from the Tara Oceans expedition in light of environmental data and simulated ocean current transport. In addition to significant local heterogeneity, viral, prokaryotic and eukaryotic plankton communities all display near steady-state, large-scale, size-dependent biogeographical patterns. Correlation analyses between plankton transport time and metagenomic or environmental dissimilarity reveal the existence of basin-scale biological and environmental continua emerging within the main current systems. Across oceans, there is a measurable, continuous change within communities and environmental factors up to an average of 1.5 years of travel time. Modulation of plankton communities during transport varies with organismal size, such that the distribution of smaller plankton best matches Longhurst biogeochemical provinces, whereas larger plankton group into larger provinces. Together these findings provide an integrated framework to interpret plankton community organization in its physico-chemical context, paving the way to a better understanding of oceanic ecosystem functioning in a changing global environment.
biorxiv ecology 100-200-users 2019
Data structures based on k-mers for querying large collections of sequencing datasets, bioRxiv, 2019-12-06
High-throughput sequencing datasets are usually deposited in public repositories, e.g. the European Nucleotide Archive, to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not support online sequence searches; yet such a feature would be highly useful to investigators. Towards this goal, in the last few years several computational approaches have been introduced to index and query large collections of datasets. Here we propose an accessible survey of these approaches, which are generally based on representing datasets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
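The general idea of representing datasets as sets of k-mers can be sketched with a toy inverted index. This is a minimal illustration, not any specific tool covered by the survey: the function names, the dataset names, and the `threshold` parameter are all hypothetical. It also ignores practical concerns such as canonical (reverse-complement) k-mers, sequencing errors, and memory-efficient encodings.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(datasets, k):
    """Map each k-mer to the set of dataset names that contain it."""
    index = defaultdict(set)
    for name, reads in datasets.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(name)
    return index


def query(index, seq, k, threshold=0.8):
    """Return datasets containing at least `threshold` of the query's k-mers."""
    query_kmers = kmers(seq, k)
    if not query_kmers:
        return []
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids inserting missing k-mers into the defaultdict.
        for name in index.get(kmer, ()):
            hits[name] += 1
    return sorted(
        name for name, count in hits.items()
        if count / len(query_kmers) >= threshold
    )


# Toy collection with made-up dataset names and reads.
datasets = {
    "sample_A": ["ACGTACGTGA"],
    "sample_B": ["TTTTGGGGCC"],
}
index = build_index(datasets, k=4)
print(query(index, "CGTACGTG", k=4))
```

A plain dictionary of sets like this grows with the number of distinct k-mers, so it only serves to show the query logic on small inputs.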
biorxiv bioinformatics 100-200-users 2019