Comprehensive integration of single cell data, bioRxiv, 2018-11-02
Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to “anchor” diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets.
Availability: Installation instructions, documentation, and tutorials are available at https://www.satijalab.org/seurat
biorxiv genomics 200-500-users 2018
A complete Cannabis chromosome assembly and adaptive admixture for elevated cannabidiol (CBD) content, bioRxiv, 2018-10-31
Abstract: Cannabis has been cultivated for millennia with distinct cultivars providing either fiber and grain or tetrahydrocannabinol. Recent demand for cannabidiol rather than tetrahydrocannabinol has favored the breeding of admixed cultivars with extremely high cannabidiol content. Despite several draft Cannabis genomes, the genomic structure of cannabinoid synthase loci has remained elusive. A genetic map derived from a tetrahydrocannabinol/cannabidiol segregating population and a complete chromosome assembly from a high-cannabidiol cultivar together resolve the linkage of cannabidiolic and tetrahydrocannabinolic acid synthase gene clusters which are associated with transposable elements. High-cannabidiol cultivars appear to have been generated by integrating hemp-type cannabidiolic acid synthase gene clusters into a background of marijuana-type cannabis. Quantitative trait locus mapping suggests that overall drug potency, however, is associated with other genomic regions needing additional study. Resources available online at http://cannabisgenome.org
Summary: A complete chromosome assembly and an ultra-high-density linkage map together identify the genetic mechanism responsible for the ratio of tetrahydrocannabinol (THC) to cannabidiol (CBD) in Cannabis cultivars, allowing paradigms for the evolution and inheritance of drug potency to be evaluated.
biorxiv genomics 100-200-users 2018
Proximity RNA labeling by APEX-Seq Reveals the Organization of Translation Initiation Complexes and Repressive RNA Granules, bioRxiv, 2018-10-26
Abstract: Diverse ribonucleoprotein complexes control messenger RNA processing, translation, and decay. Transcripts in these complexes localize to specific regions of the cell and can condense into non-membrane-bound structures such as stress granules. It has proven challenging to map the RNA composition of these large and dynamic structures, however. We therefore developed an RNA proximity labeling technique, APEX-Seq, which uses the ascorbate peroxidase APEX2 to probe the spatial organization of the transcriptome. We show that APEX-Seq can resolve the localization of RNAs within the cell and determine their enrichment or depletion near key RNA-binding proteins. Matching the spatial transcriptome, as revealed by APEX-Seq, with the spatial proteome determined by APEX-mass spectrometry (APEX-MS) provides new insights into the organization of translation initiation complexes on active mRNAs, as well as exposing unanticipated complexity in stress granule composition, and provides a powerful and general approach to explore the spatial environment of macromolecules.
biorxiv genomics 100-200-users 2018
Polygenic scores for major depressive disorder and depressive symptoms predict response to lithium in patients with bipolar disorder, bioRxiv, 2018-10-22
Abstract
Background: Lithium is a first-line medication for bipolar disorder (BD), but only ~30% of patients respond optimally to the drug. Since genetic factors are known to mediate lithium treatment response, we investigated whether polygenic susceptibility to the spectrum of depression traits is associated with treatment outcomes in patients with BD. In addition, we explored the potential molecular underpinnings of this relationship.
Methods: Weighted polygenic scores (PGSs) were computed for major depressive disorder (MDD) and depressive symptoms (DS) in BD patients from the Consortium on Lithium Genetics (ConLi+Gen; n=2,586) who received lithium treatment. Lithium treatment outcome was assessed using the ALDA scale. Summary statistics from genome-wide association studies (GWAS) in MDD (130,664 cases and 330,470 controls) and DS (n=161,460) were used for PGS weighting. Associations between PGSs of depression traits and lithium treatment response were assessed by binary logistic regression. We also performed a cross-trait meta-GWAS, followed by Ingenuity® Pathway Analysis.
Outcomes: BD patients with a low polygenic load for depressive traits were more likely to respond well to lithium, compared to patients with high polygenic load (MDD OR = 1.64 [95% CI 1.26-2.15], lowest vs highest PGS quartiles; DS OR = 1.53 [95% CI 1.18-2.00]). Associations were significant for type 1, but not type 2 BD. Cross-trait GWAS and functional characterization implicated voltage-gated potassium channels, insulin-related pathways, mitogen-activated protein-kinase (MAPK) signaling, and miRNA expression.
Interpretation: Genetic loading for depression traits in BD patients lowers their odds of responding optimally to lithium. Our findings support the emerging concept of a lithium-responsive biotype in BD.
Funding: See attached details
biorxiv genomics 100-200-users 2018
The population history of northeastern Siberia since the Pleistocene, bioRxiv, 2018-10-22
Abstract: Far northeastern Siberia has been occupied by humans for more than 40 thousand years. Yet, owing to a scarcity of early archaeological sites and human remains, its population history and relationship to ancient and modern populations across Eurasia and the Americas are poorly understood. Here, we analyze 34 ancient genome sequences, including two from fragmented milk teeth found at the ~31.6 thousand-year-old (kya) Yana RHS site, the earliest and northernmost Pleistocene human remains found. These genomes reveal complex patterns of past population admixture and replacement events throughout northeastern Siberia, with evidence for at least three large-scale human migrations into the region. The first inhabitants, a previously unknown population of “Ancient North Siberians” (ANS), represented by Yana RHS, diverged ~38 kya from Western Eurasians, soon after the latter split from East Asians. Between 20 and 11 kya, the ANS population was largely replaced by peoples with ancestry related to present-day East Asians, giving rise to ancestral Native Americans and “Ancient Paleosiberians” (AP), represented by a 9.8 kya skeleton from Kolyma River. AP are closely related to the Siberian ancestors of Native Americans, and ancestral to contemporary communities such as Koryaks and Itelmen. Paleoclimatic modelling shows evidence for a refuge during the last glacial maximum (LGM) in southeastern Beringia, suggesting Beringia as a possible location for the admixture forming both ancestral Native Americans and AP. Between 11 and 4 kya, AP were in turn largely replaced by another group of peoples with ancestry from East Asia, the “Neosiberians” from which many contemporary Siberians derive. We detect gene flow events in both directions across the Bering Strait during this time, influencing the genetic composition of Inuit, as well as Na Dene-speaking Northern Native Americans, whose Siberian-related ancestry component is closely related to AP.
Our analyses reveal that the population history of northeastern Siberia was highly dynamic throughout the Late Pleistocene and Holocene. The pattern observed in northeastern Siberia, with earlier, once widespread populations being replaced by distinct peoples, seems to have taken place across northern Eurasia, as far west as Scandinavia.
biorxiv genomics 100-200-users 2018
Transposon accumulation lines uncover histone H2A.Z-driven integration bias towards environmentally responsive genes, bioRxiv, 2018-10-19
Inherited transposition events are important drivers of genome evolution but because transposable element (TE) mobilization is usually rare, its impact on the creation of genetic variation remains poorly characterized. Here, we used a population of A. thaliana epigenetic recombinant inbred lines (epiRILs) to characterize >8000 de novo insertions produced by several TE families also active in nature. Integration was strongly biased towards genes, with evident deleterious effects. Biases were TE family-specific and associated with distinct chromatin features. Notably, we demonstrate that the histone variant H2A.Z guides the preferential integration of Ty1/Copia LTR-retrotransposons within environmentally responsive genes and that this guiding function is evolutionarily conserved. Finally, we uncover an important role for epigenetic silencing in exacerbating or alleviating the effects of TE insertions on target genes. These findings establish chromatin as a major determinant of the spectrum and functional impact of TE-generated mutations, with important implications for adaptation and evolution.
biorxiv genomics 0-100-users 2018