A single cell framework for multi-omic analysis of disease identifies malignant regulatory signatures in mixed phenotype acute leukemia, bioRxiv, 2019-07-10
Abstract: In order to identify the molecular determinants of human diseases, such as cancer, that arise from a diverse range of tissue, it is necessary to accurately distinguish normal and pathogenic cellular programs [1–3]. Here we present a novel approach for single-cell multi-omic deconvolution of healthy and pathological molecular signatures within phenotypically heterogeneous malignant cells. By first creating immunophenotypic, transcriptomic and epigenetic single-cell maps of hematopoietic development from healthy peripheral blood and bone marrow mononuclear cells, we identify cancer-specific transcriptional and chromatin signatures from single cells in a cohort of mixed phenotype acute leukemia (MPAL) clinical samples. MPALs are a high-risk subtype of acute leukemia characterized by a heterogeneous malignant cell population expressing both myeloid and lymphoid lineage-specific markers [4, 5]. Our results reveal widespread heterogeneity in the pathogenetic gene regulatory and expression programs across patients, yet relatively consistent changes within patients even across malignant cells occupying diverse portions of the hematopoietic lineage. An integrative analysis of transcriptomic and epigenetic maps identifies 91,601 putative gene-regulatory interactions and classifies a number of transcription factors that regulate leukemia-specific genes, including RUNX1-linked regulatory elements proximal to CD69. This work provides a template for integrative, multi-omic analysis for the interpretation of pathogenic molecular signatures in the context of developmental origin.
biorxiv genomics 100-200-users 2019
Assessing the analytical validity of SNP-chips for detecting very rare pathogenic variants: implications for direct-to-consumer genetic testing, bioRxiv, 2019-07-10
Abstract
Objectives: To determine the analytical validity of SNP-chips for genotyping very rare genetic variants.
Design: Retrospective study using data from two publicly available resources, the UK Biobank and the Personal Genome Project.
Setting: Research biobanks and direct-to-consumer genetic testing in the UK and USA.
Participants: 49,908 individuals recruited to UK Biobank, and 21 individuals who purchased consumer genetic tests and shared their data online via the Personal Genome Project.
Main outcome measures: We assessed the analytical validity of genotypes from SNP-chips (index test) with sequencing data (reference standard). We evaluated the genotyping accuracy of the SNP-chips and split the results by variant frequency. We went on to select rare pathogenic variants in the BRCA1 and BRCA2 genes as an exemplar for detailed analysis of clinically-actionable variants in UK Biobank, and assessed BRCA-related cancers (breast, ovarian, prostate and pancreatic) in participants using cancer registry data.
Results: SNP-chip genotype accuracy is high overall; sensitivity, specificity and precision are all >99% for 108,574 common variants directly genotyped by the UK Biobank SNP-chips. However, the likelihood of a true positive result reduces dramatically with decreasing variant frequency; for variants with a frequency <0.001% in UK Biobank the precision is very low and only 16% of 4,711 variants from the SNP-chips are confirmed by sequencing data. Results are similar for SNP-chip data from the Personal Genome Project, and 20/21 individuals have at least one rare pathogenic variant that has been incorrectly genotyped. For pathogenic variants in the BRCA1 and BRCA2 genes, the overall performance metrics of the SNP-chips in UK Biobank are sensitivity 34.6%, specificity 98.3% and precision 4.2%. Rates of BRCA-related cancers in individuals in UK Biobank with a positive SNP-chip result are similar to age-matched controls (OR 1.28, P=0.07, 95% CI 0.98 to 1.67), while sequence-positive individuals have a significantly increased risk (OR 3.73, P=3.5×10^-12, 95% CI 2.57 to 5.40).
Conclusion: SNP-chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation.
Summary box
Section 1, What is already known on this topic: SNP-chips are an accurate and affordable method for genotyping common genetic variants across the genome. They are often used by direct-to-consumer (DTC) genetic testing companies and research studies, but there are several case reports suggesting they perform poorly for genotyping rare genetic variants when compared with sequencing.
Section 2, What this study adds: Our study confirms that SNP-chips are highly inaccurate for genotyping rare, clinically-actionable variants. Using large-scale SNP-chip and sequencing data from UK Biobank, we show that SNP-chips have a very low precision of <16% for detecting very rare variants (i.e. the majority of variants with population frequency of <0.001% are false positives). We observed a similar performance in a small sample of raw SNP-chip data from DTC genetic tests. Very rare variants assayed using SNP-chips should not be used to guide health decisions without validation.
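The combination of high specificity and very low precision reported in this abstract follows from how the metrics are defined when true carriers are very rare. The following is a minimal sketch in Python, not the study's analysis code: the counts are purely hypothetical, and the names `Confusion` and `chip_metrics` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of SNP-chip calls compared against sequencing (reference standard)."""
    tp: int  # chip positive, sequencing positive
    fp: int  # chip positive, sequencing negative
    fn: int  # chip negative, sequencing positive
    tn: int  # chip negative, sequencing negative


def chip_metrics(c: Confusion) -> dict:
    """Return sensitivity, specificity and precision as fractions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "precision": c.tp / (c.tp + c.fp),
    }


# Hypothetical counts for illustration only (not data from the study):
# 100,000 genotype calls, 20 true carriers, and a chip that gives a
# false positive call for roughly 1% of non-carriers.
rare = Confusion(tp=10, fp=1000, fn=10, tn=98980)
m = chip_metrics(rare)
print({k: round(v, 4) for k, v in m.items()})
```

With these assumed counts, specificity is about 99% while precision is about 1%, because the false positives drawn from the very large pool of non-carriers far outnumber the few true positives.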
biorxiv genetics 200-500-users 2019
Structures of virus-like capsids formed by the Drosophila neuronal Arc proteins, bioRxiv, 2019-07-10
Abstract: The neuronal protein Arc is a critical mediator of synaptic plasticity. Arc originated in tetrapods and flies through domestication of retrotransposon Gag genes. Recent studies have suggested that Arc mediates intercellular mRNA transfer and, like Gag, can form capsid-like structures. Here we report that Drosophila proteins dArc1 and dArc2 assemble virus-like capsids. We determine the capsid structures to 2.8 Å and 3.7 Å resolution, respectively, finding similarity to capsids of retroviruses and retrotransposons. Differences between dArc1 and dArc2 capsids, including the presence of a structured zinc-finger pair in dArc1, are consistent with differential RNA-binding specificity. Our data support a model in which ancestral capsid-forming and RNA-binding properties of Arc remain under positive selection pressure and have been repurposed to function in neuronal signalling.
biorxiv neuroscience 100-200-users 2019
CD44 regulates epigenetic plasticity by mediating iron endocytosis, bioRxiv, 2019-07-09
Summary: CD44 is a transmembrane glycoprotein that is linked to various biological processes reliant on the epigenetic plasticity of cells, including development, inflammation, immune responses, wound healing and cancer progression. While thoroughly studied, functional regulatory roles of this so-called ‘cell surface marker’ remain elusive. Here, we report the discovery that CD44 mediates endocytosis of iron interacting with hyaluronates in tumorigenic cell lines and primary cancer cells. We found that this glycan-mediated iron endocytosis mechanism is enhanced during epithelial-mesenchymal transition, unlike the canonical transferrin-dependent pathway. This transition is further characterized by molecular changes required for iron-catalyzed oxidative demethylation of the repressive histone mark H3K9me2 that governs the expression of mesenchymal genes. CD44 itself is transcriptionally regulated by nuclear iron, demonstrating a positive feedback loop, which is in contrast to the negative regulation of transferrin receptor by excess iron. Finally, we show that epigenetic plasticity can be altered by interfering with iron homeostasis using small molecules. This comprehensive study reveals an alternative iron uptake mechanism that prevails in the mesenchymal state of mammalian cells, illuminating a central role of iron as a rate-limiting regulator of epigenetic plasticity.
biorxiv cell-biology 0-100-users 2019
Mapping Vector Field of Single Cells, bioRxiv, 2019-07-09
Abstract: Understanding how gene expression in single cells progresses over time is vital for revealing the mechanisms governing cell fate transitions. RNA velocity, which infers immediate changes in gene expression by comparing levels of new (unspliced) versus mature (spliced) transcripts (La Manno et al. 2018), represents an important advance to these efforts. A key question remaining is whether it is possible to predict the most probable cell state backward or forward over arbitrary time-scales. To this end, we introduce an inclusive model (termed Dynamo) capable of predicting cell states over extended time periods, that incorporates promoter state switching, transcription, splicing, translation and RNA/protein degradation by taking advantage of scRNA-seq and the co-assay of transcriptome and proteome. We also implement scSLAM-seq by extending SLAM-seq to plate-based scRNA-seq (Hendriks et al. 2018; Erhard et al. 2019; Cao, Zhou, et al. 2019) and augment the model by explicitly incorporating the metabolic labelling of nascent RNA. We show that through careful design of labelling experiments and an efficient mathematical framework, the entire kinetic behavior of a cell from this model can be robustly and accurately inferred. Aided by the improved framework, we show that it is possible to reconstruct the transcriptomic vector field from sparse and noisy vector samples generated by single cell experiments. The reconstructed vector field further enables global mapping of potential landscapes that reflect the relative stability of a given cell state, and the minimal transition time and most probable paths between any cell states in the state space. This work thus foreshadows the possibility of predicting long-term trajectories of cells during a dynamic process instead of short-time velocity estimates.
Our methods are implemented as an open source tool, dynamo (https://github.com/aristoteleo/dynamo-release).
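The RNA velocity idea this abstract builds on (La Manno et al. 2018) can be illustrated with the standard two-state kinetic model, in which unspliced RNA u is spliced at rate β and spliced RNA s is degraded at rate γ, so that ds/dt = βu − γs. The sketch below is not the Dynamo implementation. It assumes β = 1, estimates γ with a simple least-squares fit through the origin on hypothetical steady-state cells, and uses illustrative function names (`fit_gamma`, `velocity`) and toy counts.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin: u ≈ gamma * s at steady state (beta = 1)."""
    num = sum(u * s for u, s in zip(unspliced, spliced))
    den = sum(s * s for s in spliced)
    return num / den


def velocity(unspliced, spliced, gamma, beta=1.0):
    """Per-cell velocity ds/dt = beta * u - gamma * s."""
    return [beta * u - gamma * s for u, s in zip(unspliced, spliced)]


# Hypothetical counts for one gene in cells assumed to be at steady state.
u_ss = [1.0, 2.0, 4.0]
s_ss = [2.0, 4.0, 8.0]
gamma = fit_gamma(u_ss, s_ss)

# Hypothetical new cells: an excess of unspliced RNA relative to the
# steady-state line gives a positive velocity (expression rising),
# a deficit gives a negative velocity (expression falling).
v = velocity([3.0, 1.0], [4.0, 4.0], gamma)
print(gamma, v)
```

A velocity estimated this way is only a short-time estimate for each cell, which is the limitation the abstract aims to move beyond with a reconstructed vector field.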
biorxiv systems-biology 100-200-users 2019
Recent evolutionary history of tigers highlights contrasting roles of genetic drift and selection, bioRxiv, 2019-07-09
Abstract: Tigers are among the most charismatic of endangered species, yet little is known about their evolutionary history. We sequenced 65 individual genomes representing extant tiger geographic range. We found strong genetic differentiation between putative tiger subspecies, divergence within the last 10,000 years, and demographic histories dominated by population bottlenecks. Indian tigers have substantial genetic variation and substructure stemming from population isolation and intense recent bottlenecks here. Despite high genetic diversity across India, individual tigers host longer runs of homozygosity, potentially suggesting recent inbreeding here. Amur tiger genomes revealed the strongest signals of selection and over-representation of gene ontology categories potentially involved in metabolic adaptation to cold. Novel insights highlight the antiquity of northeast Indian tigers. Our results demonstrate recent evolution, with differential isolation, selection and drift in extant tiger populations, providing insights for conservation and future survival.
biorxiv genomics 0-100-users 2019