Insights into human genetic variation and population history from 929 diverse genomes, bioRxiv, 2019-06-28
Abstract
Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented private genetic variation in southern and central Africa and in Oceania and the Americas, but an absence of fixed, private variants between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the last 10,000 years, a potentially major population growth episode after the peopling of the Americas, and a contrast between a single Neanderthal source population and multiple Denisovan source populations contributing to present-day human populations. We also demonstrate the benefits of genome sequences over ascertained array genotypes for studying population relationships. These genome sequences are freely available as a resource with no access or analysis restrictions.
biorxiv genomics 200-500-users 2019
RADICL-seq identifies general and cell type-specific principles of genome-wide RNA-chromatin interactions, bioRxiv, 2019-06-28
Abstract
Mammalian genomes encode tens of thousands of noncoding RNAs. Most noncoding transcripts are localized to the nucleus, and several have been shown to play a role in the regulation of gene expression and chromatin remodelling. To investigate the function of such RNAs, methods to map the genomic interaction sites of multiple transcripts at scale have been developed. However, these methods still have limitations. Here, we introduce RNA And DNA Interacting Complexes Ligated and sequenced (RADICL-seq), a technology that maps genome-wide RNA-chromatin interactions in intact nuclei. RADICL-seq is a proximity ligation-based method that reduces the bias towards nascent transcription while increasing genomic coverage and unique mapping rate compared with existing methods. RADICL-seq identifies distinct patterns of genome occupancy for different classes of transcripts as well as cell type-specific RNA-chromatin interactions, and highlights the role of transcription in establishing chromatin structure.
biorxiv genomics 0-100-users 2019
The mutational footprints of cancer therapies, bioRxiv, 2019-06-28
Some cancer therapies damage DNA and cause mutations in both the cancer cells and the healthy cells of the patient [1]. These therapy-induced mutations may underlie some of the long-term and late side effects of treatment, such as mental disabilities, organ toxicities and secondary neoplasms. Currently, the mutation pattern and burden caused by different cancer treatments are unknown. Here we identify the mutational signatures, or footprints, of six widely used anti-cancer therapies by studying the whole genomes of more than 3,500 metastatic tumors originating in different organs. These include previously known and new mutational signatures generated by platinum-based drugs, and a novel signature of treatment with nucleoside metabolic inhibitors. Exploiting these mutational footprints, we estimate the contribution of different treatments to the mutation burden of tumors and their risk of causing coding and likely driver mutations in the genome. In summary, the mutational footprints identified here make it possible to precisely appraise the mutational risk of different cancer therapies and thereby better understand their late side effects.
biorxiv genomics 100-200-users 2019
Dissociation of solid tumour tissues with cold active protease for single-cell RNA-seq minimizes conserved collagenase-associated stress responses, bioRxiv, 2019-06-27
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying complex biological systems, such as tumour heterogeneity and tissue microenvironments. However, the sources of technical and biological variation in scRNA-seq of primary solid tumour tissues and patient-derived mouse xenografts are not well understood. Here, we used a low-temperature (6°C) protease and collagenase (37°C) to identify the transcriptional signatures associated with tissue dissociation across a diverse scRNA-seq dataset comprising 128,481 cells from patient cancer tissues, patient-derived breast cancer xenografts and cancer cell lines.
Results: We observe substantial variation in standard quality control (QC) metrics of cell viability across conditions and tissues. In FACS-sorted populations gated for cell viability, we identify a subpopulation of dead cells that would pass standard data filtering practices, and quantify the extent to which their transcriptomes differ from those of live cells. We identify a further subpopulation of transcriptomically “dying” cells that exhibit up-regulation of MHC class I transcripts, in contrast with live and fully dead cells. By comparing tissue dissociation at 37°C and 6°C, we observe that collagenase digestion induces a stress response. We derive a core set of 512 heat shock and stress response genes, including FOS and JUN, that are induced by collagenase (37°C) and whose induction is minimized by dissociation with a cold active protease (6°C). While induction of these genes was highly conserved across all cell types, cell type-specific responses to collagenase digestion were observed in patient tissues. We also observe that the yield of cancer and non-cancer cell types varies between tissues and dissociation methods.
Conclusions: The method and conditions of tumour dissociation influence cell yield and transcriptome state, and these effects are both tissue- and cell type-dependent. Interpretation of stress pathway expression differences in cancer single-cell studies, including components of surface immune recognition such as MHC class I, may be especially confounded. We define a core set of 512 genes that can assist with identifying such effects in dissociated scRNA-seq experiments.
biorxiv genomics 200-500-users 2019
The genomic impact of European colonization of the Americas, bioRxiv, 2019-06-21
Abstract
The human genetic diversity of the Americas has been shaped by successive episodes of gene flow since the Colonial Era and the Atlantic slave trade. Moreover, multiple waves of migration followed by local admixture occurred in the last two centuries, and their impact has been largely unexplored. Here we compiled a genome-wide dataset of ∼12,000 individuals from twelve American countries and ∼6,000 individuals from worldwide populations, and applied haplotype-based methods to investigate how historical movements from outside the New World affected i) the genetic structure, ii) the admixture profile, iii) the demographic history and iv) the sex-biased gene-flow dynamics of the Americas. We reveal a high degree of complexity underlying the genetic contribution of European and African populations in North and South America, from both geographic and temporal perspectives, identifying previously unreported sources related to Italy, the Middle East and specific regions of Africa.
biorxiv genomics 100-200-users 2019
Passenger Hotspot Mutations in Cancer, bioRxiv, 2019-06-19
Abstract
Hotspots, or mutations that recur at the same genomic site across multiple tumors, have conventionally been interpreted as strong, universal evidence of somatic positive selection that unequivocally pinpoints genes driving tumorigenesis. Here, we demonstrate that this convention rests on an inaccurate statistical model of background mutagenesis. Many hotspots are in fact passenger events, recurring at sites that are simply inherently more mutable rather than under positive selection, a property that current background models do not account for. We therefore detail a log-normal-Poisson (LNP) background model that accounts for variation in site-specific mutability in a manner consistent with models of mutagenesis. We use this model to show that the tendency to generate passenger hotspots pervades all common mutational processes, and apply it to a ~10,000-patient cohort from The Cancer Genome Atlas to nominate driver hotspots with far fewer false positives than conventional methods. As the biomedical community faces critical decisions in prioritizing putative driver mutations for deep experimental characterization to assess therapeutic potential, we offer our findings as a guide to avoid wasting valuable scientific resources on passenger hotspots.
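The effect of site-specific mutability on passenger hotspots can be illustrated with a small sketch. This is not the authors' LNP implementation: it assumes a generic hierarchy in which each site's mutation count across a cohort is Poisson with rate λ, and log λ is normally distributed with spread σ (σ = 0 recovers a conventional uniform-rate Poisson background). The helper names (`poisson_tail`, `lnp_rates`, `expected_hotspots`), the mean rate of 0.01 per site, σ = 1.5 and the three-tumour "hotspot" threshold are all hypothetical, chosen only for illustration. The code compares the expected number of sites reaching the threshold with no selection at all under each background.

```python
import math
import random


def poisson_tail(lam: float, k: int) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), for small-to-moderate lam > 0.

    Sums the upper tail directly instead of computing 1 - CDF,
    which would lose precision when lam is small.
    """
    # P(X = k) computed in log space: exp(-lam) * lam**k / k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    j = k
    while True:
        total += term
        j += 1
        term *= lam / j  # P(X = j) from P(X = j - 1)
        if term <= 1e-17 * total:  # remaining terms are negligible
            break
    return total


def lnp_rates(n_sites: int, mean_rate: float, sigma: float, seed: int = 0) -> list[float]:
    """Draw per-site rates with log(rate) ~ Normal(mu, sigma), scaled so E[rate] = mean_rate."""
    rng = random.Random(seed)
    mu = math.log(mean_rate) - sigma ** 2 / 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_sites)]


def expected_hotspots(rates: list[float], k: int) -> float:
    """Expected number of sites mutated in >= k tumours, assuming no selection."""
    return sum(poisson_tail(lam, k) for lam in rates)


# Hypothetical, illustrative parameters (not taken from the study).
N_SITES = 100_000
MEAN_RATE = 0.01  # mean expected mutations per site across the cohort
K = 3             # recurrence threshold for calling a "hotspot"

uniform = N_SITES * poisson_tail(MEAN_RATE, K)
lnp = expected_hotspots(lnp_rates(N_SITES, MEAN_RATE, sigma=1.5), K)

print(f"uniform Poisson background:    {uniform:.3f} expected hotspots")
print(f"log-normal-Poisson background: {lnp:.3f} expected hotspots")
```

Under these illustrative settings, the log-normal-Poisson background predicts far more recurrent sites from mutability variation alone than a uniform Poisson background with the same mean rate, which is the qualitative point the abstract makes about passenger hotspots. Summing the tail from k upward, rather than taking 1 minus the CDF, keeps the tiny tail probabilities of low-rate sites accurate.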
biorxiv genomics 0-100-users 2019