The whole-genome panorama of cancer drivers, bioRxiv, 2017-09-21
SUMMARY: The advance of personalized cancer medicine requires the accurate identification of the mutations driving each patient’s tumor. However, to date, we have only been able to obtain partial insights into the contribution of genomic events to tumor development. Here, we design a comprehensive approach to identify the driver mutations in each patient’s tumor and obtain a whole-genome panorama of driver events across more than 2,500 tumors from 37 types of cancer. This panorama includes coding and non-coding point mutations, copy number alterations and other genomic rearrangements of somatic origin, and potentially predisposing germline variants. We demonstrate that genomic events are at the root of virtually all tumors, with each carrying on average 4.6 driver events. Most individual tumors harbor a unique combination of drivers, and we uncover the most frequent co-occurring driver events. Half of all cancer genes are affected by several types of driver mutations. In summary, the panorama described here provides answers to fundamental questions in cancer genomics and bridges the gap between cancer genomics and personalized cancer medicine.
biorxiv cancer-biology 100-200-users 2017
Real-time DNA barcoding in a remote rainforest using nanopore sequencing, bioRxiv, 2017-09-16
Abstract: Advancements in portable scientific instruments provide promising avenues to expedite field work in order to understand the diverse array of organisms that inhabit our planet. Here we tested the feasibility of in situ molecular analyses of endemic fauna using a portable laboratory fitting within a single backpack, in one of the world’s most imperiled biodiversity hotspots, the Ecuadorian Chocó rainforest. We utilized portable equipment, including the MinION DNA sequencer (Oxford Nanopore Technologies) and miniPCR (miniPCR), to perform DNA extraction, PCR amplification and real-time DNA barcode sequencing of reptile specimens in the field. We demonstrate that nanopore sequencing can be implemented in a remote tropical forest to quickly and accurately identify species using DNA barcoding, as we generated consensus sequences for species resolution with an accuracy of >99% in less than 24 hours after collecting specimens. In addition, we generated sequence information at Universidad Tecnológica Indoamérica in Quito for the recently re-discovered Jambato toad Atelopus ignescens, which was thought to be extinct for 28 years, a rare species of blind snake Trilepida guayaquilensis, and two undescribed species of Dipsas snakes. In this study we establish how mobile laboratories and nanopore sequencing can help to accelerate species identification in remote areas (especially for species that are difficult to diagnose based on characters of external morphology), be applied to local research facilities in developing countries, and rapidly generate information for species that are rare, endangered and undescribed, which can potentially aid in conservation efforts.
biorxiv evolutionary-biology 100-200-users 2017
Minor allele frequency thresholds dramatically affect population structure inference with genomic datasets, bioRxiv, 2017-09-15
Abstract: One common method of minimizing errors in large DNA sequence datasets is to drop variable sites with a minor allele frequency below some specified threshold. Though widespread, this procedure has the potential to alter downstream population genetic inferences and has received relatively little rigorous analysis. Here we use simulations and an empirical SNP dataset to demonstrate the impacts of minor allele frequency (MAF) thresholds on inference of population structure. We find that model-based inference of population structure is confounded when singletons are included in the alignment, and that both model-based and multivariate analyses infer less distinct clusters when more stringent MAF cutoffs are applied. We propose that this behavior is caused by the combination of a drop in the total size of the data matrix and by correlations between allele frequencies and mutational age. We recommend a set of best practices for applying MAF filters in studies seeking to describe population structure with genomic data.
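For readers unfamiliar with the filtering step this abstract refers to, the following is a minimal sketch of a MAF filter, not the authors' pipeline. It assumes a hypothetical genotype matrix in which each row is a variable site and each entry is the count of alternate alleles in one diploid individual, with -1 marking a missing call. The function names and the example threshold are illustrative only.

```python
from typing import List, Sequence


def minor_allele_frequency(genotypes: Sequence[int], ploidy: int = 2) -> float:
    """Return the minor allele frequency at one site.

    genotypes holds alternate-allele counts per individual (0..ploidy);
    -1 marks a missing call (an assumption of this sketch).
    """
    called = [g for g in genotypes if g >= 0]
    if not called:
        return 0.0  # no data: treat the site as uninformative
    alt_freq = sum(called) / (ploidy * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(
    sites: Sequence[Sequence[int]], threshold: float, ploidy: int = 2
) -> List[Sequence[int]]:
    """Keep only sites whose MAF is at or above the threshold."""
    return [s for s in sites if minor_allele_frequency(s, ploidy) >= threshold]


# Four sites scored in four diploid individuals (hypothetical data).
sites = [
    [0, 0, 0, 1],  # one alternate allele copy in eight: a singleton
    [0, 1, 1, 2],  # alternate allele at frequency 0.5
    [2, 2, 2, 2],  # fixed for the alternate allele, so not variable
    [0, 0, 1, 1],  # alternate allele at frequency 0.25
]
kept = filter_by_maf(sites, threshold=0.2)
print(len(kept), "of", len(sites), "sites retained")
```

Raising the threshold shrinks the retained matrix, which is one of the two effects the abstract proposes as an explanation for less distinct clusters under stringent cutoffs.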
biorxiv genomics 100-200-users 2017
A zombie LIF gene in elephants is up-regulated by TP53 to induce apoptosis in response to DNA damage, bioRxiv, 2017-09-13
Abstract: Large-bodied organisms have more cells that can potentially turn cancerous than small-bodied organisms with fewer cells, imposing an increased risk of developing cancer. This expectation predicts a positive correlation between body size and cancer risk; however, there is no correlation between body size and cancer risk across species (‘Peto’s Paradox’). Here we show that elephants and their extinct relatives (Proboscideans) may have resolved Peto’s Paradox in part through re-functionalizing a leukemia inhibitory factor pseudogene (LIF6) with pro-apoptotic functions. The LIF6 gene is transcriptionally up-regulated by TP53 in response to DNA damage, and translocates to the mitochondria where it induces apoptosis. Phylogenetic analyses of living and extinct Proboscidean LIF6 genes indicate that its TP53 response element evolved coincident with the evolution of large body sizes in the Proboscidean stem-lineage. These results suggest that re-functionalizing of a pro-apoptotic LIF pseudogene may have been permissive (though not sufficient) for the evolution of large body sizes in Proboscideans.
biorxiv evolutionary-biology 100-200-users 2017
No major flaws in “Identification of individuals by trait prediction using whole-genome sequencing data”, bioRxiv, 2017-09-12
Abstract: In a recently published PNAS article, we studied the identifiability of genomic samples using machine learning methods [Lippert et al., 2017]. In a response, Erlich [2017] argued that our work contained major flaws. The main technical critique of Erlich [2017] builds on a simulation experiment that shows that our proposed algorithm, which uses only a genomic sample for identification, performed no better than a strategy that uses demographic variables. Below, we show why this comparison is misleading and provide a detailed discussion of the key critical points in our analyses that have been brought up in Erlich [2017] and in the media. Further, not only faces but also a wide range of phenotypes and demographic variables may be derived from DNA. In this light, the main contribution of Lippert et al. [2017] is an algorithm that identifies genomes of individuals by combining multiple DNA-based predictive models for a myriad of traits.
biorxiv genomics 100-200-users 2017
Genomic basis for RNA alterations revealed by whole-genome analyses of 27 cancer types, bioRxiv, 2017-09-04
Abstract: We present the most comprehensive catalogue of cancer-associated gene alterations through characterization of tumor transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes project. Using matched whole-genome sequencing data, we attributed RNA alterations to germline and somatic DNA alterations, revealing likely genetic mechanisms. We identified 444 associations of gene expression with somatic non-coding single-nucleotide variants. We found 1,872 splicing alterations associated with somatic mutation in intronic regions, including novel exonization events associated with Alu elements. Somatic copy number alterations were the major driver of total gene and allele-specific expression (ASE) variation. Additionally, 82% of gene fusions had structural variant support, including 75 of a novel class called “bridged” fusions, in which a third genomic location bridged two different genes. Globally, we observe transcriptomic alteration signatures that differ between cancer types and have associations with DNA mutational signatures. Given this unique dataset of RNA alterations, we also identified 1,012 genes significantly altered through both DNA and RNA mechanisms. Our study represents an extensive catalog of RNA alterations and reveals new insights into the heterogeneous molecular mechanisms of cancer gene alterations.
biorxiv genomics 100-200-users 2017