Ancient genomes from North Africa evidence prehistoric migrations to the Maghreb from both the Levant and Europe, bioRxiv, 2017-09-22
ABSTRACT The extent to which prehistoric migrations of farmers influenced the genetic pool of western North Africans remains unclear. Archaeological evidence suggests the Neolithization process may have happened through the adoption of innovations by local Epipaleolithic communities, or by demic diffusion from the Eastern Mediterranean shores or Iberia. Here, we present the first analysis of individuals’ genome sequences from early and late Neolithic sites in Morocco, as well as Early Neolithic individuals from southern Iberia. We show that Early Neolithic Moroccans are distinct from any other reported ancient individuals and possess an endemic element retained in present-day Maghrebi populations, confirming a long-term genetic continuity in the region. Among ancient populations, Early Neolithic Moroccans are distantly related to Levantine Natufian hunter-gatherers (∼9,000 BCE) and Pre-Pottery Neolithic farmers (∼6,500 BCE). Although an expansion in Early Neolithic times is also plausible, the high divergence observed in Early Neolithic Moroccans suggests a long-term isolation and an early arrival in North Africa for this population. This scenario is consistent with early Neolithic traditions in North Africa deriving from Epipaleolithic communities who adopted certain innovations from neighbouring populations. Late Neolithic (∼3,000 BCE) Moroccans, in contrast, share an Iberian component, supporting theories of trans-Gibraltar gene flow. Finally, the southern Iberian Early Neolithic samples share the same genetic composition as the Cardial Mediterranean Neolithic culture that reached Iberia ∼5,500 BCE. The cultural and genetic similarities of the Iberian Neolithic cultures with that of North African Neolithic sites further reinforce the model of an Iberian migration into the Maghreb.
SIGNIFICANCE STATEMENT The acquisition of agricultural techniques during the so-called Neolithic revolution has been one of the major steps forward in human history. Using next-generation sequencing and ancient DNA techniques, we directly test if Neolithization in North Africa occurred through the transmission of ideas or by demic diffusion. We show that Early Neolithic Moroccans are composed of an endemic Maghrebi element still retained in present-day North African populations and distantly related to Epipaleolithic communities from the Levant. However, late Neolithic individuals from North Africa are admixed, with a North African and a European component. Our results support the idea that the Neolithization of North Africa might have involved both the development of Epipaleolithic communities and the migration of people from Europe.
biorxiv genetics 100-200-users 2017
Updating the 97% identity threshold for 16S ribosomal RNA OTUs, bioRxiv, 2017-09-22
Abstract The 16S ribosomal RNA (rRNA) gene is widely used to survey microbial communities. Sequences are often clustered into Operational Taxonomic Units (OTUs) as proxies for species. The canonical clustering threshold is 97% identity, which was proposed in 1994 when few 16S rRNA sequences were available, motivating a reassessment on current data. Using a large set of high-quality 16S rRNA sequences from finished genomes, I assessed the correspondence of OTUs to species for five representative clustering algorithms using four accuracy metrics. All algorithms had comparable accuracy when tuned to a given metric. Optimal identity thresholds that best approximated species were ∼99% for full-length sequences and ∼100% for the V4 hypervariable region.
biorxiv bioinformatics 100-200-users 2017
The whole-genome panorama of cancer drivers, bioRxiv, 2017-09-21
SUMMARY The advance of personalized cancer medicine requires the accurate identification of the mutations driving each patient’s tumor. However, to date, we have only been able to obtain partial insights into the contribution of genomic events to tumor development. Here, we design a comprehensive approach to identify the driver mutations in each patient’s tumor and obtain a whole-genome panorama of driver events across more than 2,500 tumors from 37 types of cancer. This panorama includes coding and non-coding point mutations, copy number alterations and other genomic rearrangements of somatic origin, and potentially predisposing germline variants. We demonstrate that genomic events are at the root of virtually all tumors, with each carrying on average 4.6 driver events. Most individual tumors harbor a unique combination of drivers, and we uncover the most frequent co-occurring driver events. Half of all cancer genes are affected by several types of driver mutations. In summary, the panorama described here provides answers to fundamental questions in cancer genomics and bridges the gap between cancer genomics and personalized cancer medicine.
biorxiv cancer-biology 100-200-users 2017
Accurate Genomic Prediction Of Human Height, bioRxiv, 2017-09-19
Abstract We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprising almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
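The two height figures quoted in this abstract (a correlation of ∼0.65 and ∼40 percent of variance captured) can be related with a short calculation. The sketch below assumes that "variance captured" means the squared Pearson correlation between predicted and actual values, which is the usual convention for a linearly calibrated predictor. The abstract does not state how the figure was computed, and the function name is hypothetical.

```python
def variance_explained(correlation: float) -> float:
    """Return R-squared for a predictor, ASSUMING R^2 equals the squared
    Pearson correlation between predicted and observed trait values."""
    return correlation ** 2


# Correlation between predicted and actual height quoted in the abstract.
height_correlation = 0.65

# Fraction of total variance captured under the assumption above.
height_r_squared = variance_explained(height_correlation)
print(f"Height R-squared: {height_r_squared:.2f}")
```

Under this assumption a correlation of 0.65 corresponds to roughly 42 percent of variance, in line with the ∼40 percent reported for height.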
biorxiv genomics 500+-users 2017
Modified penetrance of coding variants by cis-regulatory variation shapes human traits, bioRxiv, 2017-09-19
Summary Coding variants represent many of the strongest associations between genotype and phenotype; however, they exhibit inter-individual differences in effect, known as variable penetrance. In this work, we study how cis-regulatory variation modifies the penetrance of coding variants in their target gene. Using functional genomic and genetic data from GTEx, we observed that in the general population, purifying selection has depleted haplotype combinations that lead to higher penetrance of pathogenic coding variants. Conversely, in cancer and autism patients, we observed an enrichment of haplotype combinations that lead to higher penetrance of pathogenic coding variants in disease-implicated genes, which provides direct evidence that regulatory haplotype configuration of causal coding variants affects disease risk. Finally, we experimentally demonstrated that a regulatory variant can modify the penetrance of a coding variant by introducing a Mendelian SNP using CRISPR/Cas9 on distinct expression haplotypes and using the transcriptome as a phenotypic readout. Our results demonstrate that joint effects of regulatory and coding variants are an important part of the genetic architecture of human traits, and contribute to modified penetrance of disease-causing variants.
biorxiv genetics 200-500-users 2017
Real-time DNA barcoding in a remote rainforest using nanopore sequencing, bioRxiv, 2017-09-16
Abstract Advancements in portable scientific instruments provide promising avenues to expedite field work in order to understand the diverse array of organisms that inhabit our planet. Here we tested the feasibility of in situ molecular analyses of endemic fauna using a portable laboratory fitting within a single backpack, in one of the world’s most imperiled biodiversity hotspots, the Ecuadorian Chocó rainforest. We utilized portable equipment, including the MinION DNA sequencer (Oxford Nanopore Technologies) and miniPCR (miniPCR), to perform DNA extraction, PCR amplification and real-time DNA barcode sequencing of reptile specimens in the field. We demonstrate that nanopore sequencing can be implemented in a remote tropical forest to quickly and accurately identify species using DNA barcoding, as we generated consensus sequences for species resolution with an accuracy of >99% in less than 24 hours after collecting specimens. In addition, we generated sequence information at Universidad Tecnológica Indoamérica in Quito for the recently re-discovered Jambato toad Atelopus ignescens, which was thought to be extinct for 28 years, a rare species of blind snake Trilepida guayaquilensis, and two undescribed species of Dipsas snakes. In this study we establish how mobile laboratories and nanopore sequencing can help to accelerate species identification in remote areas (especially for species that are difficult to diagnose based on characters of external morphology), be applied to local research facilities in developing countries, and rapidly generate information for species that are rare, endangered and undescribed, which can potentially aid in conservation efforts.
biorxiv evolutionary-biology 100-200-users 2017