Creating a universal SNP and small indel variant caller with deep neural networks, bioRxiv, 2016-12-15

AbstractNext-generation sequencing (NGS) is a rapidly evolving set of technologies that can be used to determine the sequence of an individual’s genome1 by calling genetic variants present in an individual using billions of short, errorful sequence reads2. Despite more than a decade of effort and thousands of dedicated researchers, the hand-crafted and parameterized statistical models used for variant calling still produce thousands of errors and missed variants in each genome3,4. Here we show that a deep convolutional neural network5 can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships (likelihoods) between images of read pileups around putative variant sites and ground-truth genotype calls. This approach, called DeepVariant, outperforms existing tools, even winning the “highest performance” award for SNPs in a FDA-administered variant calling challenge. The learned model generalizes across genome builds and even to other mammalian species, allowing non-human sequencing projects to benefit from the wealth of human ground truth data. We further show that, unlike existing tools which perform well on only a specific technology, DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, from deep whole genomes from 10X Genomics to Ion Ampliseq exomes. DeepVariant represents a significant step from expert-driven statistical modeling towards more automatic deep learning approaches for developing software to interpret biological instrumentation data.

biorxiv genomics 200-500-users 2016

Population history of the Sardinian people inferred from whole-genome sequencing, bioRxiv, 2016-12-09

AbstractThe population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of traits and diseases. The history of the Sardinian population has also been the focus of much research, and in recent ancient DNA (aDNA) studies, Sardinia has provided unique insight into the peopling of Europe and the spread of agriculture. In this study, we analyze whole-genome sequences of 3,514 Sardinians to address hypotheses regarding the founding of Sardinia and its relation to the peopling of Europe, including examining fine-scale substructure, population size history, and signals of admixture. We find the population of the mountainous Gennargentu region shows elevated genetic isolation with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland. Notably, the Gennargentu region also has elevated levels of pre-Neolithic hunter-gatherer ancestry and increased affinity to Basque populations. Further, allele sharing with pre-Neolithic and Neolithic mainland populations is larger on the X chromosome compared to the autosome, providing evidence for a sex-biased demographic history in Sardinia. These results give new insight to the demography of ancestral Sardinians and help further the understanding of sharing of disease risk alleles between Sardinia and mainland populations.

biorxiv genetics 0-100-users 2016

 

Created with the audiences framework by Jedidiah Carlson

Powered by Hugo