Best Practices for Benchmarking Germline Small Variant Calls in Human Genomes, bioRxiv, 2018-02-24

Abstract: Assessing accuracy of NGS variant calling is immensely facilitated by a robust benchmarking strategy and tools to carry it out in a standard way. Benchmarking variant calls requires careful attention to definitions of performance metrics, sophisticated comparison approaches, and stratification by variant type and genome context. The Global Alliance for Genomics and Health (GA4GH) Benchmarking Team has developed standardized performance metrics and tools for benchmarking germline small variant calls. This team includes representatives from sequencing technology developers, government agencies, academic bioinformatics researchers, clinical laboratories, and commercial technology and bioinformatics developers for whom benchmarking variant calls is essential to their work. Benchmarking variant calls is a challenging problem for many reasons:
- Evaluating variant calls requires complex matching algorithms and standardized counting because the same variant may be represented differently in truth and query callsets.
- Defining and interpreting resulting metrics such as precision (aka positive predictive value = TP/(TP+FP)) and recall (aka sensitivity = TP/(TP+FN)) requires standardization to draw robust conclusions about comparative performance for different variant calling methods.
- Performance of NGS methods can vary depending on variant types and genome context; as a result, understanding performance requires meaningful stratification.
- High-confidence variant calls and regions that can be used as “truth” to accurately identify false positives and negatives are difficult to define, and reliable calls for the most challenging regions and variants remain out of reach.
We have made significant progress on standardizing comparison methods, metric definitions and reporting, as well as developing and using truth sets.
Our methods are publicly available on GitHub (https://github.com/ga4gh/benchmarking-tools) and in a web-based app on precisionFDA, which allow users to compare their variant calls against truth sets and to obtain a standardized report on their variant calling performance. Our methods have been piloted in the precisionFDA variant calling challenges to identify the best-in-class variant calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and critically evaluating the results.
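The precision and recall definitions quoted in the abstract can be illustrated with a minimal Python sketch. It assumes a single TP count per stratum and uses made-up counts for two hypothetical strata ("SNV" and "INDEL"); the class and function names are illustrative only and are not taken from the GA4GH benchmarking tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Per-stratum comparison counts (hypothetical container, not a GA4GH type)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(c: Counts) -> float:
    """Precision (positive predictive value) = TP / (TP + FP)."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else float("nan")


def recall(c: Counts) -> float:
    """Recall (sensitivity) = TP / (TP + FN)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else float("nan")


# Made-up counts, stratified by variant type purely for illustration.
strata = {
    "SNV": Counts(tp=990, fp=10, fn=10),
    "INDEL": Counts(tp=90, fp=10, fn=30),
}

for name, counts in strata.items():
    print(f"{name}: precision={precision(counts):.3f} recall={recall(counts):.3f}")
```

Reporting the metrics per stratum rather than as one pooled number reflects the abstract's point that performance can differ by variant type and genome context. An empty stratum yields NaN here instead of raising a division error, which is one possible design choice among several.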

biorxiv genomics 100-200-users 2018

Genetic meta-analysis identifies 9 novel loci and functional pathways for Alzheimer’s disease risk, bioRxiv, 2018-02-21

Abstract: Late-onset Alzheimer’s disease (AD) is the most common form of dementia, with more than 35 million people affected worldwide and no curative treatment available. AD is highly heritable, and recent genome-wide meta-analyses have identified over 20 genomic loci associated with AD; these explain only a small proportion of the genetic variance, however, indicating that undiscovered loci exist. Here, we performed the largest genome-wide association study of clinically diagnosed AD and AD-by-proxy (71,880 AD cases, 383,378 controls). AD-by-proxy status is based on parental AD diagnosis and showed strong genetic correlation with AD (rg=0.81). Genetic meta-analysis identified 29 risk loci, of which 9 are novel, implicating 215 potential causative genes. Independent replication further supports these novel loci in AD. Associated genes are strongly expressed in immune-related tissues and cell types (spleen, liver and microglia). Furthermore, gene-set analyses indicate the genetic contribution of biological mechanisms involved in lipid-related processes and degradation of amyloid precursor proteins. We show strong genetic correlations with multiple health-related outcomes, and Mendelian randomisation results suggest a protective effect of cognitive ability on AD risk. These results are a step forward in identifying more of the genetic factors that contribute to AD risk and add novel insights into the neurobiology of AD to guide new drug development.

biorxiv genetics 100-200-users 2018

Population Replacement in Early Neolithic Britain, bioRxiv, 2018-02-19

The roles of migration, admixture and acculturation in the European transition to farming have been debated for over 100 years. Genome-wide ancient DNA studies indicate predominantly Anatolian ancestry for continental Neolithic farmers, but also variable admixture with local Mesolithic hunter-gatherers [1–9]. Neolithic cultures first appear in Britain c. 6000 years ago (kBP), a millennium after they appear in adjacent areas of northwestern continental Europe. However, the pattern and process of the British Neolithic transition remain unclear [10–15]. We assembled genome-wide data from six Mesolithic and 67 Neolithic individuals found in Britain, dating from 10.5–4.5 kBP, a dataset that includes 22 newly reported individuals and the first genomic data from British Mesolithic hunter-gatherers. Our analyses reveal persistent genetic affinities between Mesolithic British and Western European hunter-gatherers over a period spanning Britain’s separation from continental Europe. We find overwhelming support for agriculture being introduced by incoming continental farmers, with small and geographically structured levels of additional hunter-gatherer introgression. We find genetic affinity between British and Iberian Neolithic populations, indicating that British Neolithic people derived much of their ancestry from Anatolian farmers who originally followed the Mediterranean route of dispersal and likely entered Britain from northwestern mainland Europe.

biorxiv evolutionary-biology 200-500-users 2018

 

Created with the audiences framework by Jedidiah Carlson
