Re-evaluation of SNP heritability in complex human traits, bioRxiv, 2016-09-10
SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but the assumptions in current use have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency, linkage disequilibrium and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (SD 3) higher than those obtained from the widely-used software GCTA, and 25% (SD 2) higher than those from the recently-proposed extension GCTA-LDMS. Previously, DNaseI hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model their estimated contribution is only 24%.
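As an illustration of how a heritability model can vary with minor allele frequency, linkage disequilibrium and genotype certainty, here is a minimal Python sketch. It assumes a multiplicative form in which the expected heritability of SNP j is proportional to w_j × [f_j(1 − f_j)]^(1+α) × r_j. In this form, f_j is the minor allele frequency, w_j is an LD weight, r_j is an imputation-certainty score and α is a scaling exponent. The form, the function name expected_heritability_shares and the example values are assumptions, not necessarily the authors' fitted model. Setting α = −1 with all w_j = r_j = 1 gives every SNP the same expected heritability, which is the assumption usually attributed to GCTA with standardized genotypes.

```python
from typing import Sequence


def expected_heritability_shares(
    maf: Sequence[float],
    ld_weight: Sequence[float],
    info: Sequence[float],
    alpha: float,
) -> list[float]:
    """Return each SNP's share of total expected SNP heritability.

    Assumed (illustrative) model: E[h2_j] is proportional to
    ld_weight_j * (f_j * (1 - f_j)) ** (1 + alpha) * info_j.
    """
    if not (len(maf) == len(ld_weight) == len(info)):
        raise ValueError("inputs must have equal length")
    raw = []
    for f, w, r in zip(maf, ld_weight, info):
        if not 0.0 < f <= 0.5:
            raise ValueError("minor allele frequency must lie in (0, 0.5]")
        raw.append(w * (f * (1.0 - f)) ** (1.0 + alpha) * r)
    total = sum(raw)
    # Normalise so the shares sum to 1, i.e. fractions of total SNP heritability.
    return [x / total for x in raw]


maf = [0.01, 0.05, 0.2, 0.5]
# GCTA-like assumption: alpha = -1, no LD or certainty weighting.
print(expected_heritability_shares(maf, [1, 1, 1, 1], [1, 1, 1, 1], alpha=-1.0))
# A less negative alpha shifts expected heritability towards common SNPs.
print(expected_heritability_shares(maf, [1, 1, 1, 1], [1, 1, 1, 1], alpha=-0.25))
```

The first call spreads heritability evenly across the four SNPs. The second gives larger shares to SNPs with higher minor allele frequency. This shows how the choice of α alone can change how much heritability is attributed to common variants.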
biorxiv genetics 100-200-users 2016
Improving genetic diagnosis in Mendelian disease with transcriptome sequencing, bioRxiv, 2016-09-09
Abstract
Exome and whole-genome sequencing are becoming increasingly routine approaches in Mendelian disease diagnosis. Despite their success, the current diagnostic rate for genomic analyses across a variety of rare diseases is approximately 25-50%. Here, we explore the utility of transcriptome sequencing (RNA-seq) as a complementary diagnostic tool in a cohort of 50 patients with genetically undiagnosed rare muscle disorders. We describe an integrated approach to analyzing patient muscle RNA-seq data, leveraging an analysis framework focused on detecting transcript-level changes that are unique to the patient compared with over 180 control skeletal muscle samples. We demonstrate the power of RNA-seq to validate candidate splice-disrupting mutations and to identify splice-altering variants in both exonic and deep intronic regions, yielding an overall diagnosis rate of 35%. We also report the discovery of a highly recurrent de novo intronic mutation in COL6A1 that results in a dominantly acting splice-gain event, disrupting the critical glycine repeat motif of the triple helical domain. We identify this pathogenic variant in a total of 27 genetically unsolved patients in an external collagen VI-like dystrophy cohort, thus explaining approximately 25% of patients clinically suggestive of collagen VI dystrophy in whom prior genetic analysis was negative. Overall, this study represents a large systematic application of transcriptome sequencing to rare disease diagnosis and highlights its utility for the detection and interpretation of variants missed by current standard diagnostic approaches.
One Sentence Summary: Transcriptome sequencing improves the diagnostic rate for Mendelian disease in patients for whom genetic analysis has not returned a diagnosis.
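The idea of flagging transcript-level events that are unique to the patient can be sketched as a filter on splice-junction read counts. This minimal Python sketch illustrates the idea under stated assumptions and is not the authors' pipeline. The junction representation, the function name patient_specific_junctions, the thresholds min_reads and max_control_samples, and the toy coordinates are all hypothetical. A real analysis would also normalize junction counts against local coverage.

```python
from typing import Dict, List, Tuple

# Hypothetical junction key: (chromosome, intron start, intron end).
Junction = Tuple[str, int, int]


def patient_specific_junctions(
    patient: Dict[Junction, int],
    controls: List[Dict[Junction, int]],
    min_reads: int = 5,
    max_control_samples: int = 0,
) -> List[Junction]:
    """Return junctions well supported in the patient but rare or absent in controls."""
    flagged = []
    for junction, reads in patient.items():
        if reads < min_reads:
            continue  # too few supporting reads in the patient to trust the event
        # Number of control samples in which this junction appears at all.
        seen_in = sum(1 for sample in controls if sample.get(junction, 0) > 0)
        if seen_in <= max_control_samples:
            flagged.append(junction)
    return sorted(flagged)


# Toy counts (hypothetical coordinates): a canonical junction shared with the
# controls, a novel well-supported junction, and a novel weakly supported one.
patient = {("chr1", 100, 200): 40, ("chr1", 150, 200): 12, ("chr1", 300, 400): 2}
controls = [{("chr1", 100, 200): 35}, {("chr1", 100, 200): 50}]
print(patient_specific_junctions(patient, controls))
```

Only the novel, well-supported junction survives. Raising max_control_samples relaxes the "unique to the patient" criterion, which trades specificity for sensitivity when the control panel may itself contain rare events.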
biorxiv genomics 100-200-users 2016
Power Analysis of Single Cell RNA-Sequencing Experiments, bioRxiv, 2016-09-09
Abstract
High-throughput single cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, and has revealed new cell types and new insights into developmental processes and stochasticity in gene expression. There are now several published scRNA-seq protocols, all of which sequence transcriptomes from a minute amount of starting material. Therefore, a key question is how these methods compare in terms of sensitivity of detection of mRNA molecules and accuracy of quantification of gene expression. Here, we assessed the sensitivity and accuracy of many published data sets based on standardized spike-ins with a uniform raw data processing pipeline. We developed a flexible and fast UMI counting tool (https://github.com/vals/umis) which is compatible with all UMI-based protocols. This allowed us to relate these parameters to sequencing depth and discuss the trade-offs between the different methods. To confirm our results, we performed experiments on cells from the same population using three different protocols. We also investigated the effect of RNA degradation on spike-in molecules, and the average efficiency of scRNA-seq on spike-in molecules versus endogenous RNAs.
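At its core, UMI counting collapses reads that share a cell barcode, gene assignment and UMI into a single molecule. The minimal Python sketch below is not the authors' tool. It assumes reads have already been tagged with those three fields. The tuple layout, the function name count_umis and the barcode and UMI strings are hypothetical. Unlike some dedicated tools, it does not merge UMIs that differ only by sequencing errors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical read representation: (cell barcode, gene, UMI).
Read = Tuple[str, str, str]


def count_umis(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Count distinct molecules per (cell, gene) by collapsing duplicate UMIs."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, gene, umi in reads:
        # PCR duplicates of one molecule share a UMI, so the set keeps one copy.
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("AAC", "ERCC-00002", "GATT"),
    ("AAC", "ERCC-00002", "GATT"),  # PCR duplicate of the read above
    ("AAC", "ERCC-00002", "CTGA"),
    ("AAC", "GeneX", "GATT"),       # same UMI, different gene: a separate molecule
    ("TTG", "ERCC-00002", "GATT"),  # same UMI, different cell: a separate molecule
]
print(count_umis(reads))
```

Counting molecules instead of reads removes amplification bias. This is why spike-in UMI counts can be compared directly with the known number of input molecules when estimating sensitivity.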
biorxiv genomics 100-200-users 2016
In-field metagenome and 16S rRNA gene amplicon nanopore sequencing robustly characterize glacier microbiota, bioRxiv, 2016-09-08
Abstract
In the field of observation, chance favours only the prepared mind (Pasteur). Impressive developments in genomics have led microbiology to its third “Golden Age”. However, conventional metagenomics strategies necessitate retrograde transfer of samples from extreme or remote environments for later analysis, rendering the powerful insights gained retrospective in nature, in stark contrast with Pasteur’s dictum. Here we implement highly portable USB-based nanopore DNA sequencing platforms coupled with field-adapted environmental DNA extraction, rapid sequence library generation and off-line analyses of shotgun metagenome and 16S ribosomal RNA gene amplicon profiles to characterize microbiota dwelling within cryoconite holes upon Svalbard glaciers, the Greenland Ice Sheet and the Austrian Alps. We show that in-field nanopore sequencing of metagenomes captures the taxonomic composition of supraglacial microbiota, while 16S rRNA gene amplicon sequencing resolves bacterial community responses to habitat changes. Furthermore, comparison of nanopore data with prior 16S rRNA gene V1-V3 pyrosequencing from the same samples demonstrates strong correlations between profiles obtained from nanopore sequencing and laboratory-based sequencing approaches. Finally, we demonstrate the fidelity and sensitivity of in-field sequencing by analysis of mock communities using field protocols. Ultimately, in-field sequencing potentiated by nanopore devices raises the prospect of enhanced agility in exploring Earth’s most remote microbiomes.
biorxiv microbiology 100-200-users 2016
Using high-resolution variant frequencies to empower clinical genome interpretation, bioRxiv, 2016-09-03
Abstract
Whole exome and genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognised as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants. Here we present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets. Using the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, and identifies 43 variants previously reported as pathogenic that can now be reclassified. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.
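A frequency-based filter of this kind can be illustrated with a first-principles sketch for a dominant disorder. Assume that affected carriers of any single variant make up at most a fraction of the population equal to the prevalence × the maximum fraction of cases attributable to the gene × the maximum fraction of those cases attributable to one variant. Dividing by penetrance gives the maximum carrier frequency. Halving that gives an allele frequency, because each heterozygous carrier has one copy among two alleles. Sampling variance in the reference panel is then handled by checking whether the observed allele count exceeds an upper Poisson quantile of the count expected at that frequency. The function names, the 95% confidence level, the Poisson approximation and the example inputs are assumptions. The authors' exact formulation may differ, and recessive inheritance is not covered.

```python
import math


def max_credible_af(prevalence, gene_contribution, variant_contribution, penetrance):
    """Assumed upper bound on a causal variant's allele frequency (dominant model)."""
    # Fraction of the population who are affected carriers of this one variant.
    affected_carriers = prevalence * gene_contribution * variant_contribution
    # Divide by penetrance to include unaffected carriers, then by 2 because a
    # heterozygous carrier holds one copy of the variant among two alleles.
    return affected_carriers / penetrance / 2.0


def max_tolerated_allele_count(af_max, allele_number, confidence=0.95):
    """Smallest k with P(X <= k) >= confidence for X ~ Poisson(allele_number * af_max)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    lam = allele_number * af_max
    if lam > 700:
        raise ValueError("expected count too large for this simple Poisson loop")
    k = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    while cdf < confidence:
        k += 1
        term *= lam / k  # P(X = k) computed from P(X = k - 1)
        cdf += term
    return k


def compatible_with_pathogenicity(observed_ac, af_max, allele_number, confidence=0.95):
    """True unless the reference allele count is too high for a causal variant."""
    return observed_ac <= max_tolerated_allele_count(af_max, allele_number, confidence)


# Illustrative inputs only: prevalence 1/500, up to 100% of cases attributed to
# the gene, up to 2% of those to one variant, 50% penetrance, 120,000 alleles.
af_max = max_credible_af(1 / 500, 1.0, 0.02, 0.5)
print(af_max, max_tolerated_allele_count(af_max, 120_000))
```

Filtering on a tolerated allele count instead of the raw observed frequency avoids discarding a variant just because a few chance observations in a finite reference sample push its point estimate above the cutoff.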
biorxiv genomics 100-200-users 2016
Genes mirror migrations and cultures in prehistoric Europe – a population genomic perspective, bioRxiv, 2016-09-02
Abstract
Genomic information from ancient human remains is beginning to show its full potential for learning about human prehistory. We review the last few years' dramatic finds about European prehistory based on genomic data from humans who lived many millennia ago and relate them to modern-day patterns of genomic variation. The earliest period, the Upper Palaeolithic, appears to contain several population turnovers, followed by more stable populations after the Last Glacial Maximum and during the Mesolithic. Some 11,000 years ago, the migrations driving the Neolithic transition start from the region around Anatolia and reach northern and western Europe millennia later, followed by major migrations during the Bronze Age. These findings show that in prehistoric Europe, culture and lifestyle, rather than geography as is the case today, were major determinants of genomic differentiation and similarity.
biorxiv evolutionary-biology 0-100-users 2016