Local genetic effects on gene expression across 44 human tissues, bioRxiv, 2016-09-10
Expression quantitative trait locus (eQTL) mapping provides a powerful means to identify functional variants influencing gene expression and disease pathogenesis. We report the identification of cis-eQTLs from 7,051 post-mortem samples representing 44 tissues and 449 individuals as part of the Genotype-Tissue Expression (GTEx) project. We find a cis-eQTL for 88% of all annotated protein-coding genes, with one-third having multiple independent effects. We identify numerous tissue-specific cis-eQTLs, highlighting the unique functional impact of regulatory variation in diverse tissues. By integrating large-scale functional genomics data and state-of-the-art fine-mapping algorithms, we identify multiple features predictive of tissue-specific and shared regulatory effects. We improve estimates of cis-eQTL sharing and effect sizes using allele-specific expression across tissues. Finally, we demonstrate the utility of this large compendium of cis-eQTLs for understanding the tissue-specific etiology of complex traits, including coronary artery disease. The GTEx project provides an exceptional resource that has improved our understanding of gene regulation across tissues and the role of regulatory variation in human genetic diseases.
biorxiv genomics 0-100-users 2016
Re-evaluation of SNP heritability in complex human traits, bioRxiv, 2016-09-10
SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but the assumptions in current use have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency, linkage disequilibrium and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (SD 3) higher than those obtained from the widely-used software GCTA, and 25% (SD 2) higher than those from the recently-proposed extension GCTA-LDMS. Previously, DNaseI hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model their estimated contribution is only 24%.
biorxiv genetics 100-200-users 2016
Improving genetic diagnosis in Mendelian disease with transcriptome sequencing, bioRxiv, 2016-09-09
Exome and whole-genome sequencing are becoming increasingly routine approaches in Mendelian disease diagnosis. Despite their success, the current diagnostic rate for genomic analyses across a variety of rare diseases is approximately 25-50%. Here, we explore the utility of transcriptome sequencing (RNA-seq) as a complementary diagnostic tool in a cohort of 50 patients with genetically undiagnosed rare muscle disorders. We describe an integrated approach to analyze patient muscle RNA-seq, leveraging an analysis framework focused on the detection of transcript-level changes that are unique to the patient compared to over 180 control skeletal muscle samples. We demonstrate the power of RNA-seq to validate candidate splice-disrupting mutations and to identify splice-altering variants in both exonic and deep intronic regions, yielding an overall diagnosis rate of 35%. We also report the discovery of a highly recurrent de novo intronic mutation in COL6A1 that results in a dominantly acting splice-gain event, disrupting the critical glycine repeat motif of the triple helical domain. We identify this pathogenic variant in a total of 27 genetically unsolved patients in an external collagen VI-like dystrophy cohort, thus explaining approximately 25% of patients clinically suggestive of collagen VI dystrophy in whom prior genetic analysis is negative. Overall, this study represents a large systematic application of transcriptome sequencing to rare disease diagnosis and highlights its utility for the detection and interpretation of variants missed by current standard diagnostic approaches.
One Sentence Summary: Transcriptome sequencing improves the diagnostic rate for Mendelian disease in patients for whom genetic analysis has not returned a diagnosis.
biorxiv genomics 100-200-users 2016
Power Analysis of Single Cell RNA-Sequencing Experiments, bioRxiv, 2016-09-09
High-throughput single cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, and has revealed new cell types and new insights into developmental processes and stochasticity in gene expression. There are now several published scRNA-seq protocols, which all sequence transcriptomes from a minute amount of starting material. Therefore, a key question is how these methods compare in terms of sensitivity of detection of mRNA molecules, and accuracy of quantification of gene expression. Here, we assessed the sensitivity and accuracy of many published data sets based on standardized spike-ins with a uniform raw data processing pipeline. We developed a flexible and fast UMI counting tool (https://github.com/vals/umis) which is compatible with all UMI-based protocols. This allowed us to relate these parameters to sequencing depth, and discuss the trade-offs between the different methods. To confirm our results, we performed experiments on cells from the same population using three different protocols. We also investigated the effect of RNA degradation on spike-in molecules, and the average efficiency of scRNA-seq on spike-in molecules versus endogenous RNAs.
biorxiv genomics 100-200-users 2016
In-field metagenome and 16S rRNA gene amplicon nanopore sequencing robustly characterize glacier microbiota, bioRxiv, 2016-09-08
“In the field of observation, chance favours only the prepared mind” (Pasteur). Impressive developments in genomics have led microbiology to its third “Golden Age”. However, conventional metagenomics strategies necessitate retrograde transfer of samples from extreme or remote environments for later analysis, rendering the powerful insights gained retrospective in nature, striking a contrast with Pasteur’s dictum. Here we implement highly portable USB-based nanopore DNA sequencing platforms coupled with field-adapted environmental DNA extraction, rapid sequence library generation and off-line analyses of shotgun metagenome and 16S ribosomal RNA gene amplicon profiles to characterize microbiota dwelling within cryoconite holes upon Svalbard glaciers, the Greenland Ice Sheet and the Austrian Alps. We show in-field nanopore sequencing of metagenomes captures taxonomic composition of supraglacial microbiota, while 16S rRNA gene amplicon sequencing resolves bacterial community responses to habitat changes. Furthermore, comparison of nanopore data with prior 16S rRNA gene V1-V3 pyrosequencing from the same samples demonstrates strong correlations between profiles obtained from nanopore sequencing and laboratory-based sequencing approaches. Finally, we demonstrate the fidelity and sensitivity of in-field sequencing by analysis of mock communities using field protocols. Ultimately, in-field sequencing potentiated by nanopore devices raises the prospect of enhanced agility in exploring Earth’s most remote microbiomes.
biorxiv microbiology 100-200-users 2016
Using high-resolution variant frequencies to empower clinical genome interpretation, bioRxiv, 2016-09-03
Whole exome and genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognised as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants. Here we present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets. Using the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, and identifies 43 variants previously reported as pathogenic that can now be reclassified. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.
biorxiv genomics 100-200-users 2016
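The frequency-based filtering described in the last abstract above can be illustrated with a minimal sketch. The abstract names the inputs (prevalence, heterogeneity, penetrance, sampling variance) but gives no formula, so everything here is an assumption made for illustration: a dominant-inheritance model in which the maximum credible allele frequency is prevalence times the genetic and allelic contributions, divided by twice the penetrance, and a one-sided Poisson bound to allow for sampling variance in a reference dataset. The function names, parameter names and input values are hypothetical.

```python
import math


def max_credible_af(prevalence, max_allelic_contribution, penetrance,
                    max_genetic_contribution=1.0):
    """Upper bound on population allele frequency for a causal variant.

    ASSUMPTION: dominant inheritance, so each affected individual carries
    one causal allele out of two chromosomes (hence the factor of 2).
    This formula is an illustrative sketch, not quoted from the abstract.
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    return (prevalence * max_genetic_contribution * max_allelic_contribution
            / (2 * penetrance))


def max_tolerated_allele_count(af, allele_number, confidence=0.95):
    """Smallest count k with Poisson P(X <= k) >= confidence.

    ASSUMPTION: observed allele counts in the reference dataset are
    modelled as Poisson with mean af * allele_number.
    """
    lam = af * allele_number
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    k = 0
    while cdf < confidence:
        k += 1
        term *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return k


# Hypothetical inputs: prevalence 1 in 500, no single variant explaining
# more than 2% of cases, penetrance 50%.
af_cutoff = max_credible_af(1 / 500, 0.02, 0.5)
ac_cutoff = max_tolerated_allele_count(af_cutoff, allele_number=100_000)
print(af_cutoff, ac_cutoff)
```

Under these assumptions, a candidate variant seen more often in the reference dataset than `ac_cutoff` would be treated as too common to cause the disease. Other inheritance modes would need a different frequency formula.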