Transcript expression-aware annotation improves rare variant discovery and interpretation, bioRxiv, 2019-02-19
The acceleration of DNA sequencing in patients and population samples has resulted in unprecedented catalogues of human genetic variation, but the interpretation of rare genetic variants discovered using such technologies remains extremely challenging. A striking example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Through manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD), we show that one explanation for this paradox involves alternative mRNA splicing, which allows exons of a gene to be expressed at varying levels across cell types. Currently, no annotation tool systematically incorporates this exon expression information into variant interpretation. Here, we develop a transcript-level annotation metric, the proportion expressed across transcripts (pext), which summarizes isoform quantifications for variants. We calculate this metric using 11,706 tissue samples from the Genotype-Tissue Expression (GTEx) project and show that it clearly differentiates between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.4% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder (ASD) and developmental disorders and intellectual disability (DDID) to show that pLoF variants in weakly expressed regions have effect sizes similar to those of synonymous variants, while pLoF variants in highly expressed exons are most strongly enriched among cases versus controls. Our annotation is fast, flexible, and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for rare disease diagnosis, rare variant burden analyses in complex disorders, and curation and prioritization of variants in recall-by-genotype studies.
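The abstract describes pext only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that, within one tissue, a variant's score is the summed expression of the transcripts on which the variant carries a given annotation, divided by the summed expression of all transcripts of the gene. The function name, transcript IDs, and TPM values are hypothetical.

```python
from typing import Dict, Set


def pext(transcript_tpm: Dict[str, float], annotated: Set[str]) -> float:
    """Share of a gene's expression carried by the annotated transcripts.

    transcript_tpm: expression of every transcript of the gene in one tissue
                    (hypothetical values; any isoform quantification would do).
    annotated:      IDs of transcripts on which the variant has the annotation
                    of interest (for example, pLoF).
    Assumption: the score is a simple expression-weighted proportion.
    """
    total = sum(transcript_tpm.values())
    if total == 0:
        # Gene not expressed in this tissue: the proportion is undefined.
        return float("nan")
    carried = sum(tpm for tx, tpm in transcript_tpm.items() if tx in annotated)
    return carried / total


# Hypothetical gene with three isoforms; the variant is pLoF only on tx_C.
tissue_tpm = {"tx_A": 8.0, "tx_B": 1.5, "tx_C": 0.5}
score = pext(tissue_tpm, {"tx_C"})
print(f"pext = {score:.2f}")
```

In this toy case the variant sits on an isoform that contributes a small share of the gene's expression, which is the situation the abstract describes for pLoF variants in weakly expressed regions.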
biorxiv genomics 200-500-users 2019
Characterising the loss-of-function impact of 5' untranslated region variants in whole genome sequence data from 15,708 individuals, bioRxiv, 2019-02-08
Upstream open reading frames (uORFs) are important tissue-specific cis-regulators of protein translation. Although isolated case reports have shown that variants that create or disrupt uORFs can cause disease, genetic sequencing approaches typically focus on protein-coding regions and ignore these variants. Here, we describe a systematic genome-wide study of variants that create and disrupt human uORFs, and explore their role in human disease using 15,708 whole genome sequences collected by the Genome Aggregation Database (gnomAD) project. We show that 14,897 variants that create new start codons upstream of the canonical coding sequence (CDS), and 2,406 variants disrupting the stop site of existing uORFs, are under strong negative selection. Furthermore, variants creating uORFs that overlap the CDS show signals of selection equivalent to coding loss-of-function variants, and uORF-perturbing variants are under strong selection when arising upstream of known disease genes and genes intolerant to loss-of-function variants. Finally, we identify specific genes where perturbation of uORFs is likely to represent an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in families with neurofibromatosis. Our results highlight uORF-perturbing variants as an important and under-recognised functional class that can contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data to study the deleteriousness of specific classes of non-coding variants.
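As an illustration of the first variant class mentioned above (variants that create a new upstream start codon), here is a minimal sketch, not the study's pipeline. It assumes the 5' UTR is given as a plus-strand DNA string that ends immediately before the canonical start codon, and that the variant is a single-base substitution. The function name, the returned fields, and the example sequences are hypothetical.

```python
from typing import Optional, Dict

STOP_CODONS = {"TAA", "TAG", "TGA"}


def creates_uaug(utr: str, pos: int, alt: str) -> Optional[Dict[str, object]]:
    """Check whether substituting `alt` at 0-based `pos` creates a new ATG.

    Returns None if no new ATG arises. Otherwise returns the ATG position,
    the position of the first in-frame stop inside the UTR (or None), and
    whether the new reading frame runs past the end of the UTR, which under
    the assumption above means it overlaps the CDS.
    """
    utr = utr.upper()
    mutated = utr[:pos] + alt.upper() + utr[pos + 1:]
    # A newly created ATG must contain the changed base, so only the three
    # codon windows covering `pos` need to be checked.
    for start in range(max(0, pos - 2), min(pos, len(mutated) - 3) + 1):
        if mutated[start:start + 3] == "ATG" and utr[start:start + 3] != "ATG":
            # Walk codon by codon from the new start, looking for a stop.
            for i in range(start + 3, len(mutated) - 2, 3):
                if mutated[i:i + 3] in STOP_CODONS:
                    return {"start": start, "stop": i, "overlaps_cds": False}
            return {"start": start, "stop": None, "overlaps_cds": True}
    return None


# Hypothetical UTRs: C>T at index 3 turns ACG into ATG in both.
print(creates_uaug("GGACGGCCTAAGG", 3, "T"))  # stop codon found inside the UTR
print(creates_uaug("GGACGGCCGG", 3, "T"))     # no in-frame stop before the CDS
```

The `overlaps_cds` flag corresponds to the abstract's distinction between uORFs that terminate upstream and those that overlap the CDS; a real analysis would also need transcript models, strand handling, and start-codon context, none of which are modelled here.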
biorxiv genomics 200-500-users 2019
Rotary substates of mitochondrial ATP synthase reveal the basis of flexible F1-Fo coupling, bioRxiv, 2019-02-07
F1Fo-ATP synthases play a central role in cellular metabolism, making the energy of the proton-motive force across a membrane available for a large number of energy-consuming processes. We determined the single-particle cryo-EM structure of active dimeric ATP synthase from mitochondria of Polytomella sp. at 2.7-2.8 Å resolution. Separation of 13 well-defined rotary substates by 3D classification provides a detailed picture of the molecular motions that accompany c-ring rotation and result in ATP synthesis. Crucially, the F1 head rotates along with the central stalk and c-ring rotor for the first ~30° of each 120° primary rotary step. The joint movement facilitates flexible coupling of the stoichiometrically mismatched F1 and Fo subcomplexes. Flexibility is mediated primarily by the interdomain hinge of the conserved OSCP subunit, a well-established target of physiologically important inhibitors. Our maps provide atomic detail of the c-ring/a-subunit interface in the membrane, where protonation and deprotonation of c-ring cGlu111 drive rotary catalysis. An essential histidine residue in the lumenal proton access channel binds a strong non-peptide density assigned to a metal ion that may facilitate c-ring protonation, as its coordination geometry changes with c-ring rotation. We resolve ordered water molecules in the proton access and release channels and at the gating aArg239 that is critical in all rotary ATPases. We identify the previously unknown ASA10 subunit and present complete de novo atomic models of subunits ASA1-10, which make up the two interlinked peripheral stalks that stabilize the Polytomella ATP synthase dimer.
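The "stoichiometric mismatch" between F1 and Fo can be made concrete with a little arithmetic. The sketch below assumes a c-ring of 10 subunits, a value not stated in the abstract and used here only for illustration, together with the three 120° steps per turn that the abstract mentions. The function name is hypothetical.

```python
from fractions import Fraction


def coupling(n_c: int, catalytic_sites: int = 3):
    """Return (c-ring step angle, F1 step angle, c-ring steps per F1 step).

    n_c is the number of c subunits in the rotor ring (assumed, not taken
    from the abstract); each c subunit is taken to carry one proton per turn.
    """
    c_step = Fraction(360, n_c)               # rotation per translocated proton
    f1_step = Fraction(360, catalytic_sites)  # rotation per ATP made
    return c_step, f1_step, f1_step / c_step


c_step, f1_step, steps = coupling(10)
print(f"c-ring step: {float(c_step)} deg, F1 step: {float(f1_step)} deg")
print(f"c-ring steps per F1 step: {steps}")
```

With these assumed numbers, one 120° F1 step corresponds to a non-integer number of c-ring steps, so the two motors cannot advance in lockstep. This is the kind of mismatch that the flexible coupling described in the abstract would have to absorb.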
biorxiv biochemistry 200-500-users 2019
Deep learning reveals cancer metastasis and therapeutic antibody targeting in whole body, bioRxiv, 2019-02-06
Reliable detection of disseminated tumor cells and of the biodistribution of tumor-targeting therapeutic antibodies within the entire body has long been needed to better understand and treat cancer metastasis. Here, we developed an integrated pipeline for automated quantification of cancer metastases and therapeutic antibody targeting, named DeepMACT. First, we enhanced the fluorescent signal of tumor cells more than 100-fold by applying the vDISCO method to image single cancer cells in intact transparent mice. Second, we developed deep learning algorithms for automated quantification of metastases with an accuracy matching human expert manual annotation. Deep learning-based quantifications in a model of spontaneous metastasis using human breast cancer cells allowed us to systematically analyze clinically relevant features such as size, shape, spatial distribution, and the degree to which metastases are targeted by a therapeutic monoclonal antibody in whole mice. DeepMACT can thus considerably improve the discovery of effective therapeutic strategies for metastatic cancer.
biorxiv cancer-biology 200-500-users 2019