Paragraph: A graph-based structural variant genotyper for short-read sequence data, bioRxiv, 2019-05-11
Abstract: Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, a fast and accurate genotyper that models SVs using sequence graphs and SV annotations produced by a range of methods and technologies. We demonstrate the accuracy of Paragraph on whole genome sequence data from a control sample with both short- and long-read sequencing data available, and then apply it at scale to a cohort of 100 samples of diverse ancestry sequenced with short reads. Comparative analyses indicate that Paragraph has better accuracy than other existing genotypers. The Paragraph software is open-source and available at https://github.com/Illumina/paragraph.
biorxiv genomics 100-200-users 2019
Early origin and deep conservation of enhancers in animals, bioRxiv, 2019-05-10
Abstract: Transcription factors (TFs) bind DNA enhancer sequences to regulate gene transcription in animals. Unlike TFs, the evolution of enhancers has been difficult to trace because of their rapid evolution. Here, we show enhancers from the sponge Amphimedon queenslandica can drive cell type-specific reporter gene expression in zebrafish and mouse, despite sponge and vertebrate lineages diverging over 700 million years ago. Although sponge enhancers, which are present in both highly conserved syntenic gene regions (Islet–Scaper, Ccne1–Uri and Tdrd3–Diaph3) and sponge-specific intergenic regions, have no significant sequence identity with vertebrate genomic sequences, the type and frequency of TF binding motifs in the sponge enhancer allow for the identification of homologous enhancers in bilaterians. Islet enhancers identified in human and mouse Scaper genes drive zebrafish reporter expression patterns that are almost identical to the sponge Islet enhancer. The existence of homologous enhancers in these disparate metazoans suggests animal development is controlled by TF-enhancer DNA interactions that were present in the first multicellular animals.
One-sentence summary: Enhancer activity is conserved across 700 million years of trans-phyletic divergence.
biorxiv evolutionary-biology 100-200-users 2019
Fishing for mammals: landscape-level monitoring of terrestrial and semi-aquatic communities using eDNA from lotic ecosystems, bioRxiv, 2019-05-10
Abstract:
1. Environmental DNA (eDNA) metabarcoding has revolutionised biomonitoring in both marine and freshwater ecosystems. However, for semi-aquatic and terrestrial animals, the application of this technique remains relatively untested.
2. We first assess the efficiency of eDNA metabarcoding in detecting semi-aquatic and terrestrial mammals in natural lotic ecosystems in the UK by comparing sequence data recovered from water and sediment samples to the mammalian communities expected from historical data. Secondly, we evaluate the detection efficiency of eDNA samples compared to multiple conventional non-invasive survey methods (latrine surveys and camera trapping) using occupancy modelling.
3. eDNA metabarcoding detected a large proportion of the expected mammalian community within each area. Common species in the areas were detected at the majority of sites. Several key species of conservation concern in the UK were detected by eDNA in areas where authenticated records do not currently exist, but potential false positives were also identified for several non-native species.
4. Water-based eDNA samples provided results comparable to conventional survey methods per unit of survey effort for three species (water vole, field vole, and red deer) using occupancy models. Comparing the survey ‘effort’ needed to reach a detection probability of ≥0.95 revealed that 3-6 water replicates would be equivalent to 3-5 latrine surveys and 5-30 weeks of single camera deployment, depending on the species.
5. Synthesis and Applications. eDNA metabarcoding represents an extremely promising tool for monitoring mammals, allowing for the detection of multiple species simultaneously, and provides comparable results to widely-used conventional survey methods. eDNA from freshwater systems delivers a ‘terrestrial dividend’ by detecting both semi-aquatic and terrestrial mammalian communities, and provides a basis for future monitoring at a landscape level over larger spatial and temporal scales (i.e. long-term monitoring at national levels).
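The survey-effort comparison above rests on a cumulative detection probability. As a minimal sketch, assuming each replicate detects the species independently with the same per-replicate probability p, the chance of at least one detection in n replicates is 1 - (1 - p)^n, and the effort needed to reach 0.95 follows by solving for n. The function names and the per-replicate probabilities used below are hypothetical illustrations, not values or code from the study, which estimated detection probabilities with occupancy models.

```python
import math


def cumulative_detection(p_single: float, n_replicates: int) -> float:
    """Probability of at least one detection in n independent replicates."""
    return 1.0 - (1.0 - p_single) ** n_replicates


def replicates_needed(p_single: float, target: float = 0.95) -> int:
    """Smallest n such that cumulative_detection(p_single, n) >= target.

    Assumes independent replicates with a constant per-replicate
    detection probability (a simplifying assumption, not the paper's model).
    """
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Start from the closed-form estimate, then correct for rounding error.
    n = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single)))
    while n > 1 and cumulative_detection(p_single, n - 1) >= target:
        n -= 1
    while cumulative_detection(p_single, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical per-replicate detection probabilities, for illustration only.
    for p in (0.4, 0.5, 0.7):
        print(p, replicates_needed(p))
```

Under these assumptions, a lower per-replicate detection probability translates directly into more replicates, which is why the equivalent effort differs between survey methods and between species.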
biorxiv ecology 0-100-users 2019
Quantifying genetic regulatory variation in human populations improves transcriptome analysis in rare disease patients, bioRxiv, 2019-05-10
Abstract: Transcriptome data holds substantial promise for better interpretation of rare genetic variants in basic research and clinical settings. Here, we introduce ANalysis of Expression VAriation (ANEVA) to quantify genetic variation in gene dosage from allelic expression (AE) data in a population. Application to GTEx data showed that this variance estimate is robust across datasets and is correlated with selective constraint in a gene. We next used ANEVA variance estimates in a Dosage Outlier Test (ANEVA-DOT) to identify genes in an individual that are affected by a rare regulatory variant with an unusually strong effect. Applying ANEVA-DOT to AE data from 70 Mendelian muscular disease patients showed high accuracy in detecting genes with pathogenic variants in previously resolved cases, and led to one confirmed and several potential new diagnoses in previously unresolved cases. Using our reference estimates from GTEx data, ANEVA-DOT can be readily incorporated in rare disease diagnostic pipelines to better utilize RNA-seq data.
One Sentence Summary: New statistical framework for modelling allelic expression characterizes genetic regulatory variation in populations and informs diagnosis in rare disease patients.
biorxiv genomics 0-100-users 2019
Shake-it-off: A simple ultrasonic cryo-EM specimen preparation device, bioRxiv, 2019-05-10
Abstract: Although microscopes and image analysis software for electron cryomicroscopy (cryo-EM) have improved dramatically in recent years, specimen preparation methods have lagged behind. Most strategies still rely on blotting microscope grids with paper to produce a thin film of solution suitable for vitrification. This approach loses more than 99.9% of the applied sample and requires several seconds, leading to problematic air-water interface interactions for macromolecules in the resulting thin film of solution and complicating time-resolved studies. Recently developed self-wicking EM grids allow use of small volumes of sample, with nanowires on the grid bars removing excess solution to produce a thin film within tens of milliseconds from sample application to freezing. Here we present a simple cryo-EM specimen preparation device that uses components from an ultrasonic humidifier to transfer protein solution onto a self-wicking EM grid. The device is controlled by a Raspberry Pi single board computer and all components are either widely available or can be manufactured by online services, allowing the device to be constructed in laboratories that specialize in cryo-EM, rather than instrument design. The simple open-source design permits straightforward customization of the instrument for specialized experiments.
Synopsis: A method is presented for high-speed low-volume cryo-EM specimen preparation with a device constructed from readily available components.
biorxiv biophysics 0-100-users 2019
Stress-driven transposable element de-repression dynamics in a fungal pathogen, bioRxiv, 2019-05-10
Abstract: Transposable elements (TEs) are drivers of genome evolution and affect the expression landscape of the host genome. Stress is a major factor inducing TE activity; however, the regulatory mechanisms underlying de-repression are poorly understood. Key unresolved questions are whether different types of stress differentially induce TE activity and whether different TEs respond differently to the same stress. Plant pathogens are excellent models to dissect the impact of stress on TEs, because lifestyle transitions on and off the host impose exposure to a variety of stress conditions. We analyzed the TE expression landscape of four well-characterized strains of the major wheat pathogen Zymoseptoria tritici. We experimentally exposed strains to nutrient starvation and host infection stress. Contrary to expectations, we show that the two distinct conditions induce the expression of different sets of TEs. In particular, the most highly expressed TEs, including MITE and LTR-Gypsy elements, show highly distinct de-repression across stress conditions. Both the genomic context of TEs and the genetic background (i.e. different strains harboring the same TEs) were major predictors of de-repression dynamics under stress. Genomic defenses inducing point mutations in repetitive regions were largely ineffective at preventing TE de-repression. Consistent with TE de-repression being governed by epigenetic effects, we found that gene expression profiles under stress varied significantly depending on the proximity to the closest TEs. The unexpected complexity in TE responsiveness to stress across genetic backgrounds and genomic locations shows that species harbor substantial genetic variation to control TEs.
biorxiv genomics 0-100-users 2019