Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads, bioRxiv, 2019-05-11
Abstract: The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective stand-alone technology for de novo assembly of human genomes.
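The NG50 figures quoted above can be made concrete with a minimal sketch. It assumes the standard definition of NG50 (the length of the contig at which the running total of contigs, sorted longest first, reaches half of a reference genome size). The function name `ng50` and the toy contig lengths are hypothetical and are not taken from the preprint; only the two reported NG50 values come from the abstract.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are sorted longest first and summed until the running total
    reaches half of genome_size; the length of the contig that crosses
    that threshold is the NG50. Returns 0 if the assembly covers less
    than half of the genome size.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Hypothetical toy assembly (arbitrary units), not data from the preprint.
toy_contigs = [50, 40, 30, 20, 10]
print(ng50(toy_contigs, genome_size=200))  # 50 + 40 + 30 = 120 >= 100, so 30

# Fold change between the two NG50 values reported in the abstract (kbp).
hifi_ng50_kbp = 480.6
clr_ng50_kbp = 191.5
print(round(hifi_ng50_kbp / clr_ng50_kbp, 2))  # 2.51
```

The ratio of the reported values, about 2.51, is consistent with the "2.5-fold increase" stated in the abstract.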
biorxiv genomics 0-100-users 2019
Jumping to Conclusions, General Intelligence, and Psychosis Liability: Findings from the Multi-Centre EU-GEI Case-Control Study, bioRxiv, 2019-05-11
Abstract: Background: The “jumping to conclusions” (JTC) bias is associated with both psychosis and general cognition but their relationship is unclear. In this study, we set out to clarify the relationship between the JTC bias, IQ, psychosis and polygenic liability to schizophrenia and IQ. Methods: 817 FEP patients and 1294 population-based controls completed assessments of general intelligence (IQ) and JTC (assessed by the number of beads drawn on the probabilistic reasoning “beads” task), and provided blood or saliva samples from which we extracted DNA and computed polygenic risk scores for IQ and schizophrenia. Results: The estimated proportion of the total effect of case-control differences on JTC mediated by IQ was 79%. Schizophrenia Polygenic Risk Score (SZ PRS) was non-significantly associated with a higher number of beads drawn (B = 0.47, 95% CI −0.21 to 1.16, p = 0.17), whereas IQ PRS (B = 0.51, 95% CI 0.25 to 0.76, p < 0.001) significantly predicted the number of beads drawn, and was thus associated with reduced JTC bias. The JTC was more strongly associated with higher level of psychotic-like experiences (PLE) in controls, including after controlling for IQ (B = −1.7, 95% CI −2.8 to −0.5, p = 0.006), but did not relate to delusions in patients. Conclusions: The JTC reasoning bias in psychosis is not a specific cognitive deficit but is rather a manifestation, or consequence, of general cognitive impairment. In the general population, by contrast, the JTC bias is related to psychotic-like experiences, independent of IQ. The work has potential to inform interventions targeting cognitive biases in early psychosis.
biorxiv neuroscience 0-100-users 2019
Paragraph: A graph-based structural variant genotyper for short-read sequence data, bioRxiv, 2019-05-11
Abstract: Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, a fast and accurate genotyper that models SVs using sequence graphs and SV annotations produced by a range of methods and technologies. We demonstrate the accuracy of Paragraph on whole genome sequence data from a control sample with both short and long read sequencing data available, and then apply it at scale to a cohort of 100 samples of diverse ancestry sequenced with short reads. Comparative analyses indicate that Paragraph has better accuracy than other existing genotypers. The Paragraph software is open-source and available at https://github.com/Illumina/paragraph
biorxiv genomics 100-200-users 2019
Early origin and deep conservation of enhancers in animals, bioRxiv, 2019-05-10
Abstract: Transcription factors (TFs) bind DNA enhancer sequences to regulate gene transcription in animals. Unlike TFs, the evolution of enhancers has been difficult to trace because of their rapid evolution. Here, we show enhancers from the sponge Amphimedon queenslandica can drive cell type-specific reporter gene expression in zebrafish and mouse, despite sponge and vertebrate lineages diverging over 700 million years ago. Although sponge enhancers, which are present in both highly conserved syntenic gene regions (Islet–Scaper, Ccne1–Uri and Tdrd3–Diaph3) and sponge-specific intergenic regions, have no significant sequence identity with vertebrate genomic sequences, the type and frequency of TF binding motifs in the sponge enhancer allow for the identification of homologous enhancers in bilaterians. Islet enhancers identified in human and mouse Scaper genes drive zebrafish reporter expression patterns that are almost identical to the sponge Islet enhancer. The existence of homologous enhancers in these disparate metazoans suggests animal development is controlled by TF-enhancer DNA interactions that were present in the first multicellular animals. One-sentence summary: Enhancer activity is conserved across 700 million years of trans-phyletic divergence.
biorxiv evolutionary-biology 100-200-users 2019
Fishing for mammals: landscape-level monitoring of terrestrial and semi-aquatic communities using eDNA from lotic ecosystems, bioRxiv, 2019-05-10
Abstract:
1. Environmental DNA (eDNA) metabarcoding has revolutionised biomonitoring in both marine and freshwater ecosystems. However, for semi-aquatic and terrestrial animals, the application of this technique remains relatively untested.
2. We first assess the efficiency of eDNA metabarcoding in detecting semi-aquatic and terrestrial mammals in natural lotic ecosystems in the UK by comparing sequence data recovered from water and sediment samples to the mammalian communities expected from historical data. Secondly, we evaluate the detection efficiency of eDNA samples compared to multiple conventional non-invasive survey methods (latrine surveys and camera trapping) using occupancy modelling.
3. eDNA metabarcoding detected a large proportion of the expected mammalian community within each area. Common species in the areas were detected at the majority of sites. Several key species of conservation concern in the UK were detected by eDNA in areas where authenticated records do not currently exist, but potential false positives were also identified for several non-native species.
4. Water-based eDNA samples provided comparable results to conventional survey methods per unit of survey effort for three species (water vole, field vole, and red deer) using occupancy models. The comparison between survey ‘effort’ to reach a detection probability of ≥0.95 revealed that 3-6 water replicates would be equivalent to 3-5 latrine surveys and 5-30 weeks of single camera deployment, depending on the species.
5. Synthesis and Applications. eDNA metabarcoding represents an extremely promising tool for monitoring mammals, allowing for the detection of multiple species simultaneously, and provides comparable results to widely-used conventional survey methods. eDNA from freshwater systems delivers a ‘terrestrial dividend’ by detecting both semi-aquatic and terrestrial mammalian communities, and provides a basis for future monitoring at a landscape level over larger spatial and temporal scales (i.e. long-term monitoring at national levels).
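The survey-effort comparison above (the number of replicates needed to reach a detection probability of ≥0.95) can be illustrated with a minimal sketch. It assumes that replicates are independent and share a constant per-replicate detection probability p, so that the cumulative probability after n replicates is 1 − (1 − p)^n. This is a common simplification and not necessarily the occupancy model fitted in the preprint; the function names and the example values of p are hypothetical and are not estimates from the study.

```python
import math


def cumulative_detection(p, n):
    """Probability of at least one detection in n independent replicates,
    each with per-replicate detection probability p."""
    return 1 - (1 - p) ** n


def replicates_needed(p, target=0.95):
    """Smallest number of replicates whose cumulative detection
    probability reaches the target, under the independence assumption."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# Hypothetical per-replicate detection probabilities, for illustration only.
for p in (0.4, 0.5, 0.65):
    n = replicates_needed(p)
    print(p, n, round(cumulative_detection(p, n), 3))
```

Under these assumptions, per-replicate detection probabilities of roughly 0.4 to 0.65 would correspond to 3 to 6 replicates.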
biorxiv ecology 0-100-users 2019
Quantifying genetic regulatory variation in human populations improves transcriptome analysis in rare disease patients, bioRxiv, 2019-05-10
Abstract: Transcriptome data holds substantial promise for better interpretation of rare genetic variants in basic research and clinical settings. Here, we introduce ANalysis of Expression VAriation (ANEVA) to quantify genetic variation in gene dosage from allelic expression (AE) data in a population. Application to GTEx data showed that this variance estimate is robust across datasets and is correlated with selective constraint in a gene. We next used ANEVA variance estimates in a Dosage Outlier Test (ANEVA-DOT) to identify genes in an individual that are affected by a rare regulatory variant with an unusually strong effect. Applying ANEVA-DOT to AE data from 70 Mendelian muscular disease patients showed high accuracy in detecting genes with pathogenic variants in previously resolved cases, and led to one confirmed and several potential new diagnoses in previously unresolved cases. Using our reference estimates from GTEx data, ANEVA-DOT can be readily incorporated in rare disease diagnostic pipelines to better utilize RNA-seq data. One-sentence summary: New statistical framework for modelling allelic expression characterizes genetic regulatory variation in populations and informs diagnosis in rare disease patients.
biorxiv genomics 0-100-users 2019