Comparative assessment of long-read error-correction software applied to RNA-sequencing data, bioRxiv, 2018-11-23

Abstract

Motivation: Long-read sequencing technologies offer promising alternatives to high-throughput short-read sequencing, especially in the context of RNA-sequencing. However, these technologies are currently hindered by high error rates in the output data, which affect analyses such as the identification of isoforms, exon boundaries, and open reading frames, and the creation of gene catalogues. Because such data are new, computational methods are still actively being developed, and options for the error correction of RNA-sequencing long reads remain limited.

Results: In this article, we evaluate the extent to which existing long-read DNA error-correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that reports not only classical error-correction metrics but also the effect of correction on gene families, isoform diversity, bias towards the major isoform, and splice site detection. We find that long-read error-correction tools originally developed for DNA are also suitable for correcting RNA-sequencing data, especially in terms of increasing base-pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error-correction tools should be used, depending on the application type.

Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser

biorxiv bioinformatics 0-100-users 2018

Tracing diagnosis trajectories over millions of inpatients reveal an unexpected association between schizophrenia and rhabdomyolysis, bioRxiv, 2018-11-20

Abstract

While it has been technically feasible to create longitudinal representations of individual health at a nationwide scale, the use of these techniques to identify novel disease associations for the risk stratification of patients has had limited success. Here, we created a large-scale US longitudinal disease network of traced readmission patterns (i.e., disease trajectories), merging data from over 10.4 million inpatients from 350 California hospitals through the Healthcare Cost and Utilization Project between 1980 and 2010. We created longitudinal representations of disease progression covering over 300 common diseases, including the well-known complication of heart failure after acute myocardial infarction. Surprisingly, among these disease trajectories we discovered a previously unknown association between schizophrenia, a chronic mental disorder, and rhabdomyolysis, a rare disease of muscle breakdown. Of 3674 patients with schizophrenia, 92 (2.5%) were readmitted for rhabdomyolysis (relative risk 2.21, 95% confidence interval 1.80–2.71, P = 9.54 × 10⁻¹⁵), whereas rhabdomyolysis has an incidence of 1 in 10,000 in the general population. We validated this association using independent electronic health records from over 830,000 patients treated over seven years at the University of California, San Francisco (UCSF) medical center. A case review of 29 patients at UCSF who were treated for schizophrenia and went on to develop rhabdomyolysis showed that most cases (62%) were idiopathic, which suggests a biological connection between these two diseases. Together, these findings demonstrate the power of combining public disease registries with electronic medical records to discover novel disease associations.

One Sentence Summary: Based on the longitudinal health records from millions of California inpatient discharges, we created a temporal network that revealed statewide patterns of hospital readmissions, which led to the novel finding that hospitalization for schizophrenia is significantly associated with rehospitalization for rhabdomyolysis.

biorxiv bioinformatics 0-100-users 2018

AnnoTree: visualization and exploration of a functionally annotated microbial tree of life, bioRxiv, 2018-11-06

Abstract

Bacterial genomics has revolutionized our understanding of the microbial tree of life; however, mapping and visualizing the distribution of functional traits across bacteria remains a challenge. Here, we introduce AnnoTree, an interactive, functionally annotated bacterial tree of life that integrates taxonomic, phylogenetic, and functional annotation data from nearly 24,000 bacterial genomes. AnnoTree enables visualization of millions of precomputed genome annotations across the bacterial phylogeny, allowing users to explore gene distributions as well as patterns of gene gain and loss across bacteria. Using AnnoTree, we examined the phylogenomic distributions of 28,311 gene/protein families and measured their phylogenetic conservation, patchiness, and lineage specificity. Our analyses revealed widespread phylogenetic patchiness among bacterial gene families, reflecting the dynamic evolution of prokaryotic genomes. Genes involved in phage infection/defense, mobile elements, and antibiotic resistance dominated the list of the most patchy traits, along with numerous intriguing metabolic enzymes that appear to have undergone frequent horizontal transfer. We anticipate that AnnoTree will be a valuable resource for exploring gene histories across bacteria and will act as a catalyst for generating biological and evolutionary hypotheses.

biorxiv bioinformatics 100-200-users 2018

 

Created with the audiences framework by Jedidiah Carlson
