Strain-resolved microbiome sequencing reveals mobile elements that drive bacterial competition on a clinical timescale, bioRxiv, 2017-04-08
Abstract: Although shotgun short-read sequencing has facilitated the study of strain-level architecture within complex microbial communities, existing metagenomic approaches often cannot capture structural differences between closely related co-occurring strains. Recent methods, which employ read cloud sequencing and specialized assembly techniques, provide significantly improved genome drafts and show potential to capture these strain-level differences. Here, we apply this read cloud metagenomic approach to longitudinal stool samples from a patient undergoing hematopoietic cell transplantation. The patient’s microbiome is profoundly disrupted and is eventually dominated by Bacteroides caccae. Comparative analysis of B. caccae genomes obtained using read cloud sequencing together with metagenomic RNA sequencing allows us to predict that particular mobile element integrations result in increased antibiotic resistance, which we further support using in vitro antibiotic susceptibility testing. Thus, we find read cloud sequencing to be useful in identifying strain-level differences that underlie differential fitness.
biorxiv bioinformatics 100-200-users 2017
The dynamic upper limit of human lifespan, bioRxiv, 2017-04-06
Abstract: We respond to claims by Dong et al. that human lifespan is limited below 125 years. Using the log-linear increase in mortality rates with age to predict the upper limits of human survival we find, in contrast to Dong et al., that the limit to human lifespan is historically flexible and increasing. This discrepancy can be explained by Dong et al.’s use of data with variable sample sizes, age-biased rounding errors, and log(0) instead of log(1) values in linear regressions. Addressing these issues eliminates the proposed 125-year upper limit to human lifespan.
biorxiv physiology 100-200-users 2017
Beyond differences in means: robust graphical methods to compare two groups in neuroscience, bioRxiv, 2017-03-28
Abstract: If many changes are necessary to improve the quality of neuroscience research, one relatively simple step could have great pay-offs: to promote the adoption of detailed graphical methods, combined with robust inferential statistics. Here we illustrate how such methods can lead to a much more detailed understanding of group differences than bar graphs and t-tests on means. To complement the neuroscientist’s toolbox, we present two powerful tools that can help us understand how groups of observations differ: the shift function and the difference asymmetry function. These tools can be combined with detailed visualisations to provide complementary perspectives about the data. We provide implementations in R and Matlab of the graphical tools, and all the examples in the article can be reproduced using R scripts.
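As a rough illustration of the idea behind a shift function, the sketch below compares two groups quantile by quantile instead of only at the mean. It is a simplified stand-in written in Python with NumPy, not the authors' R or Matlab code: the function name `shift_function`, the choice of deciles, and the use of NumPy's default quantile interpolation are all assumptions made here, and the sketch computes no confidence intervals.

```python
import numpy as np


def shift_function(group_a, group_b, probs=None):
    """Return (probs, differences) where differences[i] is the quantile of
    group_a minus the quantile of group_b at probability probs[i].

    Hypothetical helper for illustration only: it uses NumPy's default
    linear-interpolation quantiles and reports no uncertainty estimates.
    """
    if probs is None:
        # Deciles 0.1, 0.2, ..., 0.9 (an assumed default for this sketch).
        probs = np.arange(1, 10) / 10
    probs = np.asarray(probs, dtype=float)
    quantiles_a = np.quantile(np.asarray(group_a, dtype=float), probs)
    quantiles_b = np.quantile(np.asarray(group_b, dtype=float), probs)
    return probs, quantiles_a - quantiles_b


if __name__ == "__main__":
    # Synthetic data, generated only to exercise the function.
    rng = np.random.default_rng(0)
    a = rng.normal(loc=0.0, scale=1.0, size=200)
    b = rng.normal(loc=0.5, scale=2.0, size=200)
    for p, d in zip(*shift_function(a, b)):
        print(f"quantile {p:.1f}: difference {d:+.3f}")
```

If the two groups differ by a constant shift, the differences are roughly flat across quantiles; differences that change with the quantile point to a change in spread or shape that a comparison of means would miss.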
biorxiv neuroscience 100-200-users 2017
Machine Learning-based state-of-the-art methods for the classification of RNA-Seq data, bioRxiv, 2017-03-27
Abstract: RNA-Seq measures expression levels of several transcripts simultaneously. The identified reads can correspond to a gene, an exon, or another region of interest. Various computational tools have been developed for studying pathogens or viruses from RNA-Seq data by classifying them according to their attributes into several predefined classes, but computational tools and approaches to analyze complex datasets are still lacking. The development of classification models is highly recommended for disease diagnosis and classification, disease monitoring at the molecular level, and the search for potential disease biomarkers. In this chapter, we discuss various machine learning approaches for RNA-Seq data classification and their implementation. Advancements in bioinformatics, along with developments in machine learning based classification, would provide powerful toolboxes for classifying transcriptome information available through RNA-Seq data.
biorxiv bioinformatics 100-200-users 2017
Assembling metagenomes, one community at a time, bioRxiv, 2017-03-25
Abstract
Background: Metagenomics allows unprecedented access to uncultured environmental microorganisms. The analysis of metagenomic sequences facilitates gene prediction and annotation, and enables the assembly of draft genomes, including uncultured members of a community. However, while several platforms have been developed for this critical step, there is currently no clear framework for the assembly of metagenomic sequence data.
Results: To assist with selection of an appropriate metagenome assembler we evaluated the capabilities of nine prominent assembly tools on nine publicly-available environmental metagenomes, as well as three simulated datasets. Overall, we found that SPAdes provided the largest contigs and highest N50 values across 6 of the 9 environmental datasets, followed by MEGAHIT and metaSPAdes. MEGAHIT emerged as a computationally inexpensive alternative to SPAdes, assembling the most complex dataset using less than 500 GB of RAM and within 10 hours.
Conclusions: We found that assembler choice ultimately depends on the scientific question, the available resources and the bioinformatic competence of the researcher. We provide a concise workflow for the selection of the best assembly tool.
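For readers unfamiliar with the N50 statistic used to rank the assemblers, a minimal Python sketch of the standard definition follows: the contig length at which the cumulative length of contigs, sorted longest first, reaches at least half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions, not values or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length


if __name__ == "__main__":
    # Made-up contig lengths, used only to demonstrate the calculation.
    example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(example))
```

A higher N50 means more of the assembly sits in long contigs, but the statistic says nothing about whether those contigs are correct, so it is usually read alongside other quality measures.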
biorxiv bioinformatics 100-200-users 2017