NanoAmpli-Seq: A workflow for amplicon sequencing for mixed microbial communities on the nanopore sequencing platform, bioRxiv, 2018-01-08
Abstract
Background: Amplicon sequencing on Illumina sequencing platforms leverages their deep sequencing and multiplexing capacity, but is limited in genetic resolution due to short read lengths. While Oxford Nanopore or Pacific Biosciences platforms overcome this limitation, their application has been limited due to higher error rates or smaller data output.
Results: In this study, we introduce an amplicon sequencing workflow, NanoAmpli-Seq, that builds on the Intramolecular-ligated Nanopore Consensus Sequencing (INC-Seq) approach, and demonstrate its application for full-length 16S rRNA gene sequencing. NanoAmpli-Seq includes vital improvements to the INC-Seq protocol that reduce sample-processing time while significantly improving sequence accuracy. The developed protocol includes the chopSeq software for fragmentation and read orientation correction of INC-Seq consensus reads, while the nanoClust algorithm was designed for read partitioning-based de novo clustering and within-cluster consensus calling to obtain full-length 16S rRNA gene sequences.
Conclusions: NanoAmpli-Seq accurately estimates the diversity of tested mock communities with an average sequence accuracy of 99.5% for 2D and 1D² sequencing on the nanopore sequencing platform. Nearly all residual errors in NanoAmpli-Seq sequences originate from deletions in homopolymer regions, indicating that homopolymer-aware basecalling or error correction may allow for sequencing accuracy comparable to short-read sequencing platforms.
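The abstract names within-cluster consensus calling as the final nanoClust step but does not describe how it is done. As a generic illustration only, not nanoClust's actual algorithm, the sketch below takes the reads of one cluster, assumed to be already aligned to a common length with `-` marking gaps, and calls each column by majority vote. The function name `majority_consensus` and the toy reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str], gap: str = "-") -> str:
    """Per-column majority vote over reads already aligned to one length.

    A column whose winning symbol is the gap character is dropped,
    i.e. the consensus carries a deletion there.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical aligned reads from one cluster.
reads = [
    "ACGTTTAC",
    "ACG-TTAC",  # random single-base deletion in the TTT homopolymer
    "ACGTTTAC",
    "ACGTTTCC",  # random substitution
]
print(majority_consensus(reads))  # ACGTTTAC
```

Random errors in a minority of reads are outvoted, but a homopolymer deletion shared by most reads of a cluster would win its column and be dropped from the consensus, which is one way systematic homopolymer deletions can survive consensus calling.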
biorxiv microbiology 100-200-users 2018
The essential genome of Escherichia coli K-12, bioRxiv, 2017-12-22
Abstract
Transposon-Directed Insertion-site Sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single-gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential gene candidates, the statistical analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated as essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis datasets will assist genome annotation of less well characterized genomes and provide new insights into bacterial physiology and biochemistry.
Importance
Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development, genes required for rapid growth for exploitation in biotechnology, and discovery of new biochemical pathways. To identify essential genes in E. coli, we constructed a very high-density transposon mutant library. Initial automated analysis of the resulting data revealed many discrepancies when compared to the literature. We now report more extensive statistical analysis, supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene, for the model laboratory organism Escherichia coli. This paper is important because it provides a better understanding of the essential genes of E. coli, reveals the limitations of relying on automated analysis alone and provides a new standard for the analysis of TraDIS data.
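The abstract mentions corrections for gene length and genome length without spelling out the model. One common way to express such a correction, assumed here, is an insertion index (unique insertion sites inside a gene divided by gene length) compared against the genome-wide insertion density (all unique sites divided by genome length). The helper names, the toy genome and the cutoff of 0.2 are hypothetical placeholders, not the study's method or thresholds.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def insertion_index(gene: Gene, sites: set[int]) -> float:
    """Unique insertion sites falling inside the gene, divided by gene length."""
    length = gene.end - gene.start + 1
    hits = sum(1 for s in sites if gene.start <= s <= gene.end)
    return hits / length


def candidate_essentials(genes: list[Gene], sites: set[int],
                         genome_length: int, cutoff: float = 0.2) -> list[str]:
    """Flag genes whose insertion density falls far below the genome-wide density.

    cutoff is a hypothetical ratio, not a threshold taken from the study.
    """
    expected = len(sites) / genome_length  # genome-length correction
    flagged = []
    for gene in genes:
        # The insertion index already divides by gene length.
        ratio = insertion_index(gene, sites) / expected
        if ratio < cutoff:
            flagged.append(gene.name)
    return flagged


# Hypothetical 1 kb genome with an insertion every 10 bp, except inside essA.
sites = {s for s in range(5, 1000, 10) if not 200 < s <= 400}
genes = [Gene("essA", 201, 400), Gene("nonA", 501, 700)]
print(candidate_essentials(genes, sites, genome_length=1000))  # ['essA']
```

A ratio near 1 means a gene receives about as many insertions as expected for its length; a ratio near 0 marks an essential-gene candidate. Very short genes can score 0 by chance alone, so a fixed cutoff is only a first pass.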
biorxiv microbiology 100-200-users 2017
Amplicon sequencing of the 16S-ITS-23S rRNA operon with long-read technology for improved phylogenetic classification of uncultured prokaryotes, bioRxiv, 2017-12-16
Abstract
Amplicon sequencing of the 16S rRNA gene is the predominant method to quantify microbial compositions of environmental samples and to discover previously unknown lineages. Its unique structure of interspersed conserved and variable regions is an excellent target for PCR and allows for classification of reads at all taxonomic levels. However, the relatively few phylogenetically informative sites prevent confident phylogenetic placement of novel lineages that branch deeply relative to reference taxa. This problem is exacerbated when only short 16S rRNA gene fragments are sequenced. To resolve their placement, it is common practice to gather more informative sites by combining multiple conserved genes into concatenated datasets. This, however, requires genomic data, which may be obtained through relatively expensive metagenome sequencing and computationally demanding analyses. Here we develop a protocol that amplifies a large part of the 16S and 23S rRNA genes within the rRNA operon, including the ITS region, and sequences the amplicons with PacBio long-read technology. We tested our method with a synthetic mock community and developed a read curation pipeline that reduces the overall error rate to 0.18%. Applying our method to four diverse environmental samples, we were able to capture near full-length rRNA operon amplicons from a large diversity of prokaryotes. Phylogenetic trees constructed with these sequences showed an increase in statistical support compared to trees inferred with shorter, Illumina-like sequences using only the 16S rRNA gene (250 bp). Our method is a cost-effective solution to generate high-quality, near full-length 16S and 23S rRNA gene sequences from environmental prokaryotes.
biorxiv microbiology 0-100-users 2017
Rarefaction, alpha diversity, and statistics, bioRxiv, 2017-12-12
Abstract
Understanding the drivers of microbial diversity is a fundamental question in microbial ecology. Extensive literature discusses different methods for describing microbial diversity and documenting its effects on ecosystem function. However, it is widely believed that diversity depends on the number of reads that are sequenced. I present a statistical perspective on diversity, framing the diversity of an environment as an unknown parameter and discussing the bias and variance of plug-in and rarefied estimates. I argue that by failing to account for both bias and variance, we invalidate analysis of alpha diversity. I describe the state of the statistical literature for addressing these problems, and suggest that measurement error modeling can address issues with variance, but bias corrections need to be utilized as well. I encourage microbial ecologists to avoid motivating their investigations with alpha diversity analyses that do not use valid statistical methodology.
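To make the dependence on sequencing depth concrete, the sketch below rarefies a count vector (subsamples reads without replacement to a fixed depth) and computes the plug-in estimate of richness, i.e. the number of taxa observed at least once. The community composition, depths and seed are hypothetical, chosen only for illustration.

```python
import random


def rarefy(counts: list[int], depth: int, rng: random.Random) -> list[int]:
    """Subsample `depth` reads without replacement from per-taxon read counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds number of reads")
    # Expand counts into one taxon label per read, then draw without replacement.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    sub = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        sub[taxon] += 1
    return sub


def observed_richness(counts: list[int]) -> int:
    """Plug-in richness estimate: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


# Hypothetical community: 3 abundant taxa and 50 singletons (950 reads, 53 taxa).
community = [500, 300, 100] + [1] * 50
rng = random.Random(0)
for depth in (50, 200, 950):
    print(depth, observed_richness(rarefy(community, depth, rng)))
```

All 53 taxa are present in the full community, but shallow subsamples tend to miss the rare ones, so plug-in richness from rarefied data is biased downward, and rerunning with other seeds shows its variance; these are the two quantities the abstract argues must both be accounted for.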
biorxiv microbiology 0-100-users 2017
The rust fungus Melampsora larici-populina expresses a conserved genetic program and distinct sets of secreted protein genes during infection of its two host plants, larch and poplar, bioRxiv, 2017-12-07
Summary
Mechanisms required for broad-spectrum or specific host colonization by plant parasites are poorly understood. As a perfect illustration, heteroecious rust fungi require two alternate host plants to complete their life cycle. Melampsora larici-populina infects two taxonomically unrelated plants: larch, on which sexual reproduction is achieved, and poplar, on which clonal multiplication occurs, leading to severe epidemics in plantations. High-depth RNA sequencing was applied to three key developmental stages of M. larici-populina infection on larch: basidia, pycnia and aecia. Comparative transcriptomics of infection on poplar and larch hosts was performed using available expression data. Secreted proteins formed the only significantly over-represented category among differentially expressed M. larici-populina genes in basidia, pycnia and aecia compared together, highlighting their probable involvement in the infection process. Comparison of fungal transcriptomes in larch and poplar revealed that a majority of rust genes are commonly expressed on the two hosts, while a fraction exhibit host-specific expression. In particular, gene families encoding small secreted proteins presented striking expression profiles that highlight probable candidate effectors specialized on each host. Our results provide valuable new information about the life cycle of rust fungi and identify genes that may contribute to host specificity.
biorxiv microbiology 0-100-users 2017
Real-time analysis of nanopore-based metagenomic sequencing from orthopaedic device infection, bioRxiv, 2017-11-18
Abstract
Prosthetic joint infections are clinically difficult to diagnose and treat. Previously, we demonstrated that metagenomic sequencing on an Illumina MiSeq replicates the findings of current gold-standard microbiological diagnostic techniques. Nanopore sequencing offers advantages in speed of detection over MiSeq. Here, we compare direct-from-clinical-sample metagenomic Illumina sequencing with Nanopore sequencing, and report a real-time analytical pathway for Nanopore sequence data, designed for detecting the bacterial composition of prosthetic joint infections.
DNA was extracted from the sonication fluids of seven explanted orthopaedic devices, and additionally from two culture-negative controls, and was sequenced on the Oxford Nanopore Technologies MinION platform. A specific analysis pipeline was assembled to overcome the challenges of identifying the true infecting pathogen, given high levels of host contamination and unavoidable background lab and kit contamination.
The majority of classified DNA (>90%) was host contamination and was discarded. Using negative control filtering thresholds, the species identified corresponded with both routine microbiological diagnosis and MiSeq results. By analysing sequences in real time, causes of infection were robustly detected within minutes of the start of sequencing.
We demonstrate initial proof of concept that metagenomic MinION sequencing can provide rapid, accurate diagnosis for prosthetic joint infections. We demonstrate a novel, scalable pipeline for real-time analysis of MinION sequence data. The high proportion of human DNA in extracts prevents the complete coverage needed for full genome analysis; methods to reduce it could increase genome depth and allow antimicrobial resistance profiling.
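The abstract does not state the filtering thresholds. A minimal sketch of the idea, assuming per-taxon read counts have already been produced by a classifier for both a clinical sample and a negative control: host reads are dropped, and a taxon is kept only if it clears an absolute read floor and exceeds its negative-control count by a fold factor. The function name, both thresholds (`min_reads`, `fold`), the taxon labels and all counts are hypothetical, not the study's pipeline.

```python
def filter_against_control(sample: dict[str, int], control: dict[str, int],
                           host: str = "Homo sapiens",
                           min_reads: int = 10, fold: float = 5.0) -> dict[str, int]:
    """Keep taxa that clear an absolute floor and a fold-change over the control.

    min_reads and fold are hypothetical thresholds for illustration.
    """
    kept = {}
    for taxon, n in sample.items():
        if taxon == host:
            continue  # host reads are discarded before pathogen calling
        # Taxa absent from the control are compared against a background of 1 read.
        background = max(control.get(taxon, 0), 1)
        if n >= min_reads and n >= fold * background:
            kept[taxon] = n
    return kept


# Hypothetical classifier output for one sample and one negative control.
sample = {"Homo sapiens": 9500, "pathogen_X": 420,
          "contaminant_Y": 12, "contaminant_Z": 30}
control = {"contaminant_Y": 8, "contaminant_Z": 25}
print(filter_against_control(sample, control))  # {'pathogen_X': 420}
```

Using a floor of 1 read for taxa missing from the control avoids dividing by zero while still requiring a minimum absolute signal before a taxon is reported.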
biorxiv microbiology 100-200-users 2017