Shannon: An Information-Optimal de Novo RNA-Seq Assembler, bioRxiv, 2016-02-10
De novo assembly of short RNA-Seq reads into transcripts is challenging due to sequence similarities in transcriptomes arising from gene duplications and alternative splicing of transcripts. We present Shannon, an RNA-Seq assembler with an optimality guarantee derived from principles of information theory: Shannon reconstructs nearly all information-theoretically reconstructable transcripts. Shannon is based on a theory we develop for de novo RNA-Seq assembly that reveals differing abundances among transcripts to be the key, rather than the barrier, to effective assembly. The assembly problem is formulated as a sparsest-flow problem on a transcript graph, and the heart of Shannon is a novel iterative flow-decomposition algorithm. This algorithm provably solves the information-theoretically reconstructable instances in linear time, even though the general sparsest-flow problem is NP-hard. Shannon also incorporates several additional algorithmic advances: a new error-correction algorithm based on successive cancelation, a multi-bridging algorithm that carefully utilizes read information in the k-mer de Bruijn graph, and an approximate graph partitioning algorithm to split the transcriptome de Bruijn graph into smaller components. In tests on large RNA-Seq datasets, Shannon obtains significant increases in sensitivity along with improvements in specificity in comparison to state-of-the-art assemblers.
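To make the sparsest-flow formulation concrete, here is a minimal sketch of a generic greedy heuristic that peels the widest source-to-sink path off a flow on a directed acyclic graph. It is not Shannon's iterative flow-decomposition algorithm, which the abstract does not describe. The function names, the edge-dictionary representation, and the toy graph are assumptions made for illustration.

```python
import math

def widest_path(residual, source, sink):
    """Return (path, width) for the source-to-sink path with the largest bottleneck."""
    adj = {}
    for (u, v), f in residual.items():
        adj.setdefault(u, []).append((v, f))
    memo = {sink: (math.inf, None)}  # node -> (best bottleneck to sink, next node)

    def best(u):
        if u not in memo:
            choice = (0, None)
            for v, f in adj.get(u, []):
                w = min(f, best(v)[0])
                if w > choice[0]:
                    choice = (w, v)
            memo[u] = choice
        return memo[u]

    width, _ = best(source)
    if width == 0:
        return None, 0
    path = [source]
    while path[-1] != sink:
        path.append(memo[path[-1]][1])
    return path, width

def greedy_flow_decomposition(edge_flow, source, sink):
    """Greedily split a flow on a DAG into weighted paths (a heuristic, not optimal in general)."""
    residual = {e: f for e, f in edge_flow.items() if f > 0}
    paths = []
    while True:
        path, width = widest_path(residual, source, sink)
        if path is None:
            break
        paths.append((path, width))
        for e in zip(path, path[1:]):
            residual[e] -= width
            if residual[e] <= 0:
                del residual[e]
    return paths

# Hypothetical toy graph: two transcripts with abundances 3 and 2 share node "x".
toy_flow = {("s", "x"): 5, ("x", "a"): 3, ("x", "b"): 2, ("a", "t"): 3, ("b", "t"): 2}
toy_paths = greedy_flow_decomposition(toy_flow, "s", "t")
```

In the toy graph the two paths are separable because their abundances differ, which loosely mirrors the abstract's point that differing abundances help assembly. The greedy heuristic carries no optimality guarantee.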
biorxiv genomics 0-100-users 2016
Transmission dynamics of Zika virus in island populations: a modelling analysis of the 2013-14 French Polynesia outbreak, bioRxiv, 2016-02-08
Abstract: Between October 2013 and April 2014, more than 30,000 cases of Zika virus (ZIKV) disease were estimated to have attended healthcare facilities in French Polynesia. ZIKV has also been reported in Africa and Asia, and in 2015 the virus spread to South America and the Caribbean. Infection with ZIKV has been associated with neurological complications including Guillain-Barré Syndrome (GBS) and microcephaly, which led the World Health Organization to declare a Public Health Emergency of International Concern in February 2016. To better understand the transmission dynamics of ZIKV, we used a mathematical model to examine the 2013–14 outbreak on the six major archipelagos of French Polynesia. Our median estimates for the basic reproduction number ranged from 2.6–4.8, with an estimated 11.5% (95% CI 7.32–17.9%) of total infections reported. As a result, we estimated that 94% (95% CI 91–97%) of the total population of the six archipelagos were infected during the outbreak. Based on the demography of French Polynesia, our results imply that if ZIKV infection provides complete protection against future infection, it would take 12–20 years before there are a sufficient number of susceptible individuals for ZIKV to reemerge, which is on the same timescale as the circulation of dengue virus serotypes in the region. Our analysis suggests that ZIKV may exhibit similar dynamics to dengue virus in island populations, with transmission characterized by large, sporadic outbreaks with a high proportion of asymptomatic or unreported cases.
Author Summary: Since the first reported major outbreak of Zika virus disease in Micronesia in 2007, the virus has caused outbreaks throughout the Pacific and South America. Transmitted by the Aedes species of mosquitoes, the virus has been linked to possible neurological complications including Guillain-Barré Syndrome and microcephaly.
To improve our understanding of the transmission dynamics of Zika virus in island populations, we analysed the 2013–14 outbreak on the six major archipelagos of French Polynesia. We found evidence that Zika virus infected the majority of the population, but only around 12% of total infections on the archipelagos were reported as cases. If infection with Zika virus generates lifelong immunity, we estimate that it would take at least 15–20 years before there are enough susceptible people for the virus to reemerge. Our results suggest that Zika virus could exhibit similar dynamics to dengue virus in the Pacific, producing large but sporadic outbreaks in small island populations.
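The reemergence argument can be illustrated with the textbook homogeneous-mixing relation R_eff = R0 × s, where s is the susceptible fraction: an outbreak can only grow once s exceeds 1/R0. The sketch below applies that relation to the reported R0 range and the 94% infected estimate. It is a simplification and not the authors' model, whose structure the abstract does not give; the function names are hypothetical. No time to reemergence is computed, because that depends on demographic rates not stated here.

```python
def susceptible_threshold(r0):
    """Susceptible fraction above which an outbreak can grow (simple SIR assumption)."""
    if r0 <= 1:
        raise ValueError("r0 must exceed 1 for an outbreak to be possible")
    return 1.0 / r0

def effective_r(r0, susceptible_fraction):
    """Effective reproduction number under homogeneous mixing."""
    return r0 * susceptible_fraction

if __name__ == "__main__":
    remaining_susceptible = 1.0 - 0.94  # 94% estimated infected in the outbreak
    for r0 in (2.6, 4.8):               # reported range of median R0 estimates
        print(r0, susceptible_threshold(r0), effective_r(r0, remaining_susceptible))
```

Under these assumptions, the roughly 6% of the population left susceptible would need to rise to about a fifth to two fifths of the population before transmission could take off again.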
biorxiv ecology 0-100-users 2016
Common methods for fecal sample storage in field studies yield consistent signatures of individual identity in microbiome sequencing data, bioRxiv, 2016-02-05
Field studies of wild vertebrates are frequently associated with extensive collections of banked fecal samples, which are often collected from known individuals and sometimes also sampled longitudinally across time. Such collections represent unique resources for understanding ecological, behavioral, and phylogenetic effects on the gut microbiome, especially for species of particular conservation concern. However, we do not understand whether sample storage methods confound the ability to investigate interindividual variation in gut microbiome profiles. This uncertainty arises in part because comparisons across storage methods to date generally include only a few (≤5) individuals, or analyze pooled samples. Here, we used n=52 samples from 13 rhesus macaque individuals to compare immediate freezing, the gold standard of preservation, to three methods commonly used in vertebrate field studies: storage in ethanol, lyophilization following ethanol storage, and storage in RNAlater. We found that the signature of individual identity consistently outweighed storage effects: alpha diversity and beta diversity measures were significantly correlated across methods, and while samples often clustered by donor, they never clustered by storage method. Provided that all analyzed samples are stored the same way, banked fecal samples therefore appear highly suitable for investigating variation in gut microbiota. Our results open the door to a much-expanded perspective on variation in the gut microbiome across species and ecological contexts.
biorxiv genomics 0-100-users 2016
Real time selective sequencing using nanopore technology, bioRxiv, 2016-02-04
The Oxford Nanopore MinION is a portable real time sequencing device which functions by sensing the change in current flow through a nanopore as DNA passes through it. These current values can be streamed in real time from individual nanopores as DNA molecules traverse them. Furthermore, the technology enables individual DNA molecules to be rejected on demand by reversing the voltage across specific channels. In theory, combining these features enables selection of individual DNA molecules for sequencing from a pool, an approach called Read Until. Here we apply dynamic time warping to match short query current traces to references, demonstrating selection of specific regions of small genomes, individual amplicons from a group of targets, or normalisation of amplicons in a set. This is the first demonstration of direct selection of specific DNA molecules in real time whilst sequencing on any device and enables many novel uses for the MinION.
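As an illustration of matching a short query trace against a longer reference with dynamic time warping, here is a minimal sketch of subsequence DTW, in which the match may start and end anywhere in the reference. The abstract states only that dynamic time warping was applied. The subsequence variant, the absolute-difference cost, the function name, and the toy signals are assumptions, and real current traces would normally be normalised first.

```python
import math

def subsequence_dtw(query, reference):
    """Return (distance, end_index) of the best DTW match of query inside reference.

    end_index is 1-based: the match ends at reference[end_index - 1].
    """
    if not query or not reference:
        raise ValueError("query and reference must be non-empty")
    m = len(reference)
    prev = [0.0] * (m + 1)  # row 0 is all zeros: the match may start anywhere
    for i in range(1, len(query) + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            d = abs(query[i - 1] - reference[j - 1])
            # extend the cheapest of: repeat reference point, repeat query point, advance both
            curr[j] = d + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    best_end = min(range(1, m + 1), key=lambda j: prev[j])
    return prev[best_end], best_end

# Hypothetical toy signals: the query is a time-stretched copy of reference[1:4].
toy_reference = [9.0, 1.0, 2.0, 3.0, 9.0]
toy_query = [1.0, 2.0, 2.0, 3.0]
toy_match = subsequence_dtw(toy_query, toy_reference)
```

A selection rule could then accept or reject a molecule by comparing the returned distance with a threshold. Any such threshold would have to be tuned on real data and is not specified in the source.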
biorxiv genomics 200-500-users 2016
INC-Seq: Accurate single molecule reads using nanopore sequencing, bioRxiv, 2016-01-28
Nanopore sequencing provides a rapid, cheap and portable real-time sequencing platform with the potential to revolutionize genomics. Several applications, including RNA-seq, haplotype sequencing and 16S sequencing, are however limited by its relatively high single read error rate (>10%). We present INC-Seq (Intramolecular-ligated Nanopore Consensus Sequencing) as a strategy for obtaining long and accurate nanopore reads starting with low input DNA. Applying INC-Seq for 16S rRNA based bacterial profiling generated full-length amplicon sequences with median accuracy >97%. INC-Seq reads enable accurate species-level classification, identification of species at 0.1% abundance and robust quantification of relative abundances, providing a cheap and effective approach for pathogen detection and microbiome profiling on the MinION system.
biorxiv genomics 0-100-users 2016
Fast and accurate single-cell RNA-Seq analysis by clustering of transcript-compatibility counts, bioRxiv, 2016-01-20
Current approaches to single-cell transcriptomic analysis are computationally intensive and require assay-specific modeling, which limits their scope and generality. We propose a novel method that departs from standard analysis pipelines, comparing and clustering cells based not on their transcript or gene quantifications but on their transcript-compatibility read counts. In re-analysis of two landmark yet disparate single-cell RNA-Seq datasets, we show that our method is up to two orders of magnitude faster than previous approaches, provides accurate and in some cases improved results, and is directly applicable to data from a wide variety of assays.
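The idea of comparing cells by transcript-compatibility counts can be sketched as follows: each read is reduced to the set of transcripts it is compatible with, reads are counted per distinct set, and cells are compared through the resulting count distributions. The abstract does not name a distance measure, so the Jensen-Shannon divergence used here is an assumption, as are the function names and the toy reads. Clustering on the resulting distances is omitted.

```python
import math
from collections import Counter

def tcc_vector(read_compat_sets):
    """Count reads per equivalence class (the set of transcripts a read is compatible with)."""
    return Counter(frozenset(s) for s in read_compat_sets if s)

def normalize(counts, classes):
    """Turn counts into a probability vector over a fixed ordering of classes."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no compatible reads")
    return [counts.get(c, 0) / total for c in classes]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    mid = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

# Hypothetical toy data: each read is given as the set of transcripts it maps to.
cells = {
    "cell_a": [{"t1", "t2"}, {"t1", "t2"}, {"t3"}, {"t3"}],
    "cell_b": [{"t1", "t2"}, {"t3"}],
    "cell_c": [{"t4"}, {"t4"}, {"t4"}],
}
counts = {name: tcc_vector(reads) for name, reads in cells.items()}
classes = sorted({c for cnt in counts.values() for c in cnt}, key=sorted)
profiles = {name: normalize(cnt, classes) for name, cnt in counts.items()}
```

Because no per-transcript quantification step is needed, building these vectors only requires knowing which transcripts each read is compatible with. Going by the abstract, this appears to be the main source of the method's speed.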
biorxiv genomics 0-100-users 2016