The megabase-sized fungal genome of Rhizoctonia solani assembled from nanopore reads only, bioRxiv, 2016-11-02
Abstract: The ability to quickly obtain accurate genome sequences of eukaryotic pathogens at low cost provides a tremendous opportunity to identify novel targets for therapeutics, develop pesticides with increased target specificity and breed for resistance in food crops. Here, we present the first report of the ~54 Mb eukaryotic genome sequence of Rhizoctonia solani, an important pathogenic fungal species of maize, using nanopore technology. Moreover, we show that optimizing the wet-lab procedures used to isolate high-quality, ultra-pure high molecular weight (HMW) DNA shifts the read length distribution toward longer reads, thereby allowing generation of the most contiguous genome assembly for R. solani to date. We further determined sequencing accuracy and compared the assembly to short-read technologies. With the current sequencing technology and bioinformatics tool set, we are able to deliver a eukaryotic fungal genome at low cost within a week. With further improvements of the sequencing technology and increased throughput of the PromethION sequencer, we aim to generate near-finished assemblies of large and repetitive plant genomes and cost-efficiently perform de novo sequencing of large collections of microbial pathogens and the microbial communities that surround our crops.
biorxiv genomics 100-200-users 2016
Pavian: Interactive analysis of metagenomics data for microbiomics and pathogen identification, bioRxiv, 2016-11-01
Abstract: Summary: Pavian is a web application for exploring metagenomics classification results, with a special focus on infectious disease diagnosis. Pinpointing pathogens in metagenomics classification results is often complicated by host and laboratory contaminants as well as many non-pathogenic microbiota. With Pavian, researchers can analyze, display and transform results from the Kraken and Centrifuge classifiers using interactive tables, heatmaps and flow diagrams. Pavian also provides an alignment viewer for validation of matches to a particular genome. Availability and implementation: Pavian is implemented in the R language and based on the Shiny framework. It can be hosted on Windows, Mac OS X and Linux systems, and used with any contemporary web browser. It is freely available under a GPL-3 license from http://github.com/fbreitwieser/pavian. Furthermore, a Docker image is provided at https://hub.docker.com/r/florianbw/pavian. Contact: fbreitw1@jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
biorxiv bioinformatics 100-200-users 2016
Leveraging uncertainty information from deep neural networks for disease detection, bioRxiv, 2016-10-29
Abstract: Deep learning (DL) has revolutionized the field of computer vision and image processing. In medical imaging, algorithmic solutions based on DL have been shown to achieve high performance on tasks that previously required medical experts. However, DL-based solutions for disease detection have been proposed without methods to quantify and control their uncertainty in a decision. In contrast, a physician knows whether she is uncertain about a case and will consult more experienced colleagues if needed. Here we evaluate drop-out based Bayesian uncertainty measures for DL in diagnosing diabetic retinopathy (DR) from fundus images and show that they capture uncertainty better than straightforward alternatives. Furthermore, we show that uncertainty-informed decision referral can improve diagnostic performance. Experiments across different networks, tasks and datasets show robust generalization. Depending on network capacity and task/dataset difficulty, we surpass 85% sensitivity and 80% specificity as recommended by the NHS when referring 0%–20% of the most uncertain decisions for further inspection. We analyse causes of uncertainty by relating intuitions from 2D visualizations to the high-dimensional image space. While uncertainty is sensitive to clinically relevant cases, sensitivity to unfamiliar data samples is task dependent, but can be rendered more robust.
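The referral idea in this abstract can be illustrated with a minimal sketch. It assumes the uncertainty score is the standard deviation of the disease probability across several stochastic forward passes with dropout left active, and it scores plain accuracy rather than the sensitivity and specificity reported by the authors. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def mc_dropout_summary(sampled_probs):
    """Summarise T stochastic forward passes (dropout kept active at test time).

    sampled_probs has shape (T, N): T sampled disease probabilities per image.
    Returns the per-image mean prediction and its standard deviation, which
    is used here as the uncertainty score (an assumption of this sketch).
    """
    sampled_probs = np.asarray(sampled_probs, dtype=float)
    return sampled_probs.mean(axis=0), sampled_probs.std(axis=0)


def accuracy_after_referral(mean_prob, uncertainty, labels, refer_fraction):
    """Refer the most uncertain fraction of cases and score the remainder."""
    labels = np.asarray(labels)
    n_refer = int(round(refer_fraction * len(labels)))
    order = np.argsort(uncertainty)          # most certain cases first
    kept = order[: len(labels) - n_refer]    # drop the most uncertain tail
    predictions = (mean_prob[kept] >= 0.5).astype(int)
    return float((predictions == labels[kept]).mean())


# Hypothetical toy data: 4 dropout samples for each of 5 images.
samples = [
    [0.9, 0.1, 0.6, 0.2, 0.8],
    [0.9, 0.1, 0.4, 0.2, 0.8],
    [0.9, 0.1, 0.5, 0.2, 0.8],
    [0.9, 0.1, 0.3, 0.2, 0.8],
]
labels = [1, 0, 1, 0, 1]

mean_prob, uncertainty = mc_dropout_summary(samples)
acc_all = accuracy_after_referral(mean_prob, uncertainty, labels, 0.0)
acc_referred = accuracy_after_referral(mean_prob, uncertainty, labels, 0.2)
print(acc_all, acc_referred)
```

In this toy set the third image is both the only misclassified case and the one with the widest spread across dropout samples, so referring the most uncertain 20% removes it and the accuracy on the retained cases rises.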
biorxiv bioinformatics 0-100-users 2016
Pooled CRISPR screening with single-cell transcriptome read-out, bioRxiv, 2016-10-28
Abstract: CRISPR-based genetic screens have revolutionized the search for new gene functions and biological mechanisms. However, widely used pooled screens are limited to simple read-outs of cell proliferation or the production of a selectable marker protein. Arrayed screens allow for more complex molecular read-outs such as transcriptome profiling, but they provide much lower throughput. Here we demonstrate CRISPR genome editing together with single-cell RNA sequencing as a new screening paradigm that combines key advantages of pooled and arrayed screens. This approach allowed us to link guide-RNA expression to the associated transcriptome responses in thousands of single cells using a straightforward and broadly applicable screening workflow.
biorxiv genomics 0-100-users 2016
The successor representation in human reinforcement learning, bioRxiv, 2016-10-28
Abstract: Theories of reward learning in neuroscience have focused on two families of algorithms, thought to capture deliberative vs. habitual choice. “Model-based” algorithms compute the value of candidate actions from scratch, whereas “model-free” algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation (SR), which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. SR’s reliance on stored predictions about future states predicts a unique signature of insensitivity to changes in the task’s sequence of events, but flexible adjustment following changes to rewards. We provide evidence for such differential sensitivity in two behavioral studies with humans. These results suggest that the SR is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.
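The differential sensitivity described here can be shown with a small numerical sketch. It assumes a tabular setting with a fixed policy, where the SR matrix has the closed form M = (I - gamma * T)^-1 and state values are V = M r. The four-state task, the discount factor and the variable names are illustrative assumptions, not the design of the behavioral studies.

```python
import numpy as np


def successor_matrix(transitions, gamma):
    """Closed-form SR for a fixed policy: M = (I - gamma * T)^-1.

    Entry M[s, s2] is the expected discounted number of future visits
    to state s2 when starting from state s.
    """
    transitions = np.asarray(transitions, dtype=float)
    identity = np.eye(transitions.shape[0])
    return np.linalg.inv(identity - gamma * transitions)


gamma = 0.9  # illustrative discount factor

# Hypothetical task: 0 -> 1 -> 2 (rewarded, terminal); state 3 is terminal.
T_old = np.array([[0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
rewards = np.array([0.0, 0.0, 1.0, 0.0])

M_cached = successor_matrix(T_old, gamma)  # learned once, then stored
value_before = M_cached @ rewards

# Reward change: the cached M combined with new rewards is correct at once.
value_reward_change = M_cached @ np.array([0.0, 0.0, 2.0, 0.0])

# Transition change: state 0 now leads to the unrewarded state 3.
T_new = np.array([[0, 0, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
value_stale = M_cached @ rewards                        # cached M is outdated
value_replanned = successor_matrix(T_new, gamma) @ rewards

print(value_before[0], value_reward_change[0])
print(value_stale[0], value_replanned[0])
```

With the cached matrix, the value of state 0 doubles as soon as the reward doubles, but it stays at its old level after the transition change even though a full recomputation gives zero. In practice the cached predictions would be relearned from new experience, so the stale value reflects only the period before that relearning.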
biorxiv neuroscience 0-100-users 2016
FIDDLE: An integrative deep learning framework for functional genomic data inference, bioRxiv, 2016-10-19
Abstract: Numerous advances in sequencing technologies have revolutionized genomics by generating many types of genomic functional data. Statistical tools have been developed to analyze individual data types, but strategies to integrate disparate datasets under a unified framework are lacking. Moreover, most analysis techniques rely heavily on feature selection and data preprocessing, which increase the difficulty of addressing biological questions through the integration of multiple datasets. Here, we introduce FIDDLE (Flexible Integration of Data with Deep LEarning), an open source, data-agnostic, flexible integrative framework that learns a unified representation from multiple data types to infer another data type. As a case study, we use multiple Saccharomyces cerevisiae genomic datasets to predict global transcription start sites (TSS) through the simulation of TSS-seq data. We demonstrate that one type of data can be inferred from other data types without manually specifying the relevant features and preprocessing. We show that models built from multiple genome-wide datasets perform profoundly better than models built from individual datasets. Thus FIDDLE learns the complex synergistic relationship within individual datasets and, importantly, across datasets.
biorxiv bioinformatics 0-100-users 2016