De novo Identification of DNA Modifications Enabled by Genome-Guided Nanopore Signal Processing, bioRxiv, 2016-12-16
Abstract: Advances in nanopore sequencing technology have enabled investigation of the full catalogue of covalent DNA modifications. We present the first algorithm for the identification of modified nucleotides without the need for prior training data, along with the open source software implementation, nanoraw. Nanoraw accurately assigns contiguous raw nanopore signal to genomic positions, enabling novel data visualization and increasing power and accuracy for the discovery of covalently modified bases in native DNA. Ground truth case studies utilizing synthetically methylated DNA show the capacity to identify three distinct methylation marks, 4mC, 5mC, and 6mA, in seven distinct sequence contexts without any changes to the algorithm. We demonstrate quantitative reproducibility, simultaneously identifying 5mC and 6mA in native E. coli across biological replicates processed in different labs. Finally, we propose a pipeline for the comprehensive discovery of DNA modifications in any genome without a priori knowledge of their chemical identities.
biorxiv bioinformatics 100-200-users 2016
Genetic determinants of chromatin accessibility in T cell activation across humans, bioRxiv, 2016-12-03
Abstract: Over 90% of genetic variants associated with complex human traits map to non-coding regions, but little is understood about how they modulate gene regulation in health and disease. One possible mechanism is that genetic variants affect the activity of one or more cis-regulatory elements, leading to gene expression variation in specific cell types. To identify such cases, we analyzed Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq) and RNA-seq profiles from activated CD4+ T cells of up to 105 healthy donors. We found that regions of accessible chromatin (ATAC-peaks) are co-accessible at kilobase and megabase resolution, in patterns consistent with the 3D organization of chromosomes measured by in situ Hi-C in T cells. 15% of genetic variants located within ATAC-peaks affected the accessibility of the corresponding peak by disrupting binding sites for transcription factors important for T cell differentiation and activation. These ATAC quantitative trait nucleotides (ATAC-QTNs) have the largest effects on co-accessible peaks, are associated with gene expression from the same aliquot of cells, rarely affect core binding motifs, and are enriched for autoimmune disease variants. Our results provide insights into how natural genetic variants modulate cis-regulatory elements, in isolation or in concert, to influence gene expression in primary immune cells that play a key role in many human diseases.
biorxiv genomics 100-200-users 2016
Cryo-EM structure of haemoglobin at 3.2 Å determined with the Volta phase plate, bioRxiv, 2016-11-18
With the advent of direct electron detectors, the perspectives of cryo-electron microscopy (cryo-EM) have changed in a profound way [1]. These cameras are superior to previous detectors in coping with the intrinsically low contrast of radiation-sensitive organic materials embedded in amorphous ice, and so they have enabled the structure determination of several macromolecular assemblies to atomic or near-atomic resolution. According to one theoretical estimation, a few thousand images should suffice for calculating the structure of proteins as small as 17 kDa at 3 Å resolution [2]. In practice, however, we are still far away from this theoretical ideal. Thus far, protein complexes that have been successfully reconstructed to high resolution by single particle analysis (SPA) have molecular weights of ~100 kDa or larger [3]. Here, we report the use of the Volta phase plate in determining the structure of human haemoglobin (64 kDa) at 3.2 Å. Our results demonstrate that this method can be applied to complexes that are significantly smaller than those previously studied by conventional defocus-based approaches. Cryo-EM is now close to becoming a fast and cost-effective alternative to crystallography for high-resolution protein structure determination.
biorxiv biophysics 100-200-users 2016
The megabase-sized fungal genome of Rhizoctonia solani assembled from nanopore reads only, bioRxiv, 2016-11-02
Abstract: The ability to quickly obtain accurate genome sequences of eukaryotic pathogens at low cost provides a tremendous opportunity to identify novel targets for therapeutics, develop pesticides with increased target specificity and breed for resistance in food crops. Here, we present the first report of the ~54 MB eukaryotic genome sequence of Rhizoctonia solani, an important pathogenic fungal species of maize, using nanopore technology. Moreover, we show that optimizing the wet-lab procedures used to isolate high-quality, ultra-pure high molecular weight (HMW) DNA results in an increased read length distribution, thereby allowing generation of the most contiguous genome assembly for R. solani to date. We further determined sequencing accuracy and compared the assembly to short-read technologies. With the current sequencing technology and bioinformatics tool set, we are able to deliver a eukaryotic fungal genome at low cost within a week. With further improvements of the sequencing technology and increased throughput of the PromethION sequencer, we aim to generate near-finished assemblies of large and repetitive plant genomes and cost-efficiently perform de novo sequencing of large collections of microbial pathogens and the microbial communities that surround our crops.
biorxiv genomics 100-200-users 2016
Pavian: Interactive analysis of metagenomics data for microbiomics and pathogen identification, bioRxiv, 2016-11-01
Abstract: Summary: Pavian is a web application for exploring metagenomics classification results, with a special focus on infectious disease diagnosis. Pinpointing pathogens in metagenomics classification results is often complicated by host and laboratory contaminants as well as many non-pathogenic microbiota. With Pavian, researchers can analyze, display and transform results from the Kraken and Centrifuge classifiers using interactive tables, heatmaps and flow diagrams. Pavian also provides an alignment viewer for validation of matches to a particular genome. Availability and implementation: Pavian is implemented in the R language and based on the Shiny framework. It can be hosted on Windows, Mac OS X and Linux systems, and used with any contemporary web browser. It is freely available under a GPL-3 license from http://github.com/fbreitwieser/pavian. Furthermore, a Docker image is provided at https://hub.docker.com/r/florianbw/pavian. Contact: fbreitw1@jhu.edu. Supplementary information: Supplementary data is available at Bioinformatics online.
biorxiv bioinformatics 100-200-users 2016
Reconstructing cell cycle and disease progression using deep learning, bioRxiv, 2016-10-19
Abstract: We show that deep convolutional neural networks combined with non-linear dimension reduction enable reconstructing biological processes based on raw image data. We demonstrate this by reconstructing the cell cycle of Jurkat cells and disease progression in diabetic retinopathy. In further analysis of Jurkat cells, we detect and separate a subpopulation of dead cells in an unsupervised manner and, in classifying discrete cell cycle stages, we reach a 6-fold reduction in error rate compared to a recent approach based on boosting on image features. In contrast to previous methods, deep learning-based predictions are fast enough for on-the-fly analysis in an imaging flow cytometer.
biorxiv bioinformatics 100-200-users 2016