Highly parallel direct RNA sequencing on an array of nanopores, bioRxiv, 2016-08-13
AbstractRibonucleic acid sequencing can allow us to monitor the RNAs present in a sample. This enables us to detect the presence and nucleotide sequence of viruses, or to build a picture of how active transcriptional processes are changing – information that is useful for understanding the status and function of a sample. Oxford Nanopore Technologies’ sequencing technology is capable of electronically analysing a sample’s DNA directly, and in real-time. In this manuscript we demonstrate the ability of an array of nanopores to sequence RNA directly, and we apply it to a range of biological situations. Nanopore technology is the only available sequencing technology that can sequence RNA directly, rather than depending on reverse transcription and PCR. There are several potential advantages of this approach over other RNA-seq strategies, including the absence of amplification and reverse transcription biases, the ability to detect nucleotide analogues and the ability to generate full-length, strand-specific RNA sequences. Direct RNA sequencing is a completely new way of analysing the sequence of RNA samples and it will improve the ease and speed of RNA analysis, while yielding richer biological information.
biorxiv genomics 100-200-users 2016recount A large-scale resource of analysis-ready RNA-seq expression data, bioRxiv, 2016-08-09
Abstractrecount is a resource of processed and summarized expression data spanning nearly 60,000 human RNA-seq samples from the Sequence Read Archive (SRA). The associated recount Bio-conductor package provides a convenient API for querying, downloading, and analyzing the data. Each processed study consists of metaphenotype data, the expression levels of genes and their underlying exons and splice junctions, and corresponding genomic annotation. We also provide data summarization types for quantifying novel transcribed sequence including base-resolution coverage and potentially unannotated splice junctions. We present workflows illustrating how to use recount to perform differential expression analysis including meta-analysis, annotation-free base-level analysis, and replication of smaller studies using data from larger studies. recount provides a valuable and user-friendly resource of processed RNA-seq datasets to draw additional biological insights from existing public data. The resource is available at <jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsjhubiostatistics.shinyapps.iorecount>httpsjhubiostatistics.shinyapps.iorecount<jatsext-link>.
biorxiv genomics 100-200-users 2016Cytoarchitectonic similarity is a wiring principle of the human connectome, bioRxiv, 2016-08-07
AbstractUnderstanding the wiring diagram of the human cerebral cortex is a fundamental challenge in neuroscience. Elemental aspects of its organization remain elusive. Here we examine which structural traits of cortical regions, particularly their cytoarchitecture and thickness, relate to the existence and strength of inter-regional connections. We use the architecture data from the classic work of von Economo and Koskinas and state-of-the-art diffusion-based connectivity data from the Human Connectome Project. Our results reveal a prominent role of the cytoarchitectonic similarity of supragranular layers for predicting the existence and strength of connections. In contrast, cortical thickness similarity was not related to the existence or strength of connections. These results are in line with findings for non-human mammalian cerebral cortices, suggesting overarching wiring principles of the mammalian cerebral cortex. The results invite hypotheses about evolutionary conserved neurobiological mechanisms that give rise to the relation of cytoarchitecture and connectivity in the human cerebral cortex.
biorxiv neuroscience 100-200-users 2016Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm, bioRxiv, 2016-07-27
AbstractLong sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and highly repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.
biorxiv bioinformatics 100-200-users 2016Massively parallel digital transcriptional profiling of single cells, bioRxiv, 2016-07-27
ABSTRACTCharacterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. We describe a droplet-based system that enables 3′ mRNA counting of up to tens of thousands of single cells per sample. Cell encapsulation in droplets takes place in ∼6 minutes, with ∼50% cell capture efficiency, up to 8 samples at a time. The speed and efficiency allow the processing of precious samples while minimizing stress to cells. To demonstrate the system′s technical performance and its applications, we collected transcriptome data from ∼¼ million single cells across 29 samples. First, we validate the sensitivity of the system and its ability to detect rare populations using cell lines and synthetic RNAs. Then, we profile 68k peripheral blood mononuclear cells (PBMCs) to demonstrate the system′s ability to characterize large immune populations. Finally, we use sequence variation in the transcriptome data to determine host and donor chimerism at single cell resolution in bone marrow mononuclear cells (BMMCs) of transplant patients. This analysis enables characterization of the complex interplay between donor and host cells and monitoring of treatment response. This high-throughput system is robust and enables characterization of diverse biological systems with single cell mRNA analysis.
biorxiv genomics 100-200-users 2016Rapid and efficient analysis of 20,000 RNA-seq samples with Toil, bioRxiv, 2016-07-08
ABSTRACTToil is portable, open-source workflow software that supports contemporary workflow definition languages and can be used to securely and reproducibly run scientific workflows efficiently at large-scale. To demonstrate Toil, we processed over 20,000 RNA-seq samples to create a consistent meta-analysis of five datasets free of computational batch effects that we make freely available. Nearly all the samples were analysed in under four days using a commercial cloud cluster of 32,000 preemptable cores.
biorxiv bioinformatics 100-200-users 2016