Nanopore native RNA sequencing of a human poly(A) transcriptome, bioRxiv, 2018-11-10
High-throughput cDNA sequencing technologies have dramatically advanced our understanding of transcriptome complexity and regulation. However, these methods lose information contained in biological RNA because the copied reads are often short and because modifications are not carried forward in cDNA. We address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies (ONT). Our study focused on poly(A) RNA from the human cell line GM12878, generating 9.9 million aligned sequence reads. These native RNA reads had an aligned N50 length of 1294 bases, and a maximum aligned length of over 21,000 bases. A total of 78,199 high-confidence isoforms were identified by combining long nanopore reads with short, higher-accuracy Illumina reads. We describe strategies for assessing 3′ poly(A) tail length, base modifications and transcript haplotypes from nanopore RNA data. Together, these nanopore-based techniques are poised to deliver new insights into RNA biology.
Disclosures: MA holds shares in Oxford Nanopore Technologies (ONT). MA is a paid consultant to ONT. REW, WT, TG, JRT, JQ, NJL, JTS, NS, AB, MA, HEO, MJ, and ML received reimbursement for travel, accommodation and conference fees to speak at events organised by ONT. NL has received an honorarium to speak at an ONT company meeting. WT has two patents (8,748,091 and 8,394,584) licensed to Oxford Nanopore. JTS, ML and MA received research funding from ONT.
biorxiv genomics 200-500-users 2018
Genomic architecture and introgression shape a butterfly radiation, bioRxiv, 2018-11-09
We here pioneer a low-cost assembly strategy for 20 Heliconiini genomes to characterize the evolutionary history of the rapidly radiating genus Heliconius. A bifurcating tree provides a poor fit to the data, and we therefore explore a reticulate phylogeny for Heliconius. We probe the genomic architecture of gene flow, and develop a new method to distinguish incomplete lineage sorting from introgression. We find that most loci with non-canonical histories arose through introgression, and are strongly underrepresented in regions of low recombination and high gene density. This is expected if introgressed alleles are more likely to be purged in such regions due to tighter linkage with incompatibility loci. Finally, we identify a hitherto unrecognized inversion, and show it is a convergent structural rearrangement that captures a known color pattern switch locus within the genus. Our multi-genome assembly approach enables an improved understanding of adaptive radiation.
biorxiv evolutionary-biology 200-500-users 2018
OrthoFinder: phylogenetic orthology inference for comparative genomics, bioRxiv, 2018-11-08
Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder’s high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomic statistics. Each output is benchmarked on appropriate real or simulated datasets and, where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder’s comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
biorxiv bioinformatics 200-500-users 2018
Ultra-sensitive sequencing for cancer detection reveals progressive clonal selection in normal tissue over a century of human lifespan, bioRxiv, 2018-11-04
High accuracy next-generation DNA sequencing promises a paradigm shift in early cancer detection by enabling the identification of mutant cancer molecules in minimally-invasive body fluid samples. We demonstrate 80% sensitivity for ovarian cancer detection using ultra-accurate Duplex Sequencing to identify TP53 mutations in uterine lavage. However, in addition to tumor DNA, we also detect low frequency TP53 mutations in nearly all lavages from women with and without cancer. These mutations increase with age and share the selection traits of clonal TP53 mutations commonly found in human tumors. We show that low frequency TP53 mutations exist in multiple healthy tissues, from newborn to centenarian, and progressively increase in abundance and pathogenicity with older age across tissue types. Our results illustrate that subclonal cancer evolutionary processes are a ubiquitous part of normal human aging and that great care must be taken to distinguish tumor-derived from age-associated mutations in high sensitivity clinical cancer diagnostics.
biorxiv cancer-biology 200-500-users 2018
Comprehensive integration of single cell data, bioRxiv, 2018-11-02
Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to “anchor” diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets.
Availability: Installation instructions, documentation, and tutorials are available at https://www.satijalab.org/seurat
biorxiv genomics 200-500-users 2018
Inferring the ancestry of everyone, bioRxiv, 2018-11-01
A central problem in evolutionary biology is to infer the full genealogical history of a set of DNA sequences. This history contains rich information about the forces that have influenced a sexually reproducing species. However, existing methods are limited: the most accurate is unable to cope with more than a few dozen samples. With modern genetic data sets rapidly approaching millions of genomes, there is an urgent need for efficient inference methods to exploit such rich resources. We introduce an algorithm to infer whole-genome history which has comparable accuracy to the state-of-the-art but can process around four orders of magnitude more sequences. Additionally, our method results in an “evolutionary encoding” of the original sequence data, enabling efficient access to genealogies and calculation of genetic statistics over the data. We apply this technique to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the genealogies we estimate are both rich in biological signal and efficient to process.
biorxiv evolutionary-biology 200-500-users 2018