Nanopore direct RNA sequencing maps an Arabidopsis N6 methyladenosine epitranscriptome, bioRxiv, 2019-07-17
Abstract: Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3′ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3′ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode.
biorxiv molecular-biology 0-100-users 2019
Unlinked rRNA genes are widespread among Bacteria and Archaea, bioRxiv, 2019-07-17
Abstract: Ribosomes are essential to cellular life and the genes for their RNA components are the most conserved and transcribed genes in Bacteria and Archaea. These rRNA genes are typically organized into a single operon, an arrangement that is thought to facilitate gene regulation. In reality, some Bacteria and Archaea do not share this canonical rRNA arrangement: their 16S and 23S rRNA genes are not co-located, but are instead separated across the genome and referred to as “unlinked”. This rearrangement has previously been treated as a rare exception or a byproduct of genome degradation in obligate intracellular bacteria. Here, we leverage complete genome and long-read metagenomic data to show that unlinked 16S and 23S rRNA genes are much more common than previously thought. Unlinked rRNA genes occur in many phyla, most significantly within Deinococcus-Thermus, Chloroflexi, Planctomycetes, and Euryarchaeota, and occur in differential frequencies across natural environments. We found that up to 41% of the taxa in soil, including dominant taxa, had unlinked rRNA genes, in contrast to the human gut, where all sequenced rRNA genes were linked. The frequency of unlinked rRNA genes may reflect meaningful life history traits, as they tend to be associated with a mix of slow-growing free-living species and obligatory intracellular species. Unlinked rRNA genes are also associated with changes in RNA metabolism, notably the loss of RNaseIII. We propose that unlinked rRNA genes may confer selective advantages in some environments, though the specific nature of these advantages remains undetermined and worthy of further investigation.
biorxiv microbiology 0-100-users 2019
Genome-wide DNA methylation and gene expression patterns reflect genetic ancestry and environmental differences across the Indonesian archipelago, bioRxiv, 2019-07-16
Abstract: Indonesia is the world’s fourth most populous country, host to striking levels of human diversity, regional patterns of admixture, and varying degrees of introgression from both Neanderthals and Denisovans. However, it has been largely excluded from the human genomics sequencing boom of the last decade. To serve as a benchmark dataset of molecular phenotypes across the region, we generated genome-wide CpG methylation and gene expression measurements in over 100 individuals from three locations that capture the major genomic and geographical axes of diversity across the Indonesian archipelago. Investigating between- and within-island differences, we find up to 10% of tested genes are differentially expressed between the islands of Mentawai (Sumatra) and New Guinea. Variation in gene expression is closely associated with DNA methylation, with expression levels of 9.7% of genes strongly correlating with nearby CpG methylation, and many of these genes being differentially expressed between islands. Genes identified in our differential expression and methylation analyses are enriched in pathways involved in immunity, highlighting Indonesia’s tropical role as a source of infectious disease diversity and the strong selective pressures these diseases have exerted on humans. Finally, we identify robust within-island variation in DNA methylation and gene expression, likely driven by very local environmental differences across sampling sites. Together, these results strongly suggest complex relationships between DNA methylation, transcription, archaic hominin introgression and immunity, all jointly shaped by the environment. This has implications for the application of genomic medicine, both in critically understudied Indonesia and globally, and will allow a better understanding of the interacting roles of genomic and environmental factors shaping molecular and complex phenotypes.
biorxiv genomics 0-100-users 2019
Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics, bioRxiv, 2019-07-15
Abstract: Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming in application to phylogenetic models. We show that many commonly used phylogenetic models including the general time reversible (GTR) substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models, but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes-Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster and more accurate than a general purpose probabilistic implementation.
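For readers unfamiliar with the Jukes-Cantor substitution model mentioned in this abstract, the following is a minimal sketch of its transition probabilities, which are the building block of any likelihood that a variational or MCMC method would evaluate. It is not the authors' implementation; the function name `jc69_transition_probs` is hypothetical, and the branch length is assumed to be measured in expected substitutions per site.

```python
import math


def jc69_transition_probs(branch_length):
    """Jukes-Cantor (JC69) transition probabilities for one branch.

    branch_length is assumed to be in expected substitutions per site.
    Returns (p_same, p_diff): the probability that a nucleotide is
    unchanged, and the probability that it has become one specific
    other nucleotide.
    """
    if branch_length < 0:
        raise ValueError("branch_length must be non-negative")
    # Under JC69 all substitutions occur at the same rate, so a single
    # exponential decay term determines the whole 4x4 transition matrix.
    decay = math.exp(-4.0 * branch_length / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff
```

Each row of the implied 4x4 matrix is `p_same` plus three copies of `p_diff`, so the row sums to one, and both values approach 1/4 as the branch length grows.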
biorxiv bioinformatics 0-100-users 2019
The ARRIVE guidelines 2019: updated guidelines for reporting animal research, bioRxiv, 2019-07-15
Abstract: Reproducible science requires transparent reporting. The ARRIVE guidelines were originally developed in 2010 to improve the reporting of animal research. They consist of a checklist of information to include in publications describing in vivo experiments to enable others to scrutinise the work adequately, evaluate its methodological rigour, and reproduce the methods and results. Despite considerable levels of endorsement by funders and journals over the years, adherence to the guidelines has been inconsistent, and the anticipated improvements in the quality of reporting in animal research publications have not been achieved. Here we introduce ARRIVE 2019. The guidelines have been updated and information reorganised to facilitate their use in practice. We used a Delphi exercise to prioritise the items and split the guidelines into two sets, the ARRIVE Essential 10, which constitute the minimum requirement, and the Recommended Set, which describes the research context. This division facilitates improved reporting of animal research by supporting a stepwise approach to implementation. This helps journal editors and reviewers to verify that the most important items are being reported in manuscripts. We have also developed the accompanying Explanation and Elaboration document that serves 1) to explain the rationale behind each item in the guidelines, 2) to clarify key concepts and 3) to provide illustrative examples. We aim through these changes to help ensure that researchers, reviewers and journal editors are better equipped to improve the rigour and transparency of the scientific process and thus reproducibility.
biorxiv scientific-communication-and-education 0-100-users 2019
souporcell: Robust clustering of single cell RNAseq by genotype and ambient RNA inference without reference genotypes, bioRxiv, 2019-07-14
Methods to deconvolve single-cell RNA sequencing (scRNAseq) data are necessary for samples containing a natural mixture of genotypes and for scRNAseq experiments that multiplex cells from different donors [1]. Multiplexing across donors is a popular experimental design with many benefits including avoiding batch effects [2], reducing costs, and improving doublet detection. Using variants detected in the RNAseq reads, it is possible to assign cells to the individuals from which they arose. These variants can also be used to identify and remove cross-genotype doublet cells that may have highly similar transcriptional profiles precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA in the system. Ambient RNA is caused by cell lysis prior to droplet partitioning and is an important confounder of scRNAseq analysis [3]. Souporcell is a novel method to cluster cells using only the genetic variants detected within the scRNAseq reads. We show that it achieves high accuracy on genotype clustering, doublet detection, and ambient RNA estimation as demonstrated across a wide range of challenging scenarios.
biorxiv bioinformatics 0-100-users 2019