Unlinked rRNA genes are widespread among Bacteria and Archaea, bioRxiv, 2019-07-17
Abstract
Ribosomes are essential to cellular life, and the genes for their RNA components are the most conserved and most highly transcribed genes in Bacteria and Archaea. These rRNA genes are typically organized into a single operon, an arrangement that is thought to facilitate gene regulation. However, some Bacteria and Archaea do not share this canonical rRNA arrangement: their 16S and 23S rRNA genes are not co-located but are instead separated across the genome, and are referred to as “unlinked”. This rearrangement has previously been treated as a rare exception or a byproduct of genome degradation in obligate intracellular bacteria. Here, we leverage complete genome and long-read metagenomic data to show that unlinked 16S and 23S rRNA genes are much more common than previously thought. Unlinked rRNA genes occur in many phyla, most significantly within Deinococcus-Thermus, Chloroflexi, Planctomycetes, and Euryarchaeota, and occur at different frequencies across natural environments. We found that up to 41% of the taxa in soil, including dominant taxa, had unlinked rRNA genes, in contrast to the human gut, where all sequenced rRNA genes were linked. The frequency of unlinked rRNA genes may reflect meaningful life-history traits, as they tend to be associated with a mix of slow-growing free-living species and obligate intracellular species. Unlinked rRNA genes are also associated with changes in RNA metabolism, notably the loss of RNase III. We propose that unlinked rRNA genes may confer selective advantages in some environments, though the specific nature of these advantages remains undetermined and worthy of further investigation.
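To make the linked/unlinked distinction concrete, here is a minimal sketch that calls a 16S gene on a circular chromosome "linked" when a 23S gene lies on the same strand within a fixed distance. The 5 kb threshold, the `Gene` record and all helper names are hypothetical illustrations, not the criteria used in the study.

```python
from dataclasses import dataclass

# Hypothetical linkage threshold in bp; NOT the criterion used in the study.
MAX_GAP_BP = 5_000


@dataclass
class Gene:
    name: str    # e.g. "16S" or "23S"
    start: int   # 0-based start coordinate
    end: int     # exclusive end coordinate
    strand: str  # "+" or "-"


def circular_gap(a: Gene, b: Gene, genome_length: int) -> int:
    """Bases separating two non-overlapping genes on a circular chromosome,
    measured the short way around."""
    forward = (b.start - a.end) % genome_length   # end of a -> start of b
    backward = (a.start - b.end) % genome_length  # end of b -> start of a
    return min(forward, backward)


def is_linked(rrs: Gene, rrl: Gene, genome_length: int, max_gap: int = MAX_GAP_BP) -> bool:
    """Treat a 16S/23S pair as linked if they share a strand and lie within max_gap bp."""
    return rrs.strand == rrl.strand and circular_gap(rrs, rrl, genome_length) <= max_gap


def count_unlinked(s16_genes, s23_genes, genome_length, max_gap=MAX_GAP_BP) -> int:
    """Count 16S genes that have no 23S gene close enough to call them linked."""
    return sum(
        not any(is_linked(s, t, genome_length, max_gap) for t in s23_genes)
        for s in s16_genes
    )


if __name__ == "__main__":
    genome = 4_000_000
    s16 = [Gene("16S", 100_000, 101_540, "+"), Gene("16S", 2_000_000, 2_001_540, "+")]
    s23 = [Gene("23S", 102_000, 104_900, "+")]
    print(count_unlinked(s16, s23, genome))  # the second 16S has no nearby 23S
```

Measuring the gap both ways around the circle keeps an operon that straddles the chromosome's coordinate origin from being misclassified as unlinked.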
biorxiv microbiology 0-100-users 2019
Genome-wide DNA methylation and gene expression patterns reflect genetic ancestry and environmental differences across the Indonesian archipelago, bioRxiv, 2019-07-16
Abstract
Indonesia is the world’s fourth most populous country, host to striking levels of human diversity, regional patterns of admixture, and varying degrees of introgression from both Neanderthals and Denisovans. However, it has been largely excluded from the human genomics sequencing boom of the last decade. To serve as a benchmark dataset of molecular phenotypes across the region, we generated genome-wide CpG methylation and gene expression measurements in over 100 individuals from three locations that capture the major genomic and geographical axes of diversity across the Indonesian archipelago. Investigating between- and within-island differences, we find that up to 10% of tested genes are differentially expressed between the islands of Mentawai (Sumatra) and New Guinea. Variation in gene expression is closely associated with DNA methylation: expression levels of 9.7% of genes strongly correlate with nearby CpG methylation, and many of these genes are differentially expressed between islands. Genes identified in our differential expression and methylation analyses are enriched in pathways involved in immunity, highlighting Indonesia’s role as a tropical source of infectious disease diversity and the strong selective pressures these diseases have exerted on humans. Finally, we identify robust within-island variation in DNA methylation and gene expression, likely driven by very local environmental differences across sampling sites. Together, these results strongly suggest complex relationships between DNA methylation, transcription, archaic hominin introgression and immunity, all jointly shaped by the environment. This has implications for the application of genomic medicine, both in critically understudied Indonesia and globally, and will allow a better understanding of the interacting roles of genomic and environmental factors shaping molecular and complex phenotypes.
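The gene-level link between expression and nearby CpG methylation can be illustrated with a per-gene Pearson correlation across individuals. This is a minimal sketch assuming one pre-selected CpG per gene and a hypothetical |r| ≥ 0.5 cutoff; the study's actual statistical model and significance criteria are not given in the abstract, and the function names are illustrative.

```python
import math

# Hypothetical cutoff for a "strong" correlation; NOT the study's criterion.
MIN_ABS_R = 0.5


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def strongly_correlated_genes(expression, methylation, min_abs_r=MIN_ABS_R):
    """Return {gene: r} for genes whose expression tracks methylation at a
    nearby CpG across individuals.

    expression:  gene -> per-individual expression values
    methylation: gene -> per-individual methylation levels at one (assumed)
                 nearby CpG, listed in the same individual order
    """
    hits = {}
    for gene, expr in expression.items():
        meth = methylation.get(gene)
        if meth is None:
            continue  # no CpG paired with this gene
        r = pearson(expr, meth)
        if abs(r) >= min_abs_r:  # a nan correlation never passes this test
            hits[gene] = r
    return hits


if __name__ == "__main__":
    expression = {"geneA": [1, 2, 3, 4], "geneB": [1, 2, 3, 4]}
    methylation = {"geneA": [0.9, 0.7, 0.5, 0.3], "geneB": [0.5, 0.1, 0.6, 0.2]}
    print(strongly_correlated_genes(expression, methylation))
```

In this toy input only `geneA` passes, with r close to -1 (higher methylation, lower expression); a real analysis would also need to correct for multiple testing across genes.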
biorxiv genomics 0-100-users 2019
Identification of hidden population structure in time-scaled phylogenies, bioRxiv, 2019-07-16
Abstract
Population structure influences genealogical patterns; however, data pertaining to how populations are structured are often unavailable or not directly observable. Inference of population structure is highly important in molecular epidemiology, where pathogen phylogenetics is increasingly used to infer transmission patterns and detect outbreaks. Discrepancies between observed and idealised genealogies, such as those generated by the coalescent process, can be quantified and, where significant differences occur, may reveal the action of natural selection, host population structure, or other demographic and epidemiological heterogeneities. We have developed a fast non-parametric statistical test for detection of cryptic population structure in time-scaled phylogenetic trees. The test is based on contrasting estimated phylogenies with the theoretically expected phylodynamic ordering of common ancestors in two clades within a coalescent framework. These statistical tests have also motivated the development of algorithms that can be used to quickly screen a phylogenetic tree for clades that are likely to share a distinct demographic or epidemiological history. Epidemiological applications include identification of outbreaks in vulnerable host populations or rapid expansion of genotypes with a fitness advantage. To demonstrate the utility of these methods for outbreak detection, we applied the new methods to large phylogenies reconstructed from thousands of HIV-1 partial pol sequences. This revealed the presence of clades that had grown rapidly in the recent past and were significantly concentrated in young men, suggesting recent and rapid transmission in that group. Furthermore, to demonstrate the utility of these methods for the study of antimicrobial resistance, we applied the new methods to a large phylogeny reconstructed from whole-genome Neisseria gonorrhoeae sequences. We find that population structure detected using these methods closely overlaps with the appearance and expansion of mutations conferring antimicrobial resistance.
biorxiv evolutionary-biology 100-200-users 2019
Striatal activity reflects cortical activity patterns, bioRxiv, 2019-07-16
The dorsal striatum is organized into domains that drive characteristic behaviors [1–7] and receive inputs from different parts of the cortex [8,9], which modulate similar behaviors [10–12]. Striatal responses to cortical inputs, however, can be affected by changes in connection strength [13–15], local striatal circuitry [16,17], and thalamic inputs [18,19]. Therefore, it is unclear whether the pattern of activity across striatal domains mirrors that across the cortex [20–23] or differs from it [24–28]. Here we use simultaneous large-scale recordings in the cortex and the striatum to show that striatal activity can be accurately predicted by spatiotemporal activity patterns in the cortex. The relationship between activity in the cortex and the striatum was spatially consistent with corticostriatal anatomy and temporally consistent with a feedforward drive. Each striatal domain exhibited specific sensorimotor responses that predictably followed activity in the associated cortical regions, and the corticostriatal relationship remained unchanged across passive states and performance of a task probing visually guided behavior. However, the task’s visual stimuli and corresponding behavioral responses evoked relatively more activity in the striatum than in associated cortical regions. This increased striatal activity involved an additive offset in firing rate, which was independent of task engagement but present only in animals that had learned the task. Thus, striatal activity largely reflects patterns of cortical activity, deviating from them in a simple additive fashion for learned stimuli or actions.
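A cortex-leads-striatum (feedforward) relationship shows up as a peak in lagged correlation at a positive lag. The sketch below is a deliberately simplified, single-signal illustration that assumes two equal-length traces sampled on the same clock; it is not the spatiotemporal prediction model used in the study, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(cortex, striatum, lag):
    """Correlate cortex[t - lag] with striatum[t].

    A positive lag pairs earlier cortical samples with later striatal samples,
    so a peak at lag > 0 means cortical activity leads striatal activity.
    """
    n = len(cortex)
    if lag >= 0:
        return pearson(cortex[: n - lag], striatum[lag:])
    return pearson(cortex[-lag:], striatum[: n + lag])


def best_lag(cortex, striatum, max_lag):
    """Lag in [-max_lag, max_lag] (in samples) with the highest correlation."""
    def score(lag):
        r = lagged_correlation(cortex, striatum, lag)
        return -math.inf if math.isnan(r) else r  # ignore undefined correlations
    return max(range(-max_lag, max_lag + 1), key=score)


if __name__ == "__main__":
    cortex = [0, 1, 0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2]
    striatum = [0, 0] + cortex[:-2]  # toy striatal trace: cortex delayed by 2 samples
    print(best_lag(cortex, striatum, max_lag=5))
```

A lag scan like this captures only timing between one pair of signals; predicting each striatal domain from the full spatial pattern of cortical activity would require a multivariate regression instead.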
biorxiv neuroscience 100-200-users 2019
Supercentenarians and the oldest-old are concentrated into regions with no birth certificates and short lifespans, bioRxiv, 2019-07-16
Abstract
The observation of individuals attaining remarkable ages, and their concentration into geographic sub-regions or ‘blue zones’, has generated considerable scientific interest. Proposed drivers of remarkable longevity include high vegetable intake, strong social connections, and genetic markers. Here, we reveal new predictors of remarkable longevity and ‘supercentenarian’ status. In the United States, supercentenarian status is predicted by the absence of vital registration: the state-specific introduction of birth certificates is associated with a 69–82% fall in the number of supercentenarian records. In Italy, which has more uniform vital registration, remarkable longevity is instead predicted by low per capita incomes and a short life expectancy. Finally, the designated ‘blue zones’ of Sardinia, Okinawa, and Ikaria corresponded to regions with low incomes, low literacy, high crime rates and short life expectancy relative to their national averages. As such, relative poverty and short lifespan constitute unexpected predictors of centenarian and supercentenarian status and support a primary role for fraud and error in generating remarkable human age records.
biorxiv developmental-biology 500+-users 2019
Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics, bioRxiv, 2019-07-15
Abstract
Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming as applied to phylogenetic models. We show that many commonly used phylogenetic models, including the general time reversible (GTR) substitution model, rate heterogeneity among sites, and a range of coalescent models, can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes-Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster, and more accurate, than a general-purpose probabilistic programming implementation.
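For reference, the Jukes-Cantor (JC69) model mentioned above has closed-form transition probabilities for a branch of length t, measured in expected substitutions per site: a base stays the same with probability 1/4 + 3/4·e^(−4t/3) and changes to each specific other base with probability 1/4 − 1/4·e^(−4t/3). The sketch below illustrates only the model and its maximum-likelihood pairwise distance; it does not implement the variational inference evaluated in the study, and the function names are illustrative.

```python
import math


def jc69_prob(same: bool, t: float) -> float:
    """JC69 probability of ending in a given base after branch length t,
    given the starting base. same=True: no net change; same=False: change
    to one specific different base. The four outcomes sum to 1."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def pairwise_log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """Log-likelihood of two aligned sequences separated by total branch
    length t, with equal base frequencies (1/4). Requires t > 0 if the
    sequences differ anywhere (a change has probability 0 at t = 0)."""
    return sum(math.log(0.25 * jc69_prob(a == b, t)) for a, b in zip(seq1, seq2))


def jc69_distance(seq1: str, seq2: str) -> float:
    """Maximum-likelihood JC69 distance from the proportion p of differing
    sites: d = -3/4 * ln(1 - 4p/3); infinite when p >= 3/4 (saturation)."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    s1, s2 = "ACGTACGTAC", "ACGTTCGTAA"  # 2 of 10 sites differ, so p = 0.2
    d = jc69_distance(s1, s2)
    print(round(d, 4), pairwise_log_likelihood(s1, s2, d))
```

Because the pairwise likelihood peaks exactly at the closed-form distance, this model is a convenient test case for checking any approximate inference scheme against a known answer.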
biorxiv bioinformatics 0-100-users 2019