Interplay between the human gut microbiome and host metabolism, bioRxiv, 2019-02-28
The human gut is inhabited by a complex and metabolically active microbial ecosystem regulating host health. While many studies have focused on the effect of individual microbial taxa, the metabolic potential of the entire gut microbial ecosystem has been largely under-explored. We characterised the gut microbiome of 1,004 twins via whole shotgun metagenomic sequencing (average 39M reads per sample). We observed greater similarity, across unrelated individuals, for functional metabolic pathways (82%) than for taxonomic composition (43%). We conducted a microbiota-wide association study linking both taxonomic information and microbial metabolic pathways with 673 blood and 713 faecal metabolites (Metabolon, Inc.). Metabolic pathways were associated with 34% of blood and 95% of faecal metabolites, with over 18,000 significant associations, while species-level results identified fewer than 3,000 associations, suggesting that coordinated action of multiple taxa is required to affect the metabolome. Finally, we estimated that the microbiome mediated a crosstalk between 71% of faecal and 15% of blood metabolites, highlighting six key species (unclassified Subdoligranulum spp., Faecalibacterium prausnitzii, Roseburia inulinivorans, Methanobrevibacter smithii, Eubacterium rectale, and Akkermansia muciniphila). Because of the large inter-person variability in microbiome composition, our results underline the importance of studying gut microbial metabolic pathways rather than focusing purely on taxonomy to find therapeutic and diagnostic targets.
biorxiv microbiology 0-100-users 2019
Interdependent Phenotypic and Biogeographic Evolution Driven by Biotic Interactions, bioRxiv, 2019-02-26
Biotic interactions are hypothesized to be one of the main processes shaping trait and biogeographic evolution during lineage diversification. Theoretical and empirical evidence suggests that species with similar ecological requirements either spatially exclude each other, by preventing the colonization of competitors or by driving coexisting populations to extinction, or show niche divergence when in sympatry. However, our understanding of the extent and generality of the effect of interspecific competition on trait and biogeographic evolution has been limited by a dearth of appropriate process-generating models to directly test the effect of biotic interactions. Here, we formulate a phylogenetic parametric model that allows interdependence between trait and biogeographic evolution, thus enabling a direct test of central hypotheses on how biotic interactions shape these evolutionary processes. We adopt a Bayesian data augmentation approach to estimate the joint posterior distribution of trait histories, range histories, and co-evolutionary process parameters under this analytically intractable model. Through simulations, we show that our model is capable of distinguishing alternative scenarios of biotic interactions. We apply our model to the radiation of Darwin's finches, a classic example of adaptive divergence, and find support for in situ trait divergence in beak size, convergence in traits such as beak shape and tarsus length, and strong competitive exclusion throughout their evolutionary history. Our modeling framework opens new possibilities for testing more complex hypotheses about the processes underlying lineage diversification. More generally, it provides a robust probabilistic methodology to model correlated evolution of continuous and discrete characters.
biorxiv evolutionary-biology 0-100-users 2019
Samovar: Single-sample mosaic SNV calling with linked reads, bioRxiv, 2019-02-26
We present Samovar, a mosaic single-nucleotide variant (SNV) caller for linked-read whole-genome shotgun sequencing data. Samovar scores candidate sites using a random forest model, trained on the input dataset, that considers read quality, phasing, and linked-read characteristics. We show that Samovar calls mosaic SNVs within a single sample with accuracy comparable to what previously required trios or matched tumor/normal pairs, and that it outperforms single-sample mosaic variant callers at MAF 5%-50% with at least 30x coverage. Furthermore, we use Samovar to find somatic variants in whole genome sequencing of both tumor and normal from 13 pediatric cancer cases that can be corroborated with high recall with whole exome sequencing. Samovar is available open-source at https://github.com/cdarby/samovar under the MIT license.
biorxiv genomics 0-100-users 2019
Task-evoked activity quenches neural correlations and variability in large-scale brain systems, bioRxiv, 2019-02-26
Many studies of large-scale neural systems have emphasized the importance of communication through increased inter-region correlations (functional connectivity) during task states relative to resting state. In contrast, local circuit studies have demonstrated that task states reduce correlations among local neural populations, likely enhancing their information coding. Here we sought to adjudicate between these conflicting perspectives, assessing whether large-scale system correlations tend to increase or decrease during task states. To establish a mechanistic framework for interpreting changes in neural correlations, we conceptualized neural populations as having a sigmoidal neural transfer function. In a computational model we found that this straightforward assumption predicts reductions in neural populations' dynamic output range as task-evoked activity levels increase, reducing responsiveness to inputs from other regions (i.e., reduced correlations). We demonstrated this empirically in large-scale neural populations across two highly distinct data sets: human functional magnetic resonance imaging data and non-human primate spiking data. We found that task states increased global neural activity, while globally quenching neural variability and correlations. Further, this global reduction of neural correlations led to an overall increase in dimensionality (reflecting less information redundancy) during task states, providing an information-theoretic explanation for task-induced correlation reductions. Together, our results provide an integrative mechanistic account that encompasses measures of large-scale neural activity, variability, and correlations during resting and task states.
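The sigmoidal transfer function argument in the abstract above can be illustrated with a small numerical sketch. This is not the authors' computational model: the logistic function, the baseline values (0.0 for "rest", 3.0 for "task") and the input amplitude are all assumptions chosen only to show that the same input fluctuation produces a narrower output range when baseline activity sits on the saturating part of the curve.

```python
import math


def sigmoid(x):
    """Logistic transfer function (assumed form; the abstract only says 'sigmoidal')."""
    return 1.0 / (1.0 + math.exp(-x))


def output_range(baseline, input_amplitude=0.5):
    """Spread of outputs when inputs fluctuate by +/- input_amplitude around baseline."""
    return sigmoid(baseline + input_amplitude) - sigmoid(baseline - input_amplitude)


# Hypothetical operating points: low baseline drive at rest, higher drive during a task.
rest_range = output_range(baseline=0.0)
task_range = output_range(baseline=3.0)

print(f"output range at rest: {rest_range:.3f}")
print(f"output range in task: {task_range:.3f}")
```

Under these assumed values the task-state range is several times smaller than the rest-state range, which is the sense in which higher evoked activity would reduce responsiveness to inputs from other regions.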
biorxiv neuroscience 0-100-users 2019
Cooler: scalable storage for Hi-C data and other genomically-labeled arrays, bioRxiv, 2019-02-23
Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis. We developed a file format called cooler, based on a sparse data model, that can support genomically-labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns, and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium. Cooler is cross-platform, BSD-licensed, and can be installed from the Python Package Index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler.
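The storage argument in the abstract above (dense maps grow quickly, sparse ones do not) can be sketched in plain Python. This is only an illustration of a sparse, symmetric matrix representation, not the cooler file format or its Python API: the function names, the tuple layout and the toy contact matrix are all hypothetical.

```python
def dense_to_sparse_upper(matrix):
    """Keep only nonzero entries on or above the diagonal as (row_bin, col_bin, count)."""
    pixels = []
    for i, row in enumerate(matrix):
        for j in range(i, len(row)):
            if row[j] != 0:
                pixels.append((i, j, row[j]))
    return pixels


def build_index(pixels):
    """Dictionary index so a single cell can be read without scanning every pixel."""
    return {(i, j): count for i, j, count in pixels}


def lookup(index, i, j):
    """Read a cell of the symmetric matrix; cells that were not stored are zero."""
    if i > j:
        i, j = j, i
    return index.get((i, j), 0)


# Toy symmetric contact matrix over four genomic bins (made-up counts).
dense = [
    [5, 2, 0, 0],
    [2, 7, 0, 1],
    [0, 0, 3, 0],
    [0, 1, 0, 4],
]

pixels = dense_to_sparse_upper(dense)
index = build_index(pixels)

print(len(pixels), "stored pixels for", len(dense) ** 2, "dense cells")
print(lookup(index, 3, 1))
```

For a symmetric map, storing only the upper triangle and skipping zeros means the cost scales with the number of observed contacts rather than with the square of the number of bins.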
biorxiv bioinformatics 0-100-users 2019
Genomic analysis reveals a functional role for myocardial trabeculae in adults, bioRxiv, 2019-02-23
Since the endocardial surfaces of the adult heart were first described by Leonardo da Vinci in 1513, it has remained an enigma why they retain a complex network of muscular trabeculae, with their persistence thought to be a vestige of embryonic development. For causative physiological inference we harness population genomics, image-based intermediate phenotyping and in silico modelling to determine the effect of this complex cardiovascular trait on function. Using deep learning-based image analysis we identified genetic associations with trabecular complexity in 18,097 UK Biobank participants which were replicated in an independently measured cohort of 1,129 healthy adults. Genes in these associated regions are enriched for expression in the fetal heart or vasculature and implicate loci associated with haemodynamic phenotypes and developmental pathways. A causal relationship between increasing trabecular complexity and both ventricular performance and electrical activity is supported by complementary biomechanical simulations and Mendelian randomisation studies. These findings show that myocardial trabeculae are a previously unrecognised determinant of cardiovascular physiology in adult humans.
biorxiv genomics 0-100-users 2019