Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program, bioRxiv, 2019-03-07

Summary paragraph: The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency <1% and 46% are singletons. These rare variants provide insights into mutational processes and recent human evolutionary history. The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and extends the reach of nearly all genome-wide association studies to include variants down to ~0.01% in frequency.

biorxiv genomics 100-200-users 2019

Clustering co-abundant genes identifies components of the gut microbiome that are reproducibly associated with colorectal cancer and inflammatory bowel disease, bioRxiv, 2019-03-06

Abstract

Background: Whole-genome “shotgun” (WGS) metagenomic sequencing is an increasingly widely used tool for analyzing the metagenomic content of microbiome samples. While WGS data contains gene-level information, it can be challenging to analyze the millions of microbial genes which are typically found in microbiome experiments. To mitigate the ultrahigh dimensionality challenge of gene-level metagenomics, it has been proposed to cluster genes by co-abundance to form Co-Abundant Gene groups (CAGs). However, exhaustive co-abundance clustering of millions of microbial genes across thousands of biological samples has previously been intractable purely due to the computational challenge of performing trillions of pairwise comparisons.

Results: Here we present a novel computational approach to the analysis of WGS datasets in which microbial gene groups are the fundamental unit of analysis. We use the Approximate Nearest Neighbor heuristic for near-exhaustive average linkage clustering to group millions of genes by co-abundance. This results in thousands of high-quality CAGs representing complete and partial microbial genomes. We applied this method to publicly available WGS microbiome surveys and found that the resulting microbial CAGs associated with inflammatory bowel disease (IBD) and colorectal cancer (CRC) were highly reproducible and could be validated independently using multiple independent cohorts.

Conclusions: This powerful approach to gene-level metagenomics provides a powerful path forward for identifying the biological links between the microbiome and human health. By proposing a new computational approach for handling high dimensional metagenomics data, we identified specific microbial gene groups that are associated with disease that can be used to identify strains of interest for further preclinical and mechanistic experimentation.
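
To make the idea of co-abundance clustering concrete, here is a minimal sketch of exhaustive average linkage clustering on a toy gene-by-sample abundance table. It is not the authors' implementation: the abstract describes an Approximate Nearest Neighbor heuristic precisely because this exhaustive pairwise form does not scale to millions of genes. The cosine distance metric, the `max_distance` threshold of 0.2, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two abundance vectors (0 = same profile shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_cags(abundance, max_distance=0.2):
    """Exhaustive average-linkage clustering of genes by co-abundance.

    abundance: dict mapping gene name -> list of abundances, one per sample.
    max_distance: hypothetical cutoff; merging stops once the closest pair
    of clusters is farther apart than this.
    Returns a sorted list of sorted gene-name lists (toy "CAGs").
    """
    clusters = [[gene] for gene in sorted(abundance)]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between all members.
            dists = [cosine_distance(abundance[g], abundance[h])
                     for g in clusters[i] for h in clusters[j]]
            avg = sum(dists) / len(dists)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        if best[0] > max_distance:
            break
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Toy data (made up): rows are genes, columns are four samples.
toy = {
    "geneA": [10, 0, 5, 0],
    "geneB": [20, 0, 10, 0],
    "geneC": [0, 8, 0, 4],
    "geneD": [0, 16, 1, 8],
    "geneE": [3, 3, 3, 3],
}
cags = cluster_cags(toy)
print(cags)
```

In this toy table, geneA and geneB rise and fall together across samples, as do geneC and geneD, so each pair forms one group while the evenly distributed geneE stays alone. Every merge step here compares all pairs of clusters, which is the cost the abstract says becomes intractable at scale.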

biorxiv bioinformatics 100-200-users 2019

The Trichoplax microbiome: the simplest animal lives in an intimate symbiosis with two intracellular bacteria, bioRxiv, 2019-03-06

Summary paragraph: Placozoa is an enigmatic phylum of simple, microscopic, marine metazoans. Although intracellular bacteria have been found in all members of this phylum, almost nothing is known about their identity, location and interactions with their host. We used metagenomic and metatranscriptomic sequencing of single host individuals, plus metaproteomic and imaging analyses, to show that the placozoan Trichoplax H2 lives in symbiosis with two intracellular bacteria. One symbiont forms a new genus in the Midichloriaceae (Rickettsiales) and has a genomic repertoire similar to that of rickettsial parasites, but does not appear to express key genes for energy parasitism. Correlative microscopy and 3-D electron tomography revealed that this symbiont resides in an unusual location, the rough endoplasmic reticulum of its host’s internal fiber cells. The second symbiont belongs to the Margulisbacteria, a phylum without cultured representatives and not known to form intracellular associations. This symbiont lives in the ventral epithelial cells of Trichoplax, likely metabolizes algal lipids digested by its host, and has the capacity to supplement the placozoan’s nutrition. Our study shows that even the simplest animals known have evolved highly specific and intimate associations with symbiotic, intracellular bacteria, and highlights that symbioses with microorganisms are a basal trait of animal life.

biorxiv microbiology 100-200-users 2019
