A unified sequence catalogue of over 280,000 genomes obtained from the human gut microbiome, bioRxiv, 2019-09-20
Abstract: Comprehensive reference data is essential for accurate taxonomic and functional characterization of the human gut microbiome. Here we present the Unified Human Gastrointestinal Genome (UHGG) collection, a resource combining 286,997 genomes representing 4,644 prokaryotic species from the human gut. These genomes contain over 625 million protein sequences used to generate the Unified Human Gastrointestinal Protein (UHGP) catalogue, a collection that more than doubles the number of gut protein clusters over the Integrated Gene Catalogue. We find that a large portion of the human gut microbiome remains to be fully explored, with over 70% of the UHGG species lacking cultured representatives, and 40% of the UHGP missing meaningful functional annotations. Intra-species genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which were specific to individual human populations. These freely available genomic resources should greatly facilitate investigations into the human gut microbiome.
biorxiv microbiology 200-500-users 2019
Imaging plant germline differentiation within Arabidopsis flower by light sheet microscopy, bioRxiv, 2019-09-20
Abstract: In higher plants, germline differentiation occurs during a relatively short period within developing flowers. Understanding of the mechanisms that govern germline differentiation lags behind other plant developmental processes. This is largely because the germline is restricted to relatively few cells buried deep within floral tissues, which makes them difficult to study. To overcome this limitation, we have developed a methodology for live imaging of the germ cell lineage within floral organs of Arabidopsis using light sheet fluorescence microscopy. We have established reporter lines, cultivation conditions, and imaging protocols for high-resolution microscopy of developing flowers continuously for up to several days. We used multiview imaging to reconstruct a three-dimensional model of a flower at subcellular resolution. We demonstrate the power of this approach by capturing male and female meiosis, asymmetric pollen division, movement of meiotic chromosomes, and unusual restitution mitosis in tapetum cells. This method will enable new avenues of research into plant sexual reproduction.
biorxiv plant-biology 200-500-users 2019
Selection of representative genomes for 24,706 bacterial and archaeal species clusters provide a complete genome-based taxonomy, bioRxiv, 2019-09-19
Abstract: We recently introduced the Genome Taxonomy Database (GTDB), a phylogenetically consistent, genome-based taxonomy providing rank normalized classifications for nearly 150,000 genomes from domain to genus. However, nearly 40% of the genomes used to infer the GTDB reference tree lack a species name, reflecting the large number of genomes in public repositories without complete taxonomic assignments. Here we address this limitation by proposing 24,706 species clusters which encompass all publicly available bacterial and archaeal genomes when using commonly accepted average nucleotide identity (ANI) criteria for circumscribing species. In contrast to previous ANI studies, we selected a single representative genome to serve as the nomenclatural type for circumscribing each species, with type strains used where available. We complemented the 8,792 species clusters with validly or effectively published names with 15,914 de novo species clusters in order to assign placeholder names to the growing number of genomes from uncultivated species. This provides the first complete domain to species taxonomic framework, which will improve communication of scientific results.
biorxiv microbiology 200-500-users 2019
Ancient DNA reconstructs the genetic legacies of pre-contact Puerto Rico communities, bioRxiv, 2019-09-12
Abstract: Indigenous peoples have occupied the island of Puerto Rico since at least 3000 B.C. Due to the demographic shifts that occurred after European contact, the origin(s) of these ancient populations, and their genetic relationship to present-day islanders, are unclear. We use ancient DNA to characterize the population history and genetic legacies of pre-contact Indigenous communities from Puerto Rico. Bone, tooth and dental calculus samples were collected from 124 individuals from three pre-contact archaeological sites: Tibes, Punta Candelero and Paso del Indio. Despite poor DNA preservation, we used target enrichment and high-throughput sequencing to obtain complete mitochondrial genomes (mtDNA) from 45 individuals and autosomal genotypes from two individuals. We found a high proportion of Native American mtDNA haplogroups A2 and C1 in the pre-contact Puerto Rico sample (40% and 44%, respectively). This distribution, as well as the haplotypes represented, supports a primarily Amazonian South American origin for these populations, and mirrors the Native American mtDNA diversity patterns found in present-day islanders. Three mtDNA haplotypes from pre-contact Puerto Rico persist among Puerto Ricans and other Caribbean islanders, indicating that present-day populations are reservoirs of pre-contact mtDNA diversity. Lastly, we find similarity in autosomal ancestry patterns between pre-contact individuals from Puerto Rico and the Bahamas, suggesting a shared component of Indigenous Caribbean ancestry with close affinity to South American populations. Our findings contribute to a more complete reconstruction of pre-contact Caribbean population history and explore the role of Indigenous peoples in shaping the biocultural diversity of present-day Puerto Ricans and other Caribbean islanders.
biorxiv genomics 200-500-users 2019
How neurons move during action potentials, bioRxiv, 2019-09-11
Abstract: Neurons undergo nanometer-scale deformations during action potentials, and the underlying mechanism has been actively debated for decades. Previous observations were limited to a single spot or the cell boundary, while movement across the entire neuron during the action potential remained unclear. We report full-field imaging of cellular deformations accompanying the action potential in mammalian neuron somas (−1.8 nm to 1.3 nm) and neurites (−0.7 nm to 0.9 nm), using fast quantitative phase imaging with a temporal resolution of 0.1 ms and an optical pathlength sensitivity of <4 pm per pixel. Spike-triggered average, synchronized to electrical recording, demonstrates that the time course of the optical phase changes matches the dynamics of the electrical signal, with the optical signal revealing the intracellular potential rather than its time derivative detected via extracellular electrodes. Using 3D cellular morphology extracted via confocal microscopy, we demonstrate that the voltage-dependent changes in the membrane tension induced by ionic repulsion can explain the magnitude, time course and spatial features of the phase imaging. Our full-field observations of the spike-induced deformations in mammalian neurons open the door to non-invasive label-free imaging of neural signaling.
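The spike-triggered averaging mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the window parameters `pre` and `post`, and the toy trace are all hypothetical, and the only idea shown is aligning fixed-length windows of a signal to spike times and averaging them so that uncorrelated noise cancels.

```python
def spike_triggered_average(signal, spike_indices, pre, post):
    """Average windows of `signal` aligned to each spike index.

    Each window covers samples [i - pre, i + post) around spike index i.
    Spikes whose window would run past either end of the signal are skipped.
    (Hypothetical helper for illustration only.)
    """
    window = pre + post
    total = [0.0] * window
    count = 0
    for i in spike_indices:
        start, stop = i - pre, i + post
        if start < 0 or stop > len(signal):
            continue  # incomplete window, skip this spike
        for k in range(window):
            total[k] += signal[start + k]
        count += 1
    if count == 0:
        raise ValueError("no spike has a complete window")
    return [t / count for t in total]


# Toy trace with two events of different amplitude at samples 3 and 8.
trace = [0, 0, 1, 3, 1, 0, 0, 1, 5, 1, 0, 0]
sta = spike_triggered_average(trace, spike_indices=[3, 8], pre=1, post=2)
print(sta)
```

In a real recording the spike indices would come from the synchronized electrical trace and the signal would be the optical phase at one pixel or region.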
biorxiv neuroscience 200-500-users 2019
Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq, bioRxiv, 2019-09-10
The allocation of a sequencing budget when designing single-cell RNA-seq experiments requires consideration of the tradeoff between the number of cells sequenced and the read depth per cell. One approach to the problem is to perform a power analysis for a univariate objective such as differential expression. However, many of the goals of single-cell analysis, such as clustering, require consideration of the multivariate structure of gene expression. We introduce an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression that is based on error analysis in the framework of a variational autoencoder. We find that at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers. Above about 15,000 reads per cell the benefit of increased sequencing depth is minor. Code for the workflow reproducing the results of the paper is available at https://github.com/pachterlab/SBP_2019.
biorxiv genomics 200-500-users 2019
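The budget tradeoff described in the single-cell RNA-seq entry above comes down to simple arithmetic: for a fixed number of reads, every increase in depth per cell reduces the number of cells that can be sequenced. The sketch below only enumerates that tradeoff; it does not implement the paper's variational autoencoder error analysis. The function name, the 300 million read budget, and the candidate depths are hypothetical, and overheads such as unmapped reads are ignored.

```python
def allocations(total_reads, depths):
    """For a fixed read budget, return (reads_per_cell, n_cells) pairs.

    Hypothetical helper: assumes every read is assigned to a cell and
    that all cells receive exactly the same depth.
    """
    return [(depth, total_reads // depth) for depth in depths]


# Assumed budget and candidate depths, chosen only for illustration.
budget = 300_000_000
for depth, n_cells in allocations(budget, [1_000, 5_000, 15_000, 50_000]):
    print(f"{depth:>6} reads/cell -> {n_cells:>7} cells")
```

Under these assumptions, moving from 15,000 to 50,000 reads per cell cuts the cell count from 20,000 to 6,000, which is the kind of cost the abstract weighs against the minor benefit of depth beyond about 15,000 reads per cell.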