AnnoTree: visualization and exploration of a functionally annotated microbial tree of life, bioRxiv, 2018-11-06
Bacterial genomics has revolutionized our understanding of the microbial tree of life; however, mapping and visualizing the distribution of functional traits across bacteria remains a challenge. Here, we introduce AnnoTree - an interactive, functionally annotated bacterial tree of life that integrates taxonomic, phylogenetic, and functional annotation data from nearly 24,000 bacterial genomes. AnnoTree enables visualization of millions of precomputed genome annotations across the bacterial phylogeny, thereby allowing users to explore gene distributions as well as patterns of gene gain and loss across bacteria. Using AnnoTree, we examined the phylogenomic distributions of 28,311 gene/protein families, and measured their phylogenetic conservation, patchiness, and lineage-specificity. Our analyses revealed widespread phylogenetic patchiness among bacterial gene families, reflecting the dynamic evolution of prokaryotic genomes. Genes involved in phage infection/defense, mobile elements, and antibiotic resistance dominated the list of most patchy traits, as did numerous intriguing metabolic enzymes that appear to have undergone frequent horizontal transfer. We anticipate that AnnoTree will be a valuable resource for exploring gene histories across bacteria, and will act as a catalyst for biological and evolutionary hypothesis generation.
biorxiv bioinformatics 100-200-users 2018
Comparative analysis of sequencing technologies platforms for single-cell transcriptomics, bioRxiv, 2018-11-06
All single-cell RNA-seq protocols and technologies require library preparation prior to sequencing on a platform such as Illumina. Here, we present the first report to utilize the BGISEQ-500 platform for scRNA-seq, and compare its sensitivity and accuracy to Illumina sequencing. We generate a scRNA-seq resource of 468 unique single cells and 1,297 matched single cDNA samples, performing SMARTer and Smart-seq2 protocols on mESCs and K562 cells with RNA spike-ins. We sequence these libraries on both BGISEQ-500 and Illumina HiSeq platforms using single- and paired-end reads. The two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression, and low technical variability. Our study provides a standardised scRNA-seq resource to benchmark new scRNA-seq library preparation protocols and sequencing platforms.
biorxiv genomics 0-100-users 2018
Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity, bioRxiv, 2018-11-06
Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.
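To make the idea of "extracting a subgraph surrounding an inferred genome" concrete, here is a minimal sketch using a plain breadth-first search over an adjacency list. This is an illustration only, not the spacegraphcats algorithm, whose details are not given in the abstract; the function name `neighborhood`, the `radius` parameter, and the toy graph are all assumptions introduced for the example.

```python
from collections import deque


def neighborhood(adjacency, seeds, radius):
    """Return every node within `radius` edges of any seed node.

    `adjacency` maps a node to an iterable of its neighbours
    (hypothetical representation of an assembly graph).
    """
    # Distance from the nearest seed; seeds absent from the graph are ignored.
    distance = {s: 0 for s in seeds if s in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand past the requested radius
        for nxt in adjacency.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return set(distance)


# Toy linear graph a - b - c - d - e (made-up data).
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

# Nodes "binned" into a genome act as seeds; their neighbourhood may
# contain content the binning step missed.
recovered = neighborhood(toy_graph, seeds=["a"], radius=2)
```

On sparse graphs a search like this touches only the nodes near the seeds, which is the intuition behind querying a large assembly graph without traversing all of it.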
biorxiv bioinformatics 100-200-users 2018
Investigating causal relationships between sleep traits and risk of breast cancer: a Mendelian randomization study, bioRxiv, 2018-11-06
Objective: To examine whether sleep traits have a causal effect on risk of breast cancer. Design: Multivariable regression, one- and two-sample Mendelian randomization (MR). Setting: The UK Biobank prospective cohort study and the Breast Cancer Association Consortium (BCAC) case-control genome-wide association study. Participants: 156,848 women in the multivariable regression and one-sample Mendelian randomization analysis in UK Biobank (7,784 with a breast cancer diagnosis) and 122,977 breast cancer cases and 105,974 controls from BCAC in the two-sample Mendelian randomization analysis. Exposures: Self-reported chronotype (morning/evening preference), insomnia symptoms and sleep duration in multivariable regression, and genetic variants robustly associated with these sleep traits. Main outcome measures: Breast cancer (prevalent and incident cases in UK Biobank, prevalent cases only in BCAC). Results: In multivariable regression analysis using data on breast cancer incidence in UK Biobank, morning preference was inversely associated with breast cancer (HR 0.95, 95% CI 0.93, 0.98 per category increase) while there was little evidence for an association with sleep duration and insomnia symptoms. Using 341 single nucleotide polymorphisms (SNPs) associated with chronotype, 91 SNPs associated with sleep duration and 57 SNPs associated with insomnia symptoms, one-sample MR analysis in UK Biobank provided some supportive evidence for a protective effect of morning preference on breast cancer risk (HR 0.85, 95% CI 0.70, 1.03 per category increase) but imprecise estimates for sleep duration and insomnia symptoms. Two-sample MR using data from BCAC supported findings for a protective effect of morning preference (OR 0.88, 95% CI 0.82, 0.93 per category increase) and adverse effect of increased sleep duration (OR 1.19, 95% CI 1.02, 1.39 per hour increase) on breast cancer (both estrogen receptor positive and negative), while there was inconsistent evidence for insomnia symptoms. Results were largely robust to sensitivity analyses accounting for horizontal pleiotropy. Conclusions: We found consistent evidence for a protective effect of morning preference and suggestive evidence for an adverse effect of sleep duration on breast cancer risk.
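For readers unfamiliar with two-sample Mendelian randomization, the sketch below shows one commonly used estimator, the inverse-variance weighted (IVW) estimate, which combines per-SNP associations with the exposure and with the outcome. The abstract does not state which estimator the authors used, so this is an assumption for illustration; the function names and all numbers are made up and are not taken from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect (log odds scale).

    Each list holds one value per SNP: the SNP-exposure effect, the
    SNP-outcome effect, and the standard error of the SNP-outcome effect.
    """
    numerator = sum(
        bx * by / se**2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)
    )
    beta = numerator / denominator
    std_err = math.sqrt(1.0 / denominator)
    return beta, std_err


def to_odds_ratio(beta, std_err, z=1.96):
    """Convert a log-odds estimate into an odds ratio with a 95% CI."""
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


# Hypothetical summary statistics for two SNPs (not from the paper).
beta, std_err = ivw_estimate(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[-0.01, -0.02],
    se_outcome=[0.01, 0.01],
)
odds_ratio, ci_low, ci_high = to_odds_ratio(beta, std_err)
```

An odds ratio below 1 with a confidence interval excluding 1 would be read as evidence of a protective effect per unit increase in the exposure, which is how the OR figures in the abstract are interpreted.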
biorxiv epidemiology 0-100-users 2018
Structure of a bacterial ATP synthase, bioRxiv, 2018-11-06
ATP synthases produce ATP from ADP and inorganic phosphate with energy from a transmembrane proton motive force. Bacterial ATP synthases have been studied extensively because they are the simplest form of the enzyme and because of the relative ease of genetic manipulation of these complexes. We expressed the Bacillus PS3 ATP synthase in Escherichia coli, purified it, and imaged it by cryo-EM, allowing us to build atomic models of the complex in three rotational states. The position of subunit ε shows how it is able to inhibit ATP hydrolysis while allowing ATP synthesis. The architecture of the membrane region shows how the simple bacterial ATP synthase is able to perform the same core functions as the equivalent, but more complicated, mitochondrial complex. The structures reveal the path of transmembrane proton translocation and provide a model for understanding decades of biochemical analysis interrogating the roles of specific residues in the enzyme.
biorxiv biochemistry 500+-users 2018
Fast, sensitive, and accurate integration of single cell data with Harmony, bioRxiv, 2018-11-05
The rapidly emerging diversity of single cell RNAseq datasets allows us to characterize the transcriptional behavior of cell types across a wide variety of biological and clinical conditions. With this comprehensive breadth comes a major analytical challenge. The same cell type across tissues, from different donors, or in different disease states, may appear to express different genes. A joint analysis of multiple datasets requires the integration of cells across diverse conditions. This is particularly challenging when datasets are assayed with different technologies in which real biological differences are interspersed with technical differences. We present Harmony, an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Unlike available single-cell integration methods, Harmony can simultaneously account for multiple experimental and biological factors. We develop objective metrics to evaluate the quality of data integration. In four separate analyses, we demonstrate the superior performance of Harmony to four single-cell-specific integration algorithms. Moreover, we show that Harmony requires dramatically fewer computational resources. It is the only available algorithm that makes the integration of ∼10^6 cells feasible on a personal computer. We demonstrate that Harmony identifies both broad populations and fine-grained subpopulations of PBMCs from datasets with large experimental differences. In a meta-analysis of 14,746 cells from 5 studies of human pancreatic islet cells, Harmony accounts for variation among technologies and donors to successfully align several rare subpopulations. In the resulting integrated embedding, we identify a previously unidentified population of potentially dysfunctional alpha islet cells, enriched for genes active in the Endoplasmic Reticulum (ER) stress response. 
The abundance of these alpha cells correlates across donors with the proportion of dysfunctional beta cells also enriched in ER stress response genes. Harmony is a fast and flexible general purpose integration algorithm that enables the identification of shared fine-grained subpopulations across a variety of experimental and biological conditions.
biorxiv bioinformatics 100-200-users 2018