AnnoTree: visualization and exploration of a functionally annotated microbial tree of life, bioRxiv, 2018-11-06

Abstract
Bacterial genomics has revolutionized our understanding of the microbial tree of life; however, mapping and visualizing the distribution of functional traits across bacteria remains a challenge. Here, we introduce AnnoTree, an interactive, functionally annotated bacterial tree of life that integrates taxonomic, phylogenetic, and functional annotation data from nearly 24,000 bacterial genomes. AnnoTree enables visualization of millions of precomputed genome annotations across the bacterial phylogeny, allowing users to explore gene distributions as well as patterns of gene gain and loss across bacteria. Using AnnoTree, we examined the phylogenomic distributions of 28,311 gene/protein families and measured their phylogenetic conservation, patchiness, and lineage specificity. Our analyses revealed widespread phylogenetic patchiness among bacterial gene families, reflecting the dynamic evolution of prokaryotic genomes. Genes involved in phage infection/defense, mobile elements, and antibiotic resistance dominated the list of most patchy traits, along with numerous intriguing metabolic enzymes that appear to have undergone frequent horizontal transfer. We anticipate that AnnoTree will be a valuable resource for exploring gene histories across bacteria and will act as a catalyst for biological and evolutionary hypothesis generation.

biorxiv bioinformatics 100-200-users 2018

Investigating causal relationships between sleep traits and risk of breast cancer: a Mendelian randomization study, bioRxiv, 2018-11-06

Abstract
Objective: To examine whether sleep traits have a causal effect on risk of breast cancer.
Design: Multivariable regression, one- and two-sample Mendelian randomization (MR).
Setting: The UK Biobank prospective cohort study and the Breast Cancer Association Consortium (BCAC) case-control genome-wide association study.
Participants: 156,848 women in UK Biobank for the multivariable regression and one-sample MR analyses (7,784 with a breast cancer diagnosis), and 122,977 breast cancer cases and 105,974 controls from BCAC for the two-sample MR analysis.
Exposures: Self-reported chronotype (morning/evening preference), insomnia symptoms and sleep duration in multivariable regression, and genetic variants robustly associated with these sleep traits.
Main outcome measures: Breast cancer (prevalent and incident cases in UK Biobank; prevalent cases only in BCAC).
Results: In multivariable regression analysis using data on breast cancer incidence in UK Biobank, morning preference was inversely associated with breast cancer (HR 0.95, 95% CI 0.93, 0.98 per category increase), while there was little evidence for an association with sleep duration or insomnia symptoms. Using 341 single nucleotide polymorphisms (SNPs) associated with chronotype, 91 SNPs associated with sleep duration and 57 SNPs associated with insomnia symptoms, one-sample MR analysis in UK Biobank provided some supportive evidence for a protective effect of morning preference on breast cancer risk (HR 0.85, 95% CI 0.70, 1.03 per category increase) but imprecise estimates for sleep duration and insomnia symptoms. Two-sample MR using data from BCAC supported a protective effect of morning preference (OR 0.88, 95% CI 0.82, 0.93 per category increase) and an adverse effect of increased sleep duration (OR 1.19, 95% CI 1.02, 1.39 per hour increase) on breast cancer (both estrogen receptor positive and negative), while evidence for insomnia symptoms was inconsistent. Results were largely robust to sensitivity analyses accounting for horizontal pleiotropy.
Conclusions: We found consistent evidence for a protective effect of morning preference and suggestive evidence for an adverse effect of increased sleep duration on breast cancer risk.
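Two-sample MR combines per-SNP summary statistics (SNP-exposure and SNP-outcome effects) into a single causal estimate. One standard way to do this is the fixed-effect inverse-variance weighted (IVW) estimator; the abstract does not state which estimator was primary, so the sketch below is a generic illustration of IVW, not the authors' code. The function names and the numbers in the usage block are hypothetical. IVW assumes no directional horizontal pleiotropy, which is why sensitivity analyses such as those mentioned above are run alongside it.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate from SNP summary statistics.

    Each SNP j contributes a Wald ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2 (first-order weights).
    Returns (estimate, standard_error) on the outcome scale, e.g. log-odds.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("summary-statistic vectors must have equal length")
    # Weighted sum of Wald ratios, simplified: sum(bx * by / se^2).
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    # Sum of weights: sum(bx^2 / se^2).
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio_ci(estimate: float, se: float,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds estimate into an OR with a normal-approximation CI."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))


if __name__ == "__main__":
    # Hypothetical toy data in which every SNP implies a causal log-OR of 0.5.
    bx = [0.10, 0.20, 0.05]
    by = [0.05, 0.10, 0.025]
    se = [0.01, 0.02, 0.01]
    est, est_se = ivw_estimate(bx, by, se)  # estimate ≈ 0.5
    print(est, est_se, odds_ratio_ci(est, est_se))
```

Because every toy SNP has the same Wald ratio, the IVW estimate equals that shared ratio; with real data, heterogeneity between SNP-level ratios is one signal of possible pleiotropy.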

biorxiv epidemiology 0-100-users 2018

Fast, sensitive, and accurate integration of single-cell data with Harmony, bioRxiv, 2018-11-05

Abstract
The rapidly emerging diversity of single-cell RNA-seq datasets allows us to characterize the transcriptional behavior of cell types across a wide variety of biological and clinical conditions. With this breadth comes a major analytical challenge: the same cell type, sampled across tissues, from different donors, or in different disease states, may appear to express different genes. A joint analysis of multiple datasets therefore requires integrating cells across diverse conditions. This is particularly challenging when datasets are assayed with different technologies, so that real biological differences are interspersed with technical ones. We present Harmony, an algorithm that projects cells into a shared embedding in which cells group by cell type rather than by dataset-specific conditions. Unlike available single-cell integration methods, Harmony can simultaneously account for multiple experimental and biological factors. We develop objective metrics to evaluate the quality of data integration. In four separate analyses, we demonstrate that Harmony outperforms four single-cell-specific integration algorithms. Moreover, we show that Harmony requires dramatically fewer computational resources: it is the only available algorithm that makes the integration of ∼10⁶ cells feasible on a personal computer. We demonstrate that Harmony identifies both broad populations and fine-grained subpopulations of PBMCs from datasets with large experimental differences. In a meta-analysis of 14,746 cells from 5 studies of human pancreatic islet cells, Harmony accounts for variation among technologies and donors to successfully align several rare subpopulations. In the resulting integrated embedding, we identify a previously unidentified population of potentially dysfunctional alpha islet cells, enriched for genes active in the endoplasmic reticulum (ER) stress response. The abundance of these alpha cells correlates across donors with the proportion of dysfunctional beta cells also enriched in ER stress response genes. Harmony is a fast and flexible general-purpose integration algorithm that enables the identification of shared fine-grained subpopulations across a variety of experimental and biological conditions.

biorxiv bioinformatics 100-200-users 2018

 

Created with the audiences framework by Jedidiah Carlson
