Algorithmic Learning for Auto-deconvolution of GC-MS Data to Enable Molecular Networking within GNPS, bioRxiv, 2020-01-15
Abstract: Gas chromatography-mass spectrometry (GC-MS) represents an analytical technique with significant practical societal impact. Spectral deconvolution is an essential step for interpreting GC-MS data. No public GC-MS repositories that also enable repository-scale analysis exist, in part because deconvolution requires significant user input. We therefore engineered a scalable machine learning workflow for the Global Natural Product Social Molecular Networking (GNPS) analysis platform to enable the mass spectrometry community to store, process, share, annotate, compare, and perform molecular networking of GC-MS data. The workflow performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization, using a Fast Fourier Transform-based strategy to overcome scalability limitations. We introduce a “balance score” that quantifies the reproducibility of fragmentation patterns across all samples. We demonstrate the utility of the platform with breathomics analysis applied to the early detection of oesophago-gastric cancer, and by creating the first molecular spatial map of the human volatilome.
biorxiv bioinformatics 0-100-users 2020
Drug mechanism-of-action discovery through the integration of pharmacological and CRISPR screens, bioRxiv, 2020-01-15
Abstract: Low success rates during drug development are due in part to the difficulty of defining drug mechanism-of-action and molecular markers of therapeutic activity. Here, we integrated 199,219 drug sensitivity measurements for 397 unique anti-cancer drugs and genome-wide CRISPR loss-of-function screens in 484 cell lines to systematically investigate cellular drug mechanism-of-action. We observed an enrichment for positive associations between drug sensitivity and knockout of their nominal targets, and by leveraging protein-protein networks we identified pathways that mediate drug response. This revealed an unappreciated role of mitochondrial E3 ubiquitin-protein ligase MARCH5 in sensitivity to MCL1 inhibitors. We also estimated drug on-target and off-target activity, informing on specificity, potency and toxicity. Linking drug and gene dependency together with genomic datasets uncovered contexts in which molecular networks, when perturbed, mediate cancer cell loss-of-fitness, and thereby provide independent and orthogonal evidence of biomarkers for drug development. This study illustrates how integrating cell line drug sensitivity with CRISPR loss-of-function screens can elucidate mechanism-of-action to advance drug development.
biorxiv systems-biology 0-100-users 2020
Gapless assembly of maize chromosomes using long read technologies, bioRxiv, 2020-01-15
Abstract: Creating gapless telomere-to-telomere assemblies of complex genomes is one of the ultimate challenges in genomics. We used long read technologies and an optical map-based approach to produce a maize genome assembly composed of only 63 contigs. The B73-Ab10 genome includes gapless assemblies of chromosome 3 (236 Mb) and chromosome 9 (162 Mb), multiple highly repetitive centromeres and heterochromatic knobs, and 53 Mb of the Ab10 meiotic drive haplotype.
biorxiv bioinformatics 0-100-users 2020
Genetic associations at regulatory phenotypes improve fine-mapping of causal variants for twelve immune-mediated diseases, bioRxiv, 2020-01-15
Abstract: The identification of causal genetic variants for common diseases improves understanding of disease biology. Here we use data from the BLUEPRINT project to identify regulatory quantitative trait loci (QTL) for three primary human immune cell types and use these to fine-map putative causal variants for twelve immune-mediated diseases. We identify 340 unique, non-major histocompatibility complex (MHC) disease loci that colocalise with high (>98%) posterior probability with regulatory QTLs, and apply Bayesian frameworks to fine-map associations at each locus. We show that fine-mapping applied to regulatory QTLs yields smaller credible set sizes and higher posterior probabilities for candidate causal variants compared to disease summary statistics. We also describe a systematic under-representation of insertion/deletion (INDEL) polymorphisms in credible sets derived from publicly available disease meta-analysis when compared to QTLs based on genome-sequencing data. Overall, our findings suggest that fine-mapping applied to disease-colocalising regulatory QTLs can enhance the discovery of putative causal disease variants and provide insights into the underlying causal genes and molecular mechanisms.
biorxiv genomics 0-100-users 2020
Recent fluctuations in Mexican American genomes have altered the genetic architecture of biomedical traits, bioRxiv, 2020-01-15
Abstract: Hispanics/Latinos are a diverse group of admixed populations with African, European, and Native American ancestries. They remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genome-wide genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Native American ancestry has increased over time across Hispanic/Latino populations, particularly in Mexican Americans where Native American ancestry increased by an average of ∼20% over the 50-year period spanning 1940s-1990s. We find similar patterns across American cities, and replicate our observations in an independent sample of Mexican Americans. These dynamic ancestry patterns are a result of a complex interaction of several population and cultural factors, including strong ancestry-related assortative mating and subtle shifts in migration with differences in subcontinental Native American ancestry over time. These factors have shaped patterns of genetic variation, including an increase in runs of homozygosity in Native American ancestral tracts, and also influenced the genetic architecture of complex traits within the Mexican American population. We show for height, a trait correlated with ancestry, polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a crucial need to improve our understanding of the genetic diversity of admixed populations.
biorxiv genetics 0-100-users 2020
Sampling artifacts in single-cell genomics cohort studies, bioRxiv, 2020-01-15
Abstract: Robust protocols and automation now enable large-scale single-cell RNA and ATAC sequencing experiments and their application on biobank and clinical cohorts. However, technical biases introduced during sample acquisition can hinder solid, reproducible results, and a systematic benchmarking is required before entering large-scale data production. Here, we report the existence and extent of gene expression and chromatin accessibility artifacts introduced during sampling and identify experimental and computational solutions for their prevention.
biorxiv genomics 100-200-users 2020