A cancer pharmacogenomic screen powering crowd-sourced advancement of drug combination prediction, bioRxiv, 2017-10-10

Abstract: The effectiveness of most targeted cancer therapies is short-lived, since tumors evolve and develop resistance. Combinations of drugs offer the potential to overcome resistance; however, the number of possible combinations is vast, necessitating data-driven approaches to find optimal treatments tailored to a patient’s tumor. AstraZeneca carried out 11,576 experiments on 910 drug combinations across 85 cancer cell lines, recapitulating in vivo response profiles. These data, the largest openly available screen of its kind, were hosted by DREAM alongside deep molecular characterization from the Sanger Institute for a Challenge to computationally predict synergistic drug pairs and associated biomarkers. A total of 160 teams participated, providing the most comprehensive methodological development and benchmarking to date. The winning methods incorporated prior knowledge of putative drug-target interactions. For >60% of drug combinations, synergy was reproducibly predicted with an accuracy matching that of biological replicate experiments; however, 20% of drug combinations were poorly predicted by all methods. Genomic rationales for synergy predictions were identified, including antagonism unique to combined PIK3CB/D inhibition with the ADAM17 inhibitor, where synergy is seen with other PI3K pathway inhibitors. All data, methods, and code are freely available as a resource to the community.
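
The Challenge asked participants to predict experimentally measured synergy scores. As a rough, self-contained illustration of one standard way such scores are defined, the sketch below computes the excess over the Bliss independence null model for a drug pair on a dose grid. Bliss independence is only one of several synergy models (Loewe additivity is another) and is not necessarily the metric used in the Challenge; all names and numbers below are hypothetical.

```python
# Hypothetical sketch: excess over the Bliss independence model for one
# drug pair on a dose grid. Positive excess suggests synergy, negative
# excess suggests antagonism. Not the Challenge's own scoring metric.
import numpy as np

def bliss_excess(effect_a, effect_b, effect_ab):
    """effect_a[i]: fractional inhibition of drug A alone at dose i;
    effect_b[j]: drug B alone at dose j; effect_ab[i, j]: the measured
    combination effect. All values are assumed to lie in [0, 1]."""
    expected = (effect_a[:, None] + effect_b[None, :]
                - effect_a[:, None] * effect_b[None, :])
    return effect_ab - expected

# Toy 3x3 dose grid: observed effects slightly above the Bliss expectation.
a = np.array([0.1, 0.3, 0.5])
b = np.array([0.2, 0.4, 0.6])
observed = np.array([[0.30, 0.50, 0.70],
                     [0.45, 0.65, 0.85],
                     [0.60, 0.80, 0.95]])
print(bliss_excess(a, b, observed).round(2))  # non-negative -> synergy
```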

biorxiv bioinformatics 0-100-users 2017

Assessment of batch-correction methods for scRNA-seq data with a new test metric, bioRxiv, 2017-10-10

Abstract: Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations. As with all genomics experiments, batch effects can hamper data integration and interpretation. The success of batch-effect correction is often evaluated by visual inspection of dimension-reduced representations such as principal component analysis, which is inherently imprecise given the high number of genes and the non-normal distribution of gene expression. Here, we present a k-nearest neighbour batch effect test (kBET, https://github.com/theislab/kBET) to quantitatively measure batch effects. kBET is easier to interpret, more sensitive, and more robust than visual evaluation and other measures of batch effects. We use kBET to assess commonly used batch regression and normalisation approaches, and quantify the extent to which they remove batch effects while preserving biological variability. Our results illustrate that batch correction based on log-transformation or scran pooling, followed by ComBat, reduced the batch effect while preserving structure across data sets. Finally, we show that kBET can pinpoint successful data integration methods across multiple data sets, in this case from different publications all charting mouse embryonic development. This has important implications for future data integration efforts, which will be central to projects such as the Human Cell Atlas, where data for the same tissue may be generated in multiple locations around the world. [Before final publication, we will upload the R package to Bioconductor.]
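
As a concrete reading of the test described above, here is a minimal Python sketch of a kBET-style procedure, assuming it works as the abstract suggests: sample cells, inspect each cell's k nearest neighbours, and use a Pearson chi-squared test to compare the local batch composition with the global batch frequencies; the fraction of rejected neighbourhoods then summarises the batch effect. The authoritative implementation is the authors' R package at https://github.com/theislab/kBET; the function below is a hypothetical re-sketch, not their API.

```python
# Hypothetical kBET-style test, assuming the procedure sketched in the
# abstract; see https://github.com/theislab/kBET for the real R package.
import numpy as np
from scipy.stats import chisquare
from sklearn.neighbors import NearestNeighbors

def knn_batch_rejection_rate(X, batch_labels, k=25, n_test=500,
                             alpha=0.05, seed=0):
    """Fraction of tested neighbourhoods whose batch composition differs
    significantly from the global batch frequencies: ~0 for well-mixed
    data, approaching 1 for a strong batch effect."""
    rng = np.random.default_rng(seed)
    batch_labels = np.asarray(batch_labels)
    batches, counts = np.unique(batch_labels, return_counts=True)
    global_freq = counts / counts.sum()
    neigh = NearestNeighbors(n_neighbors=k).fit(X)
    test_idx = rng.choice(len(X), size=min(n_test, len(X)), replace=False)
    _, nn_idx = neigh.kneighbors(X[test_idx])
    rejections = 0
    for row in nn_idx:
        local = np.array([(batch_labels[row] == b).sum() for b in batches])
        # Pearson chi-squared: local composition vs. expectation k * global_freq
        _, p = chisquare(local, f_exp=k * global_freq)
        rejections += p < alpha
    return rejections / len(test_idx)

# Toy demo: two 2-D "batches", the second shifted along one axis.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
labels = np.repeat([0, 1], 100)
X[labels == 1, 0] += 3.0
print(knn_batch_rejection_rate(X, labels, k=20, n_test=100))  # near 1.0
```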

biorxiv bioinformatics 0-100-users 2017

Fast and Accurate Genomic Analyses using Genome Graphs, bioRxiv, 2017-09-28

Abstract: The human reference genome serves as the foundation for genomics by providing a scaffold for the alignment of sequencing reads, but it currently reflects only a single consensus haplotype, which impairs read alignment and the accuracy of downstream analyses. Reference genome structures incorporating known genetic variation have been shown to improve the accuracy of genomic analyses, but have so far remained computationally prohibitive for routine large-scale use. Here we present a graph genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million indels. Our Graph Genome Pipeline requires 6.5 hours to process a 30x coverage WGS sample on a system with 36 CPU cores, compared with the 11 hours required by the GATK Best Practices pipeline. Using complementary benchmarking experiments based on real and simulated data, we show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, corresponding to about 20,000 additional variants detected per sample, while variant calling specificity is unaffected. Structural variants (SVs) incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is a significant advance towards fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.
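
To make the graph-genome idea concrete, the sketch below shows a toy variation graph in which a SNP turns the linear reference into a "bubble" of alternative nodes, so a read carrying either allele can match a path exactly. This assumes only the standard node-and-edge representation; production graph aligners, including the pipeline described here, rely on efficient indexes rather than naive path matching, and all names below are hypothetical.

```python
# Hypothetical toy variation graph: nodes hold sequence segments, edges
# connect alternative alleles. This naive recursion just checks whether a
# read spells out some path through the graph.
from dataclasses import dataclass

@dataclass
class GraphGenome:
    nodes: dict  # node id -> DNA segment
    edges: dict  # node id -> list of successor node ids

    def read_matches(self, read, node, offset=0):
        """True if `read` exactly matches some path starting at `node`."""
        seq = self.nodes[node][offset:]
        n = min(len(seq), len(read))
        if read[:n] != seq[:n]:
            return False
        if n == len(read):
            return True
        return any(self.read_matches(read[n:], nxt)
                   for nxt in self.edges.get(node, []))

# Reference ACGTA with a T/G SNP at the fourth base becomes a small bubble:
g = GraphGenome(nodes={0: "ACG", 1: "T", 2: "G", 3: "A"},
                edges={0: [1, 2], 1: [3], 2: [3]})
print(g.read_matches("ACGTA", 0))  # True: reference-allele path
print(g.read_matches("ACGGA", 0))  # True: alternate allele aligns exactly too
print(g.read_matches("ACGCA", 0))  # False: no path spells this read
```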

biorxiv bioinformatics 100-200-users 2017

 
