The druggable genome and support for target identification and validation in drug development, bioRxiv, 2016-07-27
Target identification (identifying the correct drug targets for each disease) and target validation (demonstrating the effect of target perturbation on disease biomarkers and disease end-points) are essential steps in drug development. We showed previously that biomarker and disease endpoint associations of single nucleotide polymorphisms (SNPs) in a gene encoding a drug target accurately depict the effect of modifying the same target with a pharmacological agent; others have shown that genomic support for a target is associated with a higher rate of drug development success. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome-wide association studies (GWAS) to an updated set of genes encoding druggable human proteins, to compounds with bioactivity against these targets and, where these were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, to enable druggable genome-wide association studies for drug target selection and validation in human disease.
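The linkage described above is essentially a chain of joins: trait to genes at associated loci, filtered to druggable genes, then to bioactive compounds, then (for licensed drugs only) to clinical indications. The following is a minimal sketch that uses plain Python dictionaries as stand-ins for the curated tables. Every identifier (GENE_A, CPD_1, Trait 1, Indication X) is a hypothetical placeholder, not real data. The sketch also assumes that the hard step of mapping each GWAS locus to a gene has already been done.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical placeholder tables; none of these are real associations.
gwas_hits: Dict[str, Set[str]] = {          # trait -> genes at associated loci
    "Trait 1": {"GENE_A", "GENE_B"},
    "Trait 2": {"GENE_C"},
}
druggable_genes: Set[str] = {"GENE_A", "GENE_C"}   # genes encoding druggable proteins
compounds: Dict[str, List[str]] = {                # gene -> bioactive compounds
    "GENE_A": ["CPD_1", "CPD_2"],
    "GENE_C": ["CPD_3"],
}
licensed_indications: Dict[str, List[str]] = {"CPD_1": ["Indication X"]}  # licensed drugs only


def link_opportunities(gwas_hits, druggable_genes, compounds, licensed_indications
                       ) -> List[Tuple[str, str, str, List[str]]]:
    """Return (trait, gene, compound, indications) rows; empty indications = not licensed."""
    rows = []
    for trait, genes in sorted(gwas_hits.items()):
        # Keep only genes whose products are druggable.
        for gene in sorted(genes & druggable_genes):
            for cpd in compounds.get(gene, []):
                rows.append((trait, gene, cpd, licensed_indications.get(cpd, [])))
    return rows


for row in link_opportunities(gwas_hits, druggable_genes, compounds, licensed_indications):
    print(row)
```

A licensed drug whose existing indications differ from the linked trait would be a candidate for repurposing. A compound that has no licensed indication points instead to a new development opportunity.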
biorxiv genetics 0-100-users 2016
Massively multiplex single-cell Hi-C, bioRxiv, 2016-07-24
We present combinatorial single cell Hi-C, a novel method that leverages combinatorial cellular indexing to measure chromosome conformation in large numbers of single cells. In this proof-of-concept, we generate and sequence combinatorial single cell Hi-C libraries for two mouse and four human cell types, comprising a total of 9,316 single cells across 5 experiments. We demonstrate the utility of single-cell Hi-C data in separating different cell types, identify previously uncharacterized cell-to-cell heterogeneity in the conformational properties of mammalian chromosomes, and demonstrate that combinatorial indexing is a generalizable molecular strategy for single-cell genomics.
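Combinatorial indexing labels each cell with a combination of barcodes added in successive split-pool rounds. The number of distinguishable labels therefore grows as the product of the barcodes available in each round. The sketch below estimates how often a cell shares its combination with another cell, which would merge two cells' data. It rests on a simplifying assumption: each cell independently draws a uniformly random combination. The 96 x 96 barcode counts and the cell number are hypothetical and are not taken from the study. Real protocols also limit how many cells go into each second-round well, which this model ignores.

```python
import random
from collections import Counter


def expected_collision_fraction(n_cells: int, n_round1: int, n_round2: int) -> float:
    """Expected fraction of cells whose barcode combination is shared with >= 1 other cell,
    assuming uniform, independent assignment over n_round1 * n_round2 combinations."""
    combos = n_round1 * n_round2
    return 1.0 - (1.0 - 1.0 / combos) ** (n_cells - 1)


def simulated_collision_fraction(n_cells: int, n_round1: int, n_round2: int,
                                 trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo check of the same quantity under the same assumption."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        barcodes = [(rng.randrange(n_round1), rng.randrange(n_round2)) for _ in range(n_cells)]
        counts = Counter(barcodes)
        # Cells in any combination used more than once are collided.
        collided = sum(c for c in counts.values() if c > 1)
        total += collided / n_cells
    return total / trials


print(expected_collision_fraction(1000, 96, 96))
print(simulated_collision_fraction(1000, 96, 96))
```

In this model the collision rate rises with the number of cells and falls as more barcodes are used per round. This trade-off is why each additional indexing round expands throughput so effectively.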
biorxiv genomics 0-100-users 2016
CRISPR-Cas9 mediated mutagenesis of a DMR6 ortholog in tomato confers broad-spectrum disease resistance, bioRxiv, 2016-07-21
Pathogenic microbes are responsible for severe production losses in crops worldwide. The use of disease resistant crop varieties can be a sustainable approach to meet the food demand of the world’s growing population. However, classical plant breeding is usually laborious and time-consuming, thus hampering efficient improvement of many crops. With the advent of genome editing technologies, in particular the CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-Cas9) system, we are now able to introduce improved crop traits in a rapid and efficient manner. In this work, we used genome editing to confer durable disease resistance in tomato by modifying a specific gene associated with disease resistance. Recently, it was demonstrated that inactivation of a single gene called DMR6 (downy mildew resistance 6) confers resistance to several pathogens in Arabidopsis thaliana. This gene is specifically up-regulated during pathogen infection, and mutations in the dmr6 gene result in increased salicylic acid levels. The tomato SlDMR6-1 orthologue Solyc03g080190 is also up-regulated during infection by Pseudomonas syringae pv. tomato and Phytophthora capsici. Using the CRISPR-Cas9 system, we generated tomato plants with small deletions in the SlDMR6-1 gene that result in frameshift and premature truncation of the protein. Remarkably, these mutants do not have significant detrimental effects in terms of growth and development under greenhouse conditions and show disease resistance against different pathogens, including P. syringae, P. capsici and Xanthomonas spp.
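A small deletion truncates the protein only when its length is not a multiple of three. Such a deletion shifts the reading frame downstream, and a stop codon frequently appears soon afterwards in the new frame. The sketch below shows this on a short toy coding sequence. The sequence is hypothetical and is not SlDMR6-1; it was built so that a 1-bp deletion exposes an early stop codon.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """0-based codon index of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def apply_deletion(cds: str, start: int, length: int) -> str:
    """Remove `length` bases starting at 0-based position `start`."""
    return cds[:start] + cds[start + length:]


def is_frameshift(deletion_length: int) -> bool:
    """Deletions not divisible by 3 shift the downstream reading frame."""
    return deletion_length % 3 != 0


wild_type = "ATGGCTCTAAAAGGGTGA"        # toy CDS: ATG GCT CTA AAA GGG TGA
one_bp = apply_deletion(wild_type, 3, 1)  # ATG CTC TAA ...: frameshift
three_bp = apply_deletion(wild_type, 3, 3)  # in-frame loss of one codon

print(first_stop_codon(wild_type))  # 5 (normal stop)
print(first_stop_codon(one_bp))     # 2 (premature stop)
print(first_stop_codon(three_bp))   # 4 (one codon shorter, frame intact)
```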
biorxiv plant-biology 0-100-users 2016
H&E-stained Whole Slide Image Deep Learning Predicts SPOP Mutation State in Prostate Cancer, bioRxiv, 2016-07-18
A quantitative model to genetically interpret the histology in whole microscopy slide images is desirable to guide downstream immuno-histochemistry, genomics, and precision medicine. We constructed a statistical model that predicts whether or not SPOP is mutated in prostate cancer, given only the digitized whole slide image after standard hematoxylin and eosin [H&E] staining. Using a TCGA cohort of 177 prostate cancer patients where 20 had mutant SPOP, we trained multiple ensembles of residual networks, accurately distinguishing SPOP mutant from SPOP non-mutant patients (test AUROC=0.74, p=0.0007 Fisher’s Exact Test). We further validated our full metaensemble classifier on an independent test cohort from MSK-IMPACT of 152 patients where 19 had mutant SPOP. Mutants and non-mutants were accurately distinguished despite TCGA slides being frozen sections and MSK-IMPACT slides being formalin-fixed paraffin-embedded sections (AUROC=0.86, p=0.0038). In addition, we scanned an additional 36 MSK-IMPACT patients having mutant SPOP, trained on this expanded MSK-IMPACT cohort (test AUROC=0.75, p=0.0002), tested on the TCGA cohort (AUROC=0.64, p=0.0306), and again accurately distinguished mutants from non-mutants using the same pipeline. Importantly, our method demonstrates tractable deep learning in this “small data” setting of 20-55 positive examples and quantifies each prediction’s uncertainty with confidence intervals. To our knowledge, this is the first statistical model to predict a genetic mutation in cancer directly from the patient’s digitized H&E-stained whole microscopy slide. Moreover, this is the first time quantitative features learned from patient genetics and histology have been used for content-based image retrieval (CBIR): finding patients similar to a given patient whose histology appears to share the same genetic driver of disease, i.e., SPOP mutation (p=0.0241, Kost’s method), and finding patients similar to a given patient who lacks that driver mutation (p=0.0170, Kost’s method).
Significance Statement
This is the first pipeline predicting gene mutation probability in cancer from digitized H&E-stained microscopy slides. To predict whether or not the speckle-type POZ protein [SPOP] gene is mutated in prostate cancer, the pipeline (i) identifies diagnostically salient slide regions, (ii) identifies the salient region having the dominant tumor, and (iii) trains ensembles of binary classifiers that together predict a confidence interval of mutation probability. Through deep learning on small datasets, this enables automated histologic diagnoses based on probabilities of underlying molecular aberrations and finds histologically similar patients by learned genetic-histologic relationships.
Author contributions: Conception and writing: AJS, TJF. Algorithms, learning, CBIR: AJS. Analysis: AJS, MAR, TJF. Supervision: MAR, TJF.
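The AUROC values reported above have a direct probabilistic reading. An AUROC is the probability that a randomly chosen mutant patient receives a higher predicted score than a randomly chosen non-mutant patient, with ties counted as one half. For example, 0.74 means a mutant outranks a non-mutant in 74% of such pairs, and 0.5 is chance. The following minimal sketch computes AUROC in exactly this pairwise (Mann-Whitney) form. The scores are made-up examples, not outputs of the authors' classifier.

```python
from typing import Sequence


def auroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Pairwise AUROC: P(score_pos > score_neg), counting ties as 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted mutation probabilities.
mutant_scores = [0.9, 0.3]
non_mutant_scores = [0.5, 0.1]
print(auroc(mutant_scores, non_mutant_scores))  # 0.75: 3 of 4 pairs ranked correctly
```

The pairwise loop costs O(n*m). That is trivial at cohort sizes like these (about 20 to 55 positives). Larger datasets would typically use a rank-based formula instead.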
biorxiv pathology 0-100-users 2016
Rapid and efficient analysis of 20,000 RNA-seq samples with Toil, bioRxiv, 2016-07-08
Toil is portable, open-source workflow software that supports contemporary workflow definition languages and can be used to securely and reproducibly run scientific workflows efficiently at large scale. To demonstrate Toil, we processed over 20,000 RNA-seq samples to create a consistent meta-analysis of five datasets free of computational batch effects, which we make freely available. Nearly all the samples were analysed in under four days using a commercial cloud cluster of 32,000 preemptible cores.
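Processing thousands of samples this way is largely a scatter/gather pattern. Each sample is an independent job that can run on any free core, and a final step merges the per-sample outputs. The sketch below illustrates only that pattern with Python's standard `concurrent.futures`. It is not Toil's API. The per-sample function `quantify_sample` is a hypothetical stand-in that returns a toy value rather than running alignment or quantification. A production engine such as Toil must also schedule jobs across many machines and recover from failed or preempted workers, none of which this sketch attempts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def quantify_sample(sample_id: str) -> Tuple[str, int]:
    """Hypothetical per-sample job; returns a deterministic toy 'read count'."""
    return sample_id, 1000 * len(sample_id)


def run_workflow(sample_ids: List[str], max_workers: int = 4) -> Dict[str, int]:
    # Scatter: one independent job per sample, run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(quantify_sample, sample_ids))
    # Gather: merge per-sample outputs into a single table.
    return dict(results)


print(run_workflow(["S1", "S22", "S333"]))
```

Because no job depends on another, throughput scales with the number of available cores. This independence is what makes a cluster of tens of thousands of cores useful for a task like this.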
biorxiv bioinformatics 100-200-users 2016
A simple proposal for the publication of journal citation distributions, bioRxiv, 2016-07-06
Although the Journal Impact Factor (JIF) is widely acknowledged to be a poor indicator of the quality of individual papers, it is used routinely to evaluate research and researchers. Here, we present a simple method for generating the citation distributions that underlie JIFs. Application of this straightforward protocol reveals the full extent of the skew of these distributions and the variation in citations received by published papers that is characteristic of all scientific journals. Although there are differences among journals across the spectrum of JIFs, the citation distributions overlap extensively, demonstrating that the citation performance of individual papers cannot be inferred from the JIF. We propose that this methodology be adopted by all journals as a move to greater transparency, one that should help to refocus attention on individual pieces of work and counter the inappropriate usage of JIFs during the process of research assessment.
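The JIF for a given year is roughly the number of citations received that year by a journal's items from the previous two years, divided by the number of citable items published in those two years. It therefore behaves like a mean, with some caveats about which items count as citable. In a right-skewed distribution, a handful of highly cited papers pulls that mean well above the typical paper. The following minimal sketch makes the point with hypothetical citation counts (not data from the study).

```python
from statistics import mean, median
from typing import Dict, List


def summarize(citations: List[int]) -> Dict[str, float]:
    """Contrast a JIF-like mean with the median of a citation distribution."""
    avg = mean(citations)
    return {
        "mean": avg,
        "median": median(citations),
        # Fraction of papers cited less often than the journal-level mean.
        "share_below_mean": sum(c < avg for c in citations) / len(citations),
    }


# Hypothetical citations for ten papers: one outlier dominates.
toy_citations = [0, 0, 1, 1, 2, 2, 3, 4, 5, 60]
print(summarize(toy_citations))  # mean 7.8, median 2.0, 90% of papers below the mean
```

Here nine of ten papers fall below the mean. This is why publishing the full distribution, rather than a single average, is more informative about individual papers.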
biorxiv scientific-communication-and-education 500+-users 2016