CRISPR-Cas9 mediated mutagenesis of a DMR6 ortholog in tomato confers broad-spectrum disease resistance, bioRxiv, 2016-07-21

Abstract

Pathogenic microbes are responsible for severe production losses in crops worldwide. The use of disease-resistant crop varieties can be a sustainable approach to meeting the food demand of the world's growing population. However, classical plant breeding is usually laborious and time-consuming, which hampers efficient improvement of many crops. With the advent of genome editing technologies, in particular the CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-Cas9) system, we are now able to introduce improved crop traits rapidly and efficiently. In this work, we used genome editing to introduce durable disease resistance into tomato by modifying a specific gene associated with disease resistance. It was recently demonstrated that inactivation of a single gene, DMR6 (downy mildew resistance 6), confers resistance to several pathogens in Arabidopsis thaliana. This gene is specifically up-regulated during pathogen infection, and mutations in dmr6 result in increased salicylic acid levels. The tomato orthologue SlDMR6-1 (Solyc03g080190) is likewise up-regulated during infection by Pseudomonas syringae pv. tomato and Phytophthora capsici. Using the CRISPR-Cas9 system, we generated tomato plants carrying small deletions in SlDMR6-1 that cause a frameshift and premature truncation of the protein. Remarkably, these mutants show no significant detrimental effects on growth and development under greenhouse conditions, and they are resistant to several pathogens, including P. syringae, P. capsici and Xanthomonas spp.
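Why a small deletion inactivates the protein can be illustrated in a few lines: a deletion whose length is not a multiple of three shifts the reading frame, so downstream codons are read out of register and a stop codon often appears early. The sketch below uses a hypothetical toy coding sequence (not the SlDMR6-1 sequence) and hypothetical helper names; it only scans for the standard stop codons and does not translate the protein.

```python
from typing import Optional

# Standard nuclear stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_deletion(cds: str, start: int, length: int) -> str:
    """Remove `length` bases starting at 0-based position `start`."""
    return cds[:start] + cds[start + length:]


def causes_frameshift(length: int) -> bool:
    """A deletion preserves the frame only if it removes whole codons."""
    return length % 3 != 0


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


if __name__ == "__main__":
    # Hypothetical toy CDS: ATG GCC TTA GCT GAA CCC TAA (stop at codon 6).
    wild_type = "ATGGCCTTAGCTGAACCCTAA"
    # Hypothetical 1-bp deletion at position 3 (the G of GCC).
    mutant = apply_deletion(wild_type, 3, 1)
    print(causes_frameshift(1), first_stop_codon(wild_type), first_stop_codon(mutant))
```

In this toy case the single-base deletion brings a TAG codon into frame at codon 2, truncating the product from six codons to two.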

biorxiv plant-biology 0-100-users 2016

H&E-stained Whole Slide Image Deep Learning Predicts SPOP Mutation State in Prostate Cancer, bioRxiv, 2016-07-18

A quantitative model that genetically interprets the histology in whole microscopy slide images is desirable to guide downstream immunohistochemistry, genomics, and precision medicine. We constructed a statistical model that predicts whether or not SPOP is mutated in prostate cancer, given only the digital whole slide after standard hematoxylin and eosin [H&E] staining. Using a TCGA cohort of 177 prostate cancer patients, 20 of whom had mutant SPOP, we trained multiple ensembles of residual networks that accurately distinguished SPOP-mutant from SPOP-non-mutant patients (test AUROC=0.74, p=0.0007 Fisher's Exact Test). We further validated our full meta-ensemble classifier on an independent test cohort from MSK-IMPACT of 152 patients, 19 of whom had mutant SPOP. Mutants and non-mutants were accurately distinguished even though the TCGA slides were frozen sections and the MSK-IMPACT slides were formalin-fixed paraffin-embedded sections (AUROC=0.86, p=0.0038). Moreover, we scanned an additional 36 MSK-IMPACT patients with mutant SPOP, trained on this expanded MSK-IMPACT cohort (test AUROC=0.75, p=0.0002), tested on the TCGA cohort (AUROC=0.64, p=0.0306), and again distinguished mutants from non-mutants using the same pipeline. Importantly, our method demonstrates tractable deep learning in this "small data" setting of 20-55 positive examples and quantifies the uncertainty of each prediction with confidence intervals. To our knowledge, this is the first statistical model to predict a genetic mutation in cancer directly from the patient's digitized H&E-stained whole microscopy slide. Moreover, this is the first time that quantitative features learned from patient genetics and histology have been used for content-based image retrieval: finding similar patients for a given patient whose histology appears to share the same genetic driver of disease, i.e., SPOP mutation (p=0.0241 Kost's Method), and finding similar patients for a given patient who does not have that driver mutation (p=0.0170 Kost's Method).

Significance Statement

This is the first pipeline that predicts gene mutation probability in cancer from digitized H&E-stained microscopy slides. To predict whether or not the speckle-type POZ protein [SPOP] gene is mutated in prostate cancer, the pipeline (i) identifies diagnostically salient slide regions, (ii) identifies the salient region containing the dominant tumor, and (iii) trains ensembles of binary classifiers that together predict a confidence interval for the mutation probability. By enabling deep learning on small datasets, this approach supports automated histologic diagnoses based on the probabilities of underlying molecular aberrations and finds histologically similar patients through learned genetic-histologic relationships.

Conception, Writing: AJS, TJF. Algorithms, Learning, CBIR: AJS. Analysis: AJS, MAR, TJF. Supervision: MAR, TJF.
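The two quantities reported above, an AUROC for separating mutants from non-mutants and a per-patient confidence interval from an ensemble of binary classifiers, can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: `auroc` uses the standard pairwise (Mann-Whitney) definition, and `ensemble_interval` is a hypothetical helper that assumes a normal approximation to the mean of the member probabilities. The paper may construct its intervals differently.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (mutant, non-mutant) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble_interval(member_probs: Sequence[float],
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Hypothetical helper: (lower, mean, upper) for one patient's ensemble.

    Assumes a normal approximation to the mean of the member probabilities,
    clipped to [0, 1]. Requires at least two ensemble members.
    """
    m = mean(member_probs)
    half_width = z * stdev(member_probs) / math.sqrt(len(member_probs))
    return max(0.0, m - half_width), m, min(1.0, m + half_width)


if __name__ == "__main__":
    # Hypothetical scores for two mutants (label 1) and two non-mutants (label 0).
    print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
    # Hypothetical probabilities from four ensemble members for one patient.
    print(ensemble_interval([0.2, 0.4, 0.6, 0.8]))
```

The pairwise form makes the meaning of AUROC explicit: an AUROC of 0.74 corresponds to a randomly chosen mutant outscoring a randomly chosen non-mutant about 74% of the time. It is quadratic in cohort size, which is negligible for cohorts of roughly 150-180 patients.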

biorxiv pathology 0-100-users 2016

Deep Sequencing of 10,000 Human Genomes, bioRxiv, 2016-07-02

Abstract

We report on the sequencing of 10,545 human genomes at 30-40x coverage, with an emphasis on quality metrics and on the discovery of novel variants and sequence. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single nucleotide variants in the coding and non-coding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing are of the quality necessary for clinical use.

Significance statement

Declining sequencing costs and new large-scale initiatives in personalized medicine are driving a massive expansion in the number of human genomes being sequenced. There is therefore an urgent need to define quality standards for clinical use. These standards concern the coverage depth and sequencing accuracy of an individual's genome, rather than aggregated coverage of data across a cohort or population. Our work represents the largest effort to date to sequence human genomes at deep coverage under these new standards. This study identifies over 150 million human variants, the majority of them rare and previously unknown. Moreover, these data identify sites in the genome that are highly intolerant of variation and possibly essential for life or health. We conclude that high-coverage genome sequencing provides accurate detail on human variation for both discovery and clinical applications.
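To make the "30-40x coverage" figure concrete, the standard relation between read count and mean depth is C = N * L / G (reads times read length divided by genome length). The sketch below assumes a haploid genome size of about 3.1 Gb and 150 bp reads; neither value is stated in the abstract. It also includes an idealized Poisson (Lander-Waterman) estimate of the fraction of bases covered by at least k reads. That model ignores mapping, GC and other sequencing biases, so it is not a model of the high-confidence region reported above.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads whose mean depth reaches `target`."""
    return math.ceil(target * genome_size / read_length)


def fraction_at_least(k: int, coverage: float) -> float:
    """Idealized Poisson fraction of bases covered by >= k reads."""
    below_k = sum(
        math.exp(-coverage) * coverage ** i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below_k


if __name__ == "__main__":
    # Assumed values (not from the abstract): ~3.1 Gb genome, 150 bp reads.
    genome, read_len = 3_100_000_000, 150
    for depth in (30, 40):
        n = reads_for_coverage(depth, read_len, genome)
        print(depth, n, fraction_at_least(10, depth))
```

Under these assumptions, 30x corresponds to about 620 million reads per genome.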

biorxiv genomics 200-500-users 2016

 

Created with the audiences framework by Jedidiah Carlson
