0-100-users | audiences

Massively multiplex single-cell Hi-C, bioRxiv, 2016-07-24

We present combinatorial single cell Hi-C, a novel method that leverages combinatorial cellular indexing to measure chromosome conformation in large numbers of single cells. In this proof-of-concept, we generate and sequence combinatorial single cell Hi-C libraries for two mouse and four human cell types, comprising a total of 9,316 single cells across 5 experiments. We demonstrate the utility of single-cell Hi-C data in separating different cell types, identify previously uncharacterized cell-to-cell heterogeneity in the conformational properties of mammalian chromosomes, and demonstrate that combinatorial indexing is a generalizable molecular strategy for single-cell genomics.

biorxiv genomics 0-100-users 2016

CRISPR-Cas9 mediated mutagenesis of a DMR6 ortholog in tomato confers broad-spectrum disease resistance, bioRxiv, 2016-07-21

Pathogenic microbes are responsible for severe production losses in crops worldwide. The use of disease resistant crop varieties can be a sustainable approach to meet the food demand of the world’s growing population. However, classical plant breeding is usually laborious and time-consuming, thus hampering efficient improvement of many crops. With the advent of genome editing technologies, in particular the CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-Cas9) system, we are now able to introduce improved crop traits in a rapid and efficient manner. In this work, we used genome editing to introduce durable disease resistance in tomato by modifying a specific gene associated with disease resistance. Recently, it was demonstrated that inactivation of a single gene called DMR6 (downy mildew resistance 6) confers resistance to several pathogens in Arabidopsis thaliana. This gene is specifically up-regulated during pathogen infection, and mutations in the dmr6 gene result in increased salicylic acid levels. The tomato SlDMR6-1 orthologue Solyc03g080190 is also up-regulated during infection by Pseudomonas syringae pv. tomato and Phytophthora capsici. Using the CRISPR-Cas9 system, we generated tomato plants with small deletions in the SlDMR6-1 gene that result in frameshift and premature truncation of the protein. Remarkably, these mutants do not have significant detrimental effects in terms of growth and development under greenhouse conditions and show disease resistance against different pathogens, including P. syringae, P. capsici and Xanthomonas spp.

biorxiv plant-biology 0-100-users 2016

H&E-stained Whole Slide Image Deep Learning Predicts SPOP Mutation State in Prostate Cancer, bioRxiv, 2016-07-18

A quantitative model to genetically interpret the histology in whole microscopy slide images is desirable to guide downstream immuno-histochemistry, genomics, and precision medicine. We constructed a statistical model that predicts whether or not SPOP is mutated in prostate cancer, given only the digital whole slide after standard hematoxylin and eosin [H&E] staining. Using a TCGA cohort of 177 prostate cancer patients where 20 had mutant SPOP, we trained multiple ensembles of residual networks, accurately distinguishing SPOP mutant from SPOP non-mutant patients (test AUROC=0.74, p=0.0007 Fisher’s Exact Test). We further validated our full metaensemble classifier on an independent test cohort from MSK-IMPACT of 152 patients where 19 had mutant SPOP. Mutants and non-mutants were accurately distinguished despite TCGA slides being frozen sections and MSK-IMPACT slides being formalin-fixed paraffin-embedded sections (AUROC=0.86, p=0.0038). Moreover, we scanned an additional 36 MSK-IMPACT patients having mutant SPOP, trained on this expanded MSK-IMPACT cohort (test AUROC=0.75, p=0.0002), tested on the TCGA cohort (AUROC=0.64, p=0.0306), and again accurately distinguished mutants from non-mutants using the same pipeline. Importantly, our method demonstrates tractable deep learning in this “small data” setting of 20-55 positive examples and quantifies each prediction’s uncertainty with confidence intervals. To our knowledge, this is the first statistical model to predict a genetic mutation in cancer directly from the patient’s digitized H&E-stained whole microscopy slide. Moreover, this is the first time quantitative features learned from patient genetics and histology have been used for content-based image retrieval, finding similar patients for a given patient where the histology appears to share the same genetic driver of disease, i.e., SPOP mutation (p=0.0241 Kost’s Method), and finding similar patients for a given patient that does not have that driver mutation (p=0.0170 Kost’s Method).

Significance Statement: This is the first pipeline predicting gene mutation probability in cancer from digitized H&E-stained microscopy slides. To predict whether or not the speckle-type POZ protein [SPOP] gene is mutated in prostate cancer, the pipeline (i) identifies diagnostically salient slide regions, (ii) identifies the salient region having the dominant tumor, and (iii) trains ensembles of binary classifiers that together predict a confidence interval of mutation probability. Through deep learning on small datasets, this enables automated histologic diagnoses based on probabilities of underlying molecular aberrations and finds histologically similar patients by learned genetic-histologic relationships.

Conception, Writing: AJS, TJF. Algorithms, Learning, CBIR: AJS. Analysis: AJS, MAR, TJF. Supervision: MAR, TJF.

biorxiv pathology 0-100-users 2016

Adapterama I: Universal stubs and primers for 384 unique dual-indexed or 147,456 combinatorially-indexed Illumina libraries (iTru & iNext), bioRxiv, 2016-06-16

Next-generation DNA sequencing (NGS) offers many benefits, but major factors limiting NGS include reducing costs of 1) start-up (i.e., doing NGS for the first time); 2) buy-in (i.e., getting the smallest possible amount of data from a run); and 3) sample preparation. Reducing sample preparation costs is commonly addressed, but start-up and buy-in costs are rarely addressed. We present dual-indexing systems to address all three of these issues. By breaking the library construction process into universal, re-usable, combinatorial components, we reduce all costs, while increasing the number of samples and the variety of library types that can be combined within runs. We accomplish this by extending the Illumina TruSeq dual-indexing approach to 768 (384 + 384) indexed primers that produce 384 unique dual-indexes or 147,456 (384 × 384) unique combinations. We maintain eight nucleotide indexes, with many that are compatible with Illumina index sequences. We synthesized these indexing primers, purifying them with only standard desalting and placing small aliquots in replicate plates. In qPCR validation tests, 206 of 208 primers tested passed (99% success). We then created hundreds of libraries in various scenarios. Our approach reduces start-up and per-sample costs by requiring only one universal adapter that works with indexed PCR primers to uniquely identify samples. Our approach reduces buy-in costs because 1) relatively few oligonucleotides are needed to produce a large number of indexed libraries; and 2) the large number of possible primers allows researchers to use unique primer sets for different projects, which facilitates pooling of samples during sequencing. Our libraries make use of standard Illumina sequencing primers and index sequence length and are demultiplexed with standard Illumina software, thereby minimizing customization headaches. In subsequent Adapterama papers, we use these same primers with different adapter stubs to construct amplicon and restriction-site associated DNA libraries, but their use can be expanded to any type of library sequenced on Illumina platforms.
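
The index counts quoted in this abstract follow from simple arithmetic, which the short Python sketch below reproduces. The function name `index_capacity` and the `i5`/`i7` argument labels are hypothetical and used only for illustration; the abstract itself states only the totals (768 primers, 384 unique dual-indexes, 147,456 combinations, 206 of 208 primers passing).

```python
def index_capacity(n_i5: int, n_i7: int) -> dict:
    """Count the libraries addressable with two sets of indexed primers.

    n_i5 and n_i7 are the sizes of the two primer sets (hypothetical labels).
    """
    return {
        # Total oligonucleotides that must be synthesized.
        "primers": n_i5 + n_i7,
        # Unique dual-indexing: each primer is used in exactly one pair.
        "unique_dual": min(n_i5, n_i7),
        # Combinatorial indexing: every primer from one set pairs with
        # every primer from the other set.
        "combinatorial": n_i5 * n_i7,
    }


capacity = index_capacity(384, 384)
print(capacity)

# qPCR validation rate reported in the abstract: 206 of 208 primers passed.
pass_rate = 206 / 208
print(f"{pass_rate:.1%}")
```

The gap between 768 synthesized primers and 147,456 addressable libraries is what underlies the abstract's claim that relatively few oligonucleotides are needed.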

biorxiv genomics 0-100-users 2016

Vibrio natriegens, a new genomic powerhouse, bioRxiv, 2016-06-13

Recombinant DNA technology has revolutionized biomedical research with continual innovations advancing the speed and throughput of molecular biology. Nearly all these tools, however, are reliant on Escherichia coli as a host organism, and its slow growth rate increasingly dominates experimental time. Here we report the development of Vibrio natriegens, a free-living bacterium with the fastest generation time known, into a genetically tractable host organism. We systematically characterize its growth properties to establish basic laboratory culturing conditions. We provide the first complete Vibrio natriegens genome, consisting of two chromosomes of 3,248,023 bp and 1,927,310 bp that together encode 4,578 open reading frames. We reveal genetic tools and techniques for working with Vibrio natriegens. These foundational resources will usher in an era of advanced genomics to accelerate biological, biotechnological, and medical discoveries.

biorxiv genomics 0-100-users 2016

Differential analysis of RNA-Seq incorporating quantification uncertainty, bioRxiv, 2016-06-11

We describe a novel method for the differential analysis of RNA-Seq data that utilizes bootstrapping in conjunction with response error linear modeling to decouple biological variance from inferential variance. The method is implemented in an interactive shiny app called sleuth that utilizes kallisto quantifications and bootstraps for fast and accurate analysis of RNA-Seq experiments.
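
The idea of decoupling biological variance from inferential variance can be illustrated with a minimal Python sketch. This is not the sleuth implementation: it assumes log-scale abundance values for one transcript, uses each sample's bootstrap mean as its point estimate, and omits any shrinkage or model fitting. The function name `decompose_variance` and the toy numbers are hypothetical.

```python
from statistics import mean, variance


def decompose_variance(bootstraps_per_sample):
    """Split the variance of one transcript into two parts.

    bootstraps_per_sample: one list of bootstrap abundance estimates
    (assumed log scale) per replicate sample in a single condition.
    Returns (total, inferential, biological).
    """
    # Point estimate per sample (assumption: the bootstrap mean).
    point_estimates = [mean(b) for b in bootstraps_per_sample]
    # Variance observed across replicate samples.
    total = variance(point_estimates)
    # Inferential variance: average within-sample bootstrap variance.
    inferential = mean([variance(b) for b in bootstraps_per_sample])
    # Biological variance is what remains, floored at zero.
    biological = max(total - inferential, 0.0)
    return total, inferential, biological


# Toy data: three replicate samples with three bootstraps each.
toy = [[4.9, 5.0, 5.1], [5.9, 6.0, 6.1], [6.9, 7.0, 7.1]]
total, inferential, biological = decompose_variance(toy)
print(total, inferential, biological)
```

In this toy case the bootstraps within each sample agree closely, so nearly all of the observed variance is attributed to biological differences between samples.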

biorxiv bioinformatics 0-100-users 2016