Enhanced proofreading governs CRISPR-Cas9 targeting accuracy, bioRxiv, 2017-07-07
The RNA-guided CRISPR-Cas9 nuclease from Streptococcus pyogenes (SpCas9) has been widely repurposed for genome editing [1-4]. High-fidelity (SpCas9-HF1) and enhanced specificity (eSpCas9(1.1)) variants exhibit substantially reduced off-target cleavage in human cells, but the mechanism of target discrimination and the potential to further improve fidelity were unknown [5-9]. Using single-molecule Förster resonance energy transfer (smFRET) experiments, we show that both SpCas9-HF1 and eSpCas9(1.1) are trapped in an inactive state [10] when bound to mismatched targets. We find that a non-catalytic domain within Cas9, REC3, recognizes target mismatches and governs the HNH nuclease to regulate overall catalytic competence. Exploiting this observation, we identified residues within REC3 involved in mismatch sensing and designed a new hyper-accurate Cas9 variant (HypaCas9) that retains robust on-target activity in human cells. These results offer a more comprehensive model to rationalize and modify the balance between target recognition and nuclease activation for precision genome editing.
biorxiv biochemistry 100-200-users 2017
Robust and Bright Genetically Encoded Fluorescent Markers for Highlighting Structures and Compartments in Mammalian Cells, bioRxiv, 2017-07-07
To increase our understanding of cells, there is a need for specific markers to identify biomolecules, cellular structures and compartments. One type of marker comprises genetically encoded fluorescent probes that are linked with protein domains, peptides and/or signal sequences. These markers are encoded on a plasmid and they allow straightforward, convenient labeling of cultured mammalian cells by introducing the plasmid into the cells. Ideally, the fluorescent marker combines favorable spectroscopic properties (brightness, photostability) with specific labeling of the structure or compartment of interest. Here, we report on our ongoing efforts to generate robust and bright genetically encoded fluorescent markers for highlighting structures and compartments in living cells.
biorxiv cell-biology 200-500-users 2017
Privacy-preserving generative deep neural networks support clinical data sharing, bioRxiv, 2017-07-06
Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and Results: Using pairs of deep neural networks, we generated simulated, synthetic “participants” that closely resemble participants of the SPRINT trial. We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants’ data could identify a real participant in the trial. Machine-learning predictors built on the synthetic population generalize to the original dataset. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical datasets by enhancing data sharing while preserving participant privacy.
biorxiv bioinformatics 200-500-users 2017
“Unexpected mutations after CRISPR-Cas9 editing in vivo” are most likely pre-existing sequence variants and not nuclease-induced mutations, bioRxiv, 2017-07-06
Schaefer et al. recently advanced the provocative conclusion that CRISPR-Cas9 nuclease can induce off-target alterations at genomic loci that do not resemble the intended on-target site [1]. Using high-coverage whole genome sequencing (WGS), these authors reported finding SNPs and indels in two CRISPR-Cas9-treated mice that were not present in a single untreated control mouse. On the basis of this association, Schaefer et al. concluded that these sequence variants were caused by CRISPR-Cas9. This new proposed CRISPR-Cas9 off-target activity runs contrary to previously published work [2–8] and, if the authors are correct, could have profound implications for research and therapeutic applications. Here, we demonstrate that the simplest interpretation of Schaefer et al.’s data is that the two CRISPR-Cas9-treated mice are actually more closely related genetically to each other than to the control mouse. This strongly suggests that the so-called “unexpected mutations” simply represent SNPs and indels shared in common by these mice prior to nuclease treatment. In addition, given the genomic and sequence distribution profiles of these variants, we show that it is challenging to explain how CRISPR-Cas9 might be expected to induce such changes. Finally, we argue that the lack of appropriate controls in Schaefer et al.’s experimental design precludes assignment of causality to CRISPR-Cas9. Given these substantial issues, we urge Schaefer et al. to revise or re-state the original conclusions of their published work so as to avoid leaving misleading and unsupported statements to persist in the literature.
biorxiv molecular-biology 100-200-users 2017
CRISPR/Cas9-APEX-mediated proximity labeling enables discovery of proteins associated with a predefined genomic locus in living cells, bioRxiv, 2017-07-05
The activation or repression of a gene’s expression is primarily controlled by changes in the proteins that occupy its regulatory elements. The most common method to identify proteins associated with genomic loci is chromatin immunoprecipitation (ChIP). While having greatly advanced our understanding of gene expression regulation, ChIP requires specific, high quality, IP-competent antibodies against nominated proteins, which can limit its utility and scope for discovery. Thus, a method able to discover and identify proteins associated with a particular genomic locus within the native cellular context would be extremely valuable. Here, we present a novel technology combining recent advances in chemical biology, genome targeting, and quantitative mass spectrometry to develop genomic locus proteomics, a method able to identify proteins which occupy a specific genomic locus.
biorxiv biochemistry 200-500-users 2017
Single nucleus analysis of the chromatin landscape in mouse forebrain development, bioRxiv, 2017-07-05
Genome-wide analysis of chromatin accessibility in primary tissues has uncovered millions of candidate regulatory sequences in the human and mouse genomes [1–4]. However, the heterogeneity of biological samples used in previous studies has prevented a precise understanding of the dynamic chromatin landscape in specific cell types. Here, we show that analysis of the transposase-accessible chromatin in single nuclei isolated from frozen tissue samples can resolve cellular heterogeneity and delineate transcriptional regulatory sequences in the constituent cell types. Our strategy is based on a combinatorial barcoding assisted single cell assay for transposase-accessible chromatin [5] and is optimized for nuclei from flash-frozen primary tissue samples (snATAC-seq). We used this method to examine the mouse forebrain at seven developmental stages and in adults. From snATAC-seq profiles of more than 15,000 high quality nuclei, we identify 20 distinct cell populations corresponding to major neuronal and non-neuronal cell types in foetal and adult forebrains. We further define cell-type-specific cis regulatory sequences and infer potential master transcriptional regulators of each cell population. Our results demonstrate the feasibility of a general approach for identifying cell-type-specific cis regulatory sequences in heterogeneous tissue samples, and provide a rich resource for understanding forebrain development in mammals.
biorxiv genomics 0-100-users 2017