Privacy-preserving generative deep neural networks support clinical data sharing, bioRxiv, 2017-07-06
Abstract. Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and Results: Using pairs of deep neural networks, we generated simulated, synthetic “participants” that closely resemble participants of the SPRINT trial. We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants’ data could identify a real participant in the trial. Machine-learning predictors built on the synthetic population generalize to the original dataset. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical datasets by enhancing data sharing while preserving participant privacy.
biorxiv bioinformatics 200-500-users 2017
“Unexpected mutations after CRISPR-Cas9 editing in vivo” are most likely pre-existing sequence variants and not nuclease-induced mutations, bioRxiv, 2017-07-06
Schaefer et al. recently advanced the provocative conclusion that CRISPR-Cas9 nuclease can induce off-target alterations at genomic loci that do not resemble the intended on-target site [1]. Using high-coverage whole genome sequencing (WGS), these authors reported finding SNPs and indels in two CRISPR-Cas9-treated mice that were not present in a single untreated control mouse. On the basis of this association, Schaefer et al. concluded that these sequence variants were caused by CRISPR-Cas9. This newly proposed CRISPR-Cas9 off-target activity runs contrary to previously published work [2–8] and, if the authors are correct, could have profound implications for research and therapeutic applications. Here, we demonstrate that the simplest interpretation of Schaefer et al.’s data is that the two CRISPR-Cas9-treated mice are actually more closely related genetically to each other than to the control mouse. This strongly suggests that the so-called “unexpected mutations” simply represent SNPs and indels shared by these mice prior to nuclease treatment. In addition, given the genomic and sequence distribution profiles of these variants, we show that it is challenging to explain how CRISPR-Cas9 might be expected to induce such changes. Finally, we argue that the lack of appropriate controls in Schaefer et al.’s experimental design precludes assignment of causality to CRISPR-Cas9. Given these substantial issues, we urge Schaefer et al. to revise or restate the original conclusions of their published work so that misleading and unsupported statements do not persist in the literature.
biorxiv molecular-biology 100-200-users 2017
CRISPRCas9-APEX-mediated proximity labeling enables discovery of proteins associated with a predefined genomic locus in living cells, bioRxiv, 2017-07-05
Abstract. The activation or repression of a gene’s expression is primarily controlled by changes in the proteins that occupy its regulatory elements. The most common method to identify proteins associated with genomic loci is chromatin immunoprecipitation (ChIP). Although ChIP has greatly advanced our understanding of gene expression regulation, it requires specific, high-quality, IP-competent antibodies against nominated proteins, which can limit its utility and scope for discovery. Thus, a method able to discover and identify proteins associated with a particular genomic locus within the native cellular context would be extremely valuable. Here, we present a novel technology combining recent advances in chemical biology, genome targeting, and quantitative mass spectrometry to develop genomic locus proteomics, a method able to identify proteins that occupy a specific genomic locus.
biorxiv biochemistry 200-500-users 2017
Single nucleus analysis of the chromatin landscape in mouse forebrain development, bioRxiv, 2017-07-05
Abstract. Genome-wide analysis of chromatin accessibility in primary tissues has uncovered millions of candidate regulatory sequences in the human and mouse genomes [1–4]. However, the heterogeneity of biological samples used in previous studies has prevented a precise understanding of the dynamic chromatin landscape in specific cell types. Here, we show that analysis of the transposase-accessible chromatin in single nuclei isolated from frozen tissue samples can resolve cellular heterogeneity and delineate transcriptional regulatory sequences in the constituent cell types. Our strategy is based on a combinatorial barcoding assisted single cell assay for transposase-accessible chromatin [5] and is optimized for nuclei from flash-frozen primary tissue samples (snATAC-seq). We used this method to examine the mouse forebrain at seven developmental stages and in adults. From snATAC-seq profiles of more than 15,000 high-quality nuclei, we identify 20 distinct cell populations corresponding to major neuronal and non-neuronal cell types in foetal and adult forebrains. We further define cell-type-specific cis regulatory sequences and infer potential master transcriptional regulators of each cell population. Our results demonstrate the feasibility of a general approach for identifying cell-type-specific cis regulatory sequences in heterogeneous tissue samples, and provide a rich resource for understanding forebrain development in mammals.
biorxiv genomics 0-100-users 2017
The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum, bioRxiv, 2017-07-04
Abstract. Common bread wheat, Triticum aestivum, has one of the most complex genomes known to science, with 6 copies of each chromosome, enormous numbers of near-identical sequences scattered throughout, and an overall size of more than 15 billion bases. Multiple past attempts to assemble the genome have failed. Here we report the first successful assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads. The final assembly contains 15,344,693,583 bases and has a weighted average (N50) contig size of 232,659 bases. This represents by far the most complete and contiguous assembly of the wheat genome to date, providing a strong foundation for future genetic studies of this important food crop. We also report how we used the recently published genome of Aegilops tauschii, the diploid ancestor of the wheat D genome, to identify 4,179,762,575 bp of T. aestivum that correspond to its D genome components.
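For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the usual definition follows: the length L such that contigs of length L or longer together cover at least half of the assembly. The function name `n50` and the toy contig lengths are illustrative assumptions, not taken from the preprint, and this is not the authors' own pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (standard definition).

    Illustrative helper only; the name and inputs are hypothetical.
    """
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to half the assembly.
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


# Toy example: total length is 54, and the three longest contigs
# (10 + 9 + 8 = 27) already reach half of it.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

Because the statistic is weighted by contig length, a handful of long contigs can dominate it, which is why it is typically reported alongside total assembly size.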
biorxiv genomics 200-500-users 2017
NanoJ-SQUIRREL quantitative mapping and minimisation of super-resolution optical imaging artefacts, bioRxiv, 2017-07-03
Most super-resolution microscopy methods depend on steps that contribute to the formation of image artefacts. Here we present NanoJ-SQUIRREL, an ImageJ-based analytical approach providing a quantitative assessment of super-resolution image quality. By comparing diffraction-limited images and super-resolution equivalents of the same focal volume, this approach generates a quantitative map of super-resolution defects, as well as methods for their correction. To illustrate its broad applicability to super-resolution approaches, we apply our method to Localization Microscopy, STED and SIM images of a variety of in-cell structures including microtubules, poxviruses, neuronal actin rings and clathrin-coated pits. We particularly focus on single-molecule localisation microscopy, where super-resolution reconstructions often feature imperfections not present in the original data. By showing the quantitative evolution of data quality over these varied sample preparation, acquisition and super-resolution methods, we display the potential of NanoJ-SQUIRREL to guide optimization of super-resolution imaging parameters.
biorxiv biophysics 100-200-users 2017