Privacy-preserving generative deep neural networks support clinical data sharing, bioRxiv, 2017-07-06
Abstract. Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and Results: Using pairs of deep neural networks, we generated simulated, synthetic “participants” that closely resemble participants of the SPRINT trial. We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants’ data could identify a real participant in the trial. Machine-learning predictors built on the synthetic population generalize to the original dataset. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical datasets by enhancing data sharing while preserving participant privacy.
biorxiv bioinformatics 200-500-users 2017
CRISPR/Cas9-APEX-mediated proximity labeling enables discovery of proteins associated with a predefined genomic locus in living cells, bioRxiv, 2017-07-05
Abstract. The activation or repression of a gene’s expression is primarily controlled by changes in the proteins that occupy its regulatory elements. The most common method to identify proteins associated with genomic loci is chromatin immunoprecipitation (ChIP). Although ChIP has greatly advanced our understanding of gene expression regulation, it requires specific, high-quality, IP-competent antibodies against nominated proteins, which can limit its utility and scope for discovery. Thus, a method able to discover and identify proteins associated with a particular genomic locus within the native cellular context would be extremely valuable. Here, we present a novel technology combining recent advances in chemical biology, genome targeting, and quantitative mass spectrometry to develop genomic locus proteomics, a method able to identify proteins that occupy a specific genomic locus.
biorxiv biochemistry 200-500-users 2017
The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum, bioRxiv, 2017-07-04
Abstract. Common bread wheat, Triticum aestivum, has one of the most complex genomes known to science, with 6 copies of each chromosome, enormous numbers of near-identical sequences scattered throughout, and an overall size of more than 15 billion bases. Multiple past attempts to assemble the genome have failed. Here we report the first successful assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads. The final assembly contains 15,344,693,583 bases and has a weighted average (N50) contig size of 232,659 bases. This represents by far the most complete and contiguous assembly of the wheat genome to date, providing a strong foundation for future genetic studies of this important food crop. We also report how we used the recently published genome of Aegilops tauschii, the diploid ancestor of the wheat D genome, to identify 4,179,762,575 bp of T. aestivum that correspond to its D genome components.
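For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of how it is conventionally computed: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions, not taken from the preprint or its assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Toy example (made-up lengths, total 300): 80 + 70 = 150 reaches half.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

Under this definition, an N50 of 232,659 bases means that contigs of at least that length together cover at least half of the assembled bases.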
biorxiv genomics 200-500-users 2017
Corrigendum and follow-up: Whole genome sequencing of multiple CRISPR-edited mouse lines suggests no excess mutations, bioRxiv, 2017-06-24
Our previous publication suggested CRISPR-Cas9 editing at the zygotic stage might unexpectedly introduce a multitude of subtle but unintended mutations, an interpretation that, not surprisingly, raised numerous questions. The key issue is whether, since parental lines were not available, the reported variants might have been inherited. To expand upon the limited available whole-genome data on whether CRISPR-edited mice show more genetic variation, whole-genome sequencing was performed on two other mouse lines that had undergone a CRISPR-editing procedure. Again, parents were not available for either the Capn5 or the Fblim1 CRISPR-edited mouse line, so strain controls were examined. We also include verification of variants detected in the initial mouse line. Taken together, these whole-genome-sequencing-level results support the idea that in specific cases, CRISPR-Cas9 editing can precisely edit the genome at the organismal level and may not introduce numerous, unintended, off-target mutations.
biorxiv bioengineering 200-500-users 2017
A Guide to Robust Statistical Methods in Neuroscience, bioRxiv, 2017-06-21
Abstract. There is a vast array of new and improved methods for comparing groups and studying associations that offer the potential for substantially increasing power, providing improved control over the probability of a Type I error, and yielding a deeper and more nuanced understanding of neuroscience data. These new techniques effectively deal with four insights into when and why conventional methods can be unsatisfactory. But for the non-statistician, the vast array of new and improved techniques for comparing groups and studying associations can seem daunting, simply because there are so many new methods that are now available. The paper briefly reviews when and why conventional methods can have relatively low power and yield misleading results. The main goal is to suggest some general guidelines regarding when, how and why certain modern techniques might be used.
biorxiv neuroscience 200-500-users 2017
Environmental factors dominate over host genetics in shaping human gut microbiota composition, bioRxiv, 2017-06-17
Abstract. Human gut microbiome composition is shaped by multiple host intrinsic and extrinsic factors, but the relative contribution of host genetic compared to environmental factors remains elusive. Here, we genotyped a cohort of 696 healthy individuals from several distinct ancestral origins and a relatively common environment, and demonstrate that there is no statistically significant association between microbiome composition and ethnicity, single nucleotide polymorphisms (SNPs), or overall genetic similarity, and that only 5 of 211 (2.4%) previously reported microbiome-SNP associations replicate in our cohort. In contrast, we find similarities in the microbiome composition of genetically unrelated individuals who share a household. We define the term biome-explainability as the variance of a host phenotype explained by the microbiome after accounting for the contribution of human genetics. Consistent with our finding that microbiome and host genetics are largely independent, we find significant biome-explainability levels of 16-33% for body mass index (BMI), fasting glucose, high-density lipoprotein (HDL) cholesterol, waist circumference, waist-hip ratio (WHR), and lactose consumption. We further show that several human phenotypes can be predicted substantially more accurately when adding microbiome data to host genetics data, and that the contribution of both data sources to prediction accuracy is largely additive. Overall, our results suggest that human microbiome composition is dominated by environmental factors rather than by host genetics.
biorxiv genetics 200-500-users 2017