Enzymatic DNA synthesis for digital information storage, bioRxiv, 2018-06-16
DNA is an emerging storage medium for digital data but its adoption is hampered by limitations of phosphoramidite chemistry, which was developed for single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom-up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144-bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design which stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information for large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.
biorxiv synthetic-biology 100-200-users 2018
Live Mouse Tracker: real-time behavioral analysis of groups of mice, bioRxiv, 2018-06-14
Preclinical studies of psychiatric disorders require the use of animal models to investigate the impact of environmental factors or genetic mutations on complex traits such as decision-making and social interactions. Here, we present a real-time method for behavior analysis of mice housed in groups that couples computer vision, machine learning and Triggered-RFID identification to track and monitor animals over several days in enriched environments. The system extracts a thorough list of individual and collective behavioral traits and provides a unique phenotypic profile for each animal. On mouse models, we study the impact of mutations of genes Shank2 and Shank3 involved in autism. Characterization and integration of data from behavioral profiles of mutated female mice reveals distinctive activity levels and involvement in complex social configuration.
biorxiv animal-behavior-and-cognition 100-200-users 2018
Real-time cryo-EM data pre-processing with Warp, bioRxiv, 2018-06-14
The acquisition of cryo-electron microscopy (cryo-EM) data from biological specimens is currently largely uncoupled from subsequent data evaluation, correction and processing. Therefore, the acquisition strategy is difficult to optimize during data collection, often leading to suboptimal microscope usage and disappointing results. Here we provide Warp, a software for real-time evaluation, correction, and processing of cryo-EM data during their acquisition. Warp evaluates and monitors key parameters for each recorded micrograph or tomographic tilt series in real time. Warp also rapidly corrects micrographs for global and local motion, and estimates the local defocus with the use of novel algorithms. The software further includes a deep learning-based particle picking algorithm that rivals human accuracy to make the pre-processing pipeline truly automated. The output from Warp can be directly fed into established tools for particle classification and 3D image reconstruction. In a benchmarking study we show that Warp automatically processed a published cryo-EM data set for influenza virus hemagglutinin, leading to an improvement of the nominal resolution from 3.9 Å to 3.2 Å. Warp is easy to install, computationally inexpensive, and has an intuitive and streamlined user interface.
biorxiv biophysics 0-100-users 2018
Adversarial childhood events are associated with Sudden Infant Death Syndrome (SIDS): an ecological study, bioRxiv, 2018-06-07
Sudden Infant Death Syndrome (SIDS) is the most common cause of postneonatal infant death. The allostatic load hypothesis posits that SIDS is the result of perinatal cumulative painful, stressful, or traumatic exposures that tax neonatal regulatory systems. To test it, we explored the relationships between SIDS and two common stressors, male neonatal circumcision (MNC) and prematurity, using latitudinal data from 15 countries and over 40 US states during the years 1999-2016. We used linear regression analyses and likelihood ratio tests to calculate the association between SIDS and the stressors. SIDS prevalence was significantly and positively correlated with MNC and prematurity rates. MNC explained 14.2% of the variability of SIDS’s male bias in the US, reminiscent of the Jewish myth of Lilith, the killer of infant males. Combined, the stressors increased the likelihood of SIDS. Ecological analyses are useful to generate hypotheses but cannot provide strong evidence of causality. Biological plausibility is provided by a growing body of experimental and clinical evidence linking adversary preterm and early-life events with SIDS. Together with historical evidence, our findings emphasize the necessity of cohort studies that consider these environmental stressors with the aim of improving the identification of at-risk infants and reducing infant mortality.
biorxiv pathology 100-200-users 2018
CaImAn: An open source tool for scalable Calcium Imaging data Analysis, bioRxiv, 2018-06-05
Advances in fluorescence microscopy enable monitoring larger brain areas in-vivo with finer time resolution. The resulting data rates require reproducible analysis pipelines that are reliable, fully automated, and scalable to datasets generated over the course of months. Here we present CaImAn, an open-source library for calcium imaging data analysis. CaImAn provides automatic and scalable methods to address problems common to pre-processing, including motion correction, neural activity identification, and registration across different sessions of data collection. It does this while requiring minimal user intervention, with good performance on computers ranging from laptops to high-performance computing clusters. CaImAn is suitable for two-photon and one-photon imaging, and also enables real-time analysis on streaming data. To benchmark the performance of CaImAn we collected a corpus of ground truth annotations from multiple labelers on nine mouse two-photon datasets. We demonstrate that CaImAn achieves near-human performance in detecting locations of active neurons.
biorxiv neuroscience 0-100-users 2018
Toward machine-guided design of proteins, bioRxiv, 2018-06-02
Proteins, molecular machines that underpin all biological life, are of significant therapeutic and industrial value. Directed evolution is a high-throughput experimental approach for improving protein function, but has difficulty escaping local maxima in the fitness landscape. Here, we investigate how supervised learning in a closed loop with DNA synthesis and high-throughput screening can be used to improve protein design. Using the green fluorescent protein (GFP) as an illustrative example, we demonstrate the opportunities and challenges of generating training datasets conducive to selecting strongly generalizing models. With prospectively designed wet lab experiments, we then validate that these models can generalize to unseen regions of the fitness landscape, even when constrained to explore combinations of non-trivial mutations. Taken together, this suggests a hybrid optimization strategy for protein design in which a predictive model is used to explore difficult-to-access but promising regions of the fitness landscape that directed evolution can then exploit at scale.
biorxiv synthetic-biology 100-200-users 2018