Toward machine-guided design of proteins, bioRxiv, 2018-06-02
Proteins, molecular machines that underpin all biological life, are of significant therapeutic and industrial value. Directed evolution is a high-throughput experimental approach for improving protein function, but has difficulty escaping local maxima in the fitness landscape. Here, we investigate how supervised learning in a closed loop with DNA synthesis and high-throughput screening can be used to improve protein design. Using the green fluorescent protein (GFP) as an illustrative example, we demonstrate the opportunities and challenges of generating training datasets conducive to selecting strongly generalizing models. With prospectively designed wet lab experiments, we then validate that these models can generalize to unseen regions of the fitness landscape, even when constrained to explore combinations of non-trivial mutations. Taken together, this suggests a hybrid optimization strategy for protein design in which a predictive model is used to explore difficult-to-access but promising regions of the fitness landscape that directed evolution can then exploit at scale.
biorxiv synthetic-biology 100-200-users 2018
Optimization of Golden Gate assembly through application of ligation sequence-dependent fidelity and bias profiling, bioRxiv, 2018-05-15
Modern synthetic biology depends on the manufacture of large DNA constructs from libraries of genes, regulatory elements or other genetic parts. Type IIS restriction enzyme-dependent DNA assembly methods (e.g., Golden Gate) enable rapid one-pot, ordered, multi-fragment DNA assembly, facilitating the generation of high-complexity constructs. The order of assembly of genetic parts is determined by the ligation of flanking Watson-Crick base-paired overhangs. The ligation of mismatched overhangs leads to erroneous assembly, and the need to avoid such pairings has typically been accomplished by using small sets of empirically vetted junction pairs, limiting the number of parts that can be joined in a single reaction. Here, we report the use of a comprehensive method for profiling end-joining ligation fidelity and bias to predict highly accurate sets of connections for ligation-based DNA assembly methods. This data set allows quantification of sequence-dependent ligation efficiency and identification of mismatch-prone pairings. The ligation profile accurately predicted junction fidelity in ten-fragment Golden Gate assembly reactions, and enabled efficient assembly of a lac cassette from up to 24 fragments in a single reaction. Application of the ligation fidelity profile to inform choice of junctions thus enables highly flexible assembly design, with >20 fragments in a single reaction.
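As a rough illustration of why junction choice matters, the sketch below screens a candidate set of overhangs for the two simplest Watson-Crick conflicts: an overhang that is its own reverse complement, and two overhangs that are reverse complements of each other. It is not the fidelity profile described in the abstract, which also covers mismatch ligation; the function names and example overhangs are assumptions made for illustration.

```python
from itertools import combinations

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_overhang_set(overhangs):
    """Return a list of conflicts; an empty list means no perfect-match conflicts.

    Only exact Watson-Crick conflicts are detected. Mismatch-prone pairings,
    the subject of the profiling data, are NOT modeled here.
    """
    problems = []
    seqs = [o.upper() for o in overhangs]
    for s in seqs:
        if s == reverse_complement(s):
            problems.append(f"{s} is palindromic and can ligate to itself")
    for a, b in combinations(seqs, 2):
        if a == b:
            problems.append(f"{a} appears more than once")
        elif a == reverse_complement(b):
            problems.append(f"{a} and {b} are reverse complements")
    return problems


if __name__ == "__main__":
    # Hypothetical 4-nt overhang sets, chosen only to exercise the checks.
    print(check_overhang_set(["AATG", "GCTT", "CGCT"]))
    print(check_overhang_set(["GATC", "AATG", "CATT"]))
```

A set that passes this screen can still misassemble through mismatched ligation, which is the gap that empirical fidelity data is meant to close.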
biorxiv synthetic-biology 0-100-users 2018
Human 5′ UTR design and variant effect prediction from a massively parallel translation assay, bioRxiv, 2018-04-29
Predicting the impact of cis-regulatory sequence on gene expression is a foundational challenge for biology. We combine polysome profiling of hundreds of thousands of randomized 5′ UTRs with deep learning to build a predictive model that relates human 5′ UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5′ UTRs that accurately target specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,000 truncated human 5′ UTRs and 3,577 naturally-occurring variants and show that the model accurately predicts ribosome loading of these sequences. Finally, we provide evidence of 47 SNVs associated with human diseases that cause a significant change in ribosome loading and thus a plausible molecular basis for disease.
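The pairing of a predictive model with a genetic algorithm can be sketched as follows. This is a minimal toy, not the authors' implementation: `toy_score` is a hypothetical stand-in for the trained deep learning model (it simply returns the A/U fraction), and the population size, mutation rate, and sequence length are arbitrary assumptions.

```python
import random

BASES = "ACGU"


def toy_score(utr: str) -> float:
    """Hypothetical stand-in for a trained predictor: fraction of A/U bases."""
    return sum(b in "AU" for b in utr) / len(utr)


def mutate(utr: str, rng: random.Random, rate: float = 0.05) -> str:
    """Resample each position from BASES with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in utr)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover between two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, length=50, pop_size=40, generations=60, seed=0):
    """Search for a sequence whose predicted score is close to `target`."""
    rng = random.Random(seed)

    def error(u):
        return abs(toy_score(u) - target)

    pop = ["".join(rng.choice(BASES) for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error)
        parents = pop[: pop_size // 4]  # elitism: the best quarter survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return min(pop, key=error)


if __name__ == "__main__":
    best = evolve(target=0.8)
    print(best, toy_score(best))
```

Because the best sequences are carried over unchanged each generation, the distance to the target never increases; swapping `toy_score` for a real model would leave the search loop unchanged.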
biorxiv synthetic-biology 100-200-users 2018
Marionette E. coli containing 12 highly-optimized small molecule sensors, bioRxiv, 2018-03-21
Cellular processes are carried out by many interacting genes, and their study and optimization require multiple levers by which they can be independently controlled. The most common method is via a genetically-encoded sensor that responds to a small molecule (an “inducible system”). However, these sensors are often suboptimal, exhibiting high background expression and low dynamic range. Further, using multiple sensors in one cell is limited by cross-talk and the taxing of cellular resources. Here, we have developed a directed evolution strategy to simultaneously select for less background, high dynamic range, increased sensitivity, and low crosstalk. Libraries of the regulatory protein and output promoter are built based on random and rationally-guided mutations. This is applied to generate a set of 12 high-performance sensors, which exhibit >100-fold induction with low background and cross-reactivity. These are combined to build a single “sensor array” and inserted into the genomes of E. coli MG1655 (wild-type), DH10B (cloning), and BL21 (protein expression). These “Marionette” strains allow for the independent control of gene expression using 2,4-diacetylphloroglucinol (DAPG), cuminic acid (Cuma), 3-oxohexanoyl-homoserine lactone (OC6), vanillic acid (Van), isopropyl β-D-1-thiogalactopyranoside (IPTG), anhydrotetracycline (aTc), L-arabinose (Ara), choline chloride (Cho), naringenin (Nar), 3,4-dihydroxybenzoic acid (DHBA), sodium salicylate (Sal), and 3-hydroxytetradecanoyl-homoserine lactone (OHC14).
biorxiv synthetic-biology 0-100-users 2018
Prokaryotic nanocompartments form synthetic organelles in a eukaryote, bioRxiv, 2018-01-07
Compartmentalization of proteins into organelles is a promising strategy for enhancing the productivity of engineered eukaryotic organisms. However, approaches that co-opt endogenous organelles may be limited by the potential for unwanted crosstalk and disruption of native metabolic functions. Here, we present the construction of synthetic non-endogenous organelles in the eukaryotic yeast Saccharomyces cerevisiae, based on the prokaryotic family of self-assembling proteins known as encapsulins. We establish that encapsulins self-assemble to form nanoscale compartments in yeast, and that heterologous proteins can be selectively targeted for compartmentalization. Housing destabilized proteins within encapsulin compartments affords protection against proteolytic degradation in vivo, while the interaction between split protein components is enhanced upon co-localization within the compartment interior. Furthermore, encapsulin compartments can support enzymatic catalysis, with substrate turnover observed for an encapsulated yeast enzyme. Encapsulin compartments therefore represent a modular platform, orthogonal to existing organelles, for programming synthetic compartmentalization in eukaryotes.
biorxiv synthetic-biology 0-100-users 2018
Current CRISPR gene drive systems are likely to be highly invasive in wild populations, bioRxiv, 2017-11-21
Recent reports have suggested that CRISPR-based gene drives are unlikely to invade wild populations due to drive-resistant alleles that prevent cutting. Here we develop mathematical models based on existing empirical data to explicitly test this assumption. We show that although resistance prevents drive systems from spreading to fixation in large populations, even the least effective systems reported to date are highly invasive. Releasing a small number of organisms often causes invasion of the local population, followed by invasion of additional populations connected by very low gene flow rates. Examining the effects of mitigating factors including standing variation, inbreeding, and family size revealed that none of these prevent invasion in realistic scenarios. Highly effective drive systems are predicted to be even more invasive. Contrary to the National Academies report on gene drive, our results suggest that standard drive systems should not be developed nor field-tested in regions harboring the host organism.
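To give a feel for why a small release can spread, the sketch below iterates a textbook deterministic recursion for a homing drive. It is not the model developed in the paper: it assumes random mating, an infinite population, no fitness cost, and no resistance alleles, which are exactly the factors the abstract says the authors examine. Under those assumptions, heterozygotes transmit the drive with probability (1 + e)/2, so the allele frequency updates as q' = q + e·q·(1 − q), where e is an assumed homing efficiency.

```python
def drive_frequency(q0: float, homing_efficiency: float, generations: int):
    """Iterate q' = q + e*q*(1 - q) and return the frequency trajectory.

    Simplifying assumptions (not from the paper): random mating, infinite
    population, no fitness cost, no resistance alleles.
    """
    q = q0
    freqs = [q]
    for _ in range(generations):
        q = q + homing_efficiency * q * (1.0 - q)
        freqs.append(q)
    return freqs


if __name__ == "__main__":
    # Hypothetical release at 1% allele frequency, compared with Mendelian
    # inheritance (e = 0), where the frequency does not change.
    for e in (0.0, 0.5, 0.9):
        trajectory = drive_frequency(0.01, e, 20)
        print(e, [round(q, 3) for q in trajectory[::5]])
```

With e = 0 the frequency stays at its starting value, while any positive e increases it every generation; resistance and fitness costs, omitted here, are what limit spread in more realistic models.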
biorxiv synthetic-biology 100-200-users 2017