Changes in gene expression shift and switch genetic interactions, bioRxiv, 2019-03-15
Summary: An important goal in disease genetics and evolutionary biology is to understand how mutations combine to alter phenotypes and fitness. Non-additive interactions between mutations occur extensively and change across conditions, cell types, and species, making genetic prediction a difficult challenge. To understand the reasons for this, we reduced the problem to a minimal system where we combined mutations in a single protein performing a single function (a transcriptional repressor inhibiting a target gene). Even in this minimal system, a change in gene expression altered both the strength and type of genetic interactions. These seemingly complicated changes could, however, be predicted by a mathematical model that propagates the effects of mutations on protein folding to the cellular phenotype. We show that similar changes will be observed for many genes. These results provide fundamental insights into genotype-phenotype maps and illustrate how changes in genetic interactions can be predicted using hierarchical mechanistic models.
One-sentence summary: Deep mutagenesis of the lambda repressor reveals that changes in gene expression will alter the strength and direction of genetic interactions between mutations in many genes.
Highlights:
- Deep mutagenesis of the lambda repressor at two expression levels reveals extensive changes in mutational effects and genetic interactions
- Genetic interactions can switch from positive (suppressive) to negative (enhancing) as the expression of a gene changes
- A mathematical model that propagates the effects of mutations on protein folding to the cellular phenotype accurately predicts changes in mutational effects and interactions
- Changes in expression will alter mutational effects and interactions for many genes
- For some genes, perfect mechanistic models will never be able to predict how mutations of known effect combine without measurements of intermediate phenotypes
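The idea that a change in expression can switch the sign of a genetic interaction can be illustrated with a toy calculation. The sketch below is not the authors' model. It assumes a two-state folding equilibrium, additive folding free-energy changes for the two mutations, and a Hill-type response of repression to the concentration of folded protein. Every parameter value (`RT`, `dg_wt`, `ddg_a`, `ddg_b`, `k_half`, `hill`) and the additive definition of epistasis on the phenotype scale are illustrative assumptions.

```python
import math

RT = 0.6  # kcal/mol; rough value near physiological temperature (assumed)


def fraction_folded(dg_fold):
    """Two-state folding: fraction folded for a folding free energy in kcal/mol."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def repression(expression, dg_fold, k_half=1.0, hill=2.0):
    """Hill-type response to folded repressor concentration (arbitrary units)."""
    folded = expression * fraction_folded(dg_fold)
    return folded**hill / (k_half**hill + folded**hill)


def epistasis(expression, dg_wt=-2.0, ddg_a=3.0, ddg_b=3.0):
    """Double-mutant phenotype minus the additive expectation from the singles."""
    wt = repression(expression, dg_wt)
    a = repression(expression, dg_wt + ddg_a)
    b = repression(expression, dg_wt + ddg_b)
    ab = repression(expression, dg_wt + ddg_a + ddg_b)  # free energies add
    return ab - a - b + wt


# Compare a low and a high (hypothetical) expression level.
for level in (1.0, 10.0):
    print(f"expression={level}: epistasis={epistasis(level):+.3f}")
```

With these assumed parameters the interaction is positive at the lower expression level and negative at the higher one, even though the mutations combine additively at the level of folding energy. The nonlinearity between folding and phenotype alone produces the switch.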
biorxiv genetics 0-100-users 2019
A systematic evaluation of the design, orientation, and sequence context dependencies of massively parallel reporter assays, bioRxiv, 2019-03-14
Abstract: Massively parallel reporter assays (MPRAs) functionally screen thousands of sequences for regulatory activity in parallel. Although MPRAs have been applied to address diverse questions in gene regulation, there has been no systematic comparison of how differences in experimental design influence findings. Here, we screen a library of 2,440 sequences, representing candidate liver enhancers and controls, in HepG2 cells for regulatory activity using nine different approaches (including conventional episomal, STARR-seq, and lentiviral MPRA designs). We identify subtle but significant differences in the resulting measurements that correlate with epigenetic and sequence-level features. We also test this library in both orientations with respect to the promoter, validating en masse that enhancer activity is robustly independent of orientation. Finally, we develop and apply a novel method to assemble and functionally test libraries of the same putative enhancers as 192-mers, 354-mers, and 678-mers, and observe surprisingly large differences in functional activity. This work provides a framework for the experimental design of high-throughput reporter assays, suggesting that the extended sequence context of tested elements, and to a lesser degree the precise assay, influence MPRA results.
biorxiv genomics 0-100-users 2019
Frequent birth of de novo genes in the compact yeast genome, bioRxiv, 2019-03-13
Abstract: Evidence has accumulated that some genes originate directly from previously non-genic sequences, or de novo, rather than by the duplication or fusion of existing genes. However, how de novo genes emerge and eventually become functional is largely unknown. Here we perform the first study on de novo genes that uses transcriptomics data from eleven different yeast species, all grown identically in both rich media and in oxidative stress conditions. The genomes of these species are densely packed with functional elements, leaving little room for the co-option of genomic sequences into new transcribed loci. Despite this, we find that at least 213 transcripts (~5%) have arisen de novo in the past 20 million years of evolution of baker’s yeast, or approximately 10 new transcripts every million years. Nearly half of the total newly expressed sequences are generated from regions in which both DNA strands are used as templates for transcription, explaining the apparent contradiction between the limited ‘empty’ genomic space and high rate of de novo gene birth. In addition, we find that 40% of these de novo transcripts are actively translated and that at least a fraction of the encoded proteins are likely to be under purifying selection. This study shows that even in highly compact genomes, de novo transcripts are continuously generated and can give rise to new functional protein-coding genes.
biorxiv evolutionary-biology 0-100-users 2019
The genetic architecture of sporadic and recurrent miscarriage, bioRxiv, 2019-03-13
Miscarriage is a common complex trait that affects 10-25% of clinically confirmed pregnancies1,2. Here we present the first large-scale genetic association analyses with 69,118 cases from five different ancestries for sporadic miscarriage and 750 cases of European ancestry for recurrent miscarriage, and up to 359,469 female controls. We identify one genome-wide significant association on chromosome 13 (rs146350366, minor allele frequency (MAF) 1.2%, Pmeta=3.2×10⁻⁸ (CI) 1.2-1.6) for sporadic miscarriage in our European ancestry meta-analysis (50,060 cases and 174,109 controls), located near FGF9 involved in pregnancy maintenance3 and progesterone production4. Additionally, we identified three genome-wide significant associations for recurrent miscarriage, including a signal on chromosome 9 (rs7859844, MAF=6.4%, Pmeta=1.3×10⁻⁸ in controlling extravillous trophoblast motility5. We further investigate the genetic architecture of miscarriage with biobank-scale Mendelian randomization, heritability, and genetic correlation analyses. Our results indicate that miscarriage etiopathogenesis is partly driven by genetic variation related to gonadotropin regulation, placental biology and progesterone production.
biorxiv genetics 0-100-users 2019
Deep learning of representations for transcriptomics-based phenotype prediction, bioRxiv, 2019-03-12
Abstract: The ability to predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. This task is complicated because expression data are high dimensional whereas each experiment is usually small (e.g., ∼20,000 genes may be measured for ∼100 subjects). However, thousands of transcriptomics experiments with hundreds of thousands of samples are available in public repositories. Can representation learning techniques leverage these public data to improve predictive performance on other tasks? Here, we report a comprehensive analysis using different gene sets, normalization schemes, and machine learning methods on a set of 24 binary and multiclass prediction problems and 26 survival analysis tasks. Methods that combine large numbers of genes outperformed single gene methods, but neither unsupervised nor semi-supervised representation learning techniques yielded consistent improvements in out-of-sample performance across datasets. Our findings suggest that using l2-regularized regression methods applied to centered log-ratio transformed transcript abundances provides the best predictive analyses.
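The combination named in the last sentence, a centered log-ratio (CLR) transform followed by l2-regularized regression, can be sketched in a few lines of NumPy. This is only an illustration on synthetic data and not the authors' pipeline. The pseudocount, the penalty strength `alpha`, the continuous target, and the helper names `clr` and `ridge_fit` are all assumptions made for the example.

```python
import numpy as np


def clr(abundances, pseudocount=1.0):
    """Centered log-ratio transform, applied per sample (row).

    The pseudocount (an assumption here) avoids log(0) for unobserved transcripts.
    """
    logged = np.log(np.asarray(abundances, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


def ridge_fit(X, y, alpha=1.0):
    """Closed-form l2-regularized least squares; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])  # penalty keeps this invertible
    weights = np.linalg.solve(gram, Xc.T @ yc)
    return weights, y_mean - x_mean @ weights


# Synthetic stand-in for transcript abundances: 100 samples x 20 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(100, 20))
features = clr(counts)

# Hypothetical continuous outcome driven by the log-ratio of two genes.
target = features[:, 0] - features[:, 1] + rng.normal(scale=0.1, size=100)

weights, intercept = ridge_fit(features, target, alpha=1.0)
predictions = features @ weights + intercept
```

CLR features sum to zero within each sample, so the unpenalized least-squares problem is rank deficient. The l2 penalty makes the linear system solvable, which is one practical reason the two steps pair well.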
biorxiv bioinformatics 0-100-users 2019