Deciphering eukaryotic cis-regulatory logic with 100 million random promoters, bioRxiv, 2017-11-26
Abstract: Deciphering cis-regulation, the code by which transcription factors (TFs) interpret regulatory DNA sequence to control gene expression levels, is a long-standing challenge. Previous studies of native or engineered sequences have remained limited in scale. Here, we use random sequences as an alternative, allowing us to measure the expression output of over 100 million synthetic yeast promoters. Random sequences yield a broad range of reproducible expression levels, indicating that the fortuitous binding sites in random DNA are functional. From these data we learn models of transcriptional regulation that predict over 94% of the expression driven from independent test data and nearly 89% from sequences from yeast promoters. These models allow us to characterize the activity of TFs and their interactions with chromatin, and help refine cis-regulatory motifs. We find that strand, position, and helical face preferences of TFs are widespread and depend on interactions with neighboring chromatin. Such massive-throughput regulatory assays of random DNA provide the diverse examples necessary to learn complex models of cis-regulatory logic.
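The claim that random DNA contains functional, fortuitous binding sites, and the mention of strand and helical-face preferences, can be made concrete with a small motif scan. This is only an illustration under stated assumptions: the motif `GATTACA`, the 80-bp promoter length, and all helper names are hypothetical; it uses exact matching rather than the learned models described in the study; and the helical phase assumes an idealized 10.5-bp repeat.

```python
import random

# Hypothetical, non-palindromic motif chosen only for illustration;
# it is not a TF motif reported in the study.
MOTIF = "GATTACA"
# Approximate helical repeat of B-DNA, in base pairs per turn.
HELICAL_REPEAT_BP = 10.5

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def random_promoter(length: int, rng: random.Random) -> str:
    """Uniformly random ACGT sequence (each base independent, p = 1/4)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def find_sites(seq: str, motif: str = MOTIF):
    """Exact motif matches on both strands.

    Returns (start, strand, phase_deg) tuples: start is the 0-based offset on
    the given sequence, strand is "+" or "-", and phase_deg is the rotational
    position of that offset on an idealized 10.5-bp helix.
    """
    k = len(motif)
    motif_rc = reverse_complement(motif)
    sites = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        phase_deg = round((i % HELICAL_REPEAT_BP) / HELICAL_REPEAT_BP * 360.0, 1)
        if window == motif:
            sites.append((i, "+", phase_deg))
        if window == motif_rc:
            sites.append((i, "-", phase_deg))
    return sites


def expected_fortuitous_sites(length: int, k: int) -> float:
    """Expected exact matches to one non-palindromic k-mer, counting both
    strands, in uniformly random DNA of the given length."""
    return 2 * (length - k + 1) / 4 ** k


rng = random.Random(0)
bases = list(random_promoter(80, rng))
bases[20:27] = MOTIF                      # plant a "+" strand site at offset 20
bases[50:57] = reverse_complement(MOTIF)  # plant a "-" strand site at offset 50
promoter = "".join(bases)
print(find_sites(promoter))
print(expected_fortuitous_sites(80, len(MOTIF)))
```

Exact 7-mer matches are rare in random sequence (about 0.009 expected per 80 bp here), but real TF motifs tolerate mismatches and there are many TFs, which is one reason random sequence can still contain many weaker, fortuitous sites.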
biorxiv genomics 200-500-users 2017
The nature of nurture effects of parental genotypes, bioRxiv, 2017-11-15
Abstract: Sequence variants in the parental genomes that are not transmitted to a child (the proband) are often ignored in genetic studies. Here we show that non-transmitted alleles can affect a child through their effects on the parents and other relatives, a phenomenon we call genetic nurture. Using results from a meta-analysis of educational attainment, we find that the polygenic score computed from the non-transmitted alleles of 21,637 probands with at least one genotyped parent has an estimated effect on the proband’s educational attainment that is 29.9% (P = 1.6×10⁻¹⁴) of the effect of the transmitted polygenic score. Genetic nurture effects of this polygenic score extend to other traits. Paternal and maternal polygenic scores have similar effects on educational attainment, but mothers contribute more than fathers to nutrition- and health-related traits.
One-sentence summary: Nurture has a genetic component; that is, alleles in the parents affect the parents’ phenotypes and, through them, influence the child’s outcomes.
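The genetic-nurture design splits each parent's alleles into those the child inherited and those it did not, scores both sets with the same GWAS weights, and compares their associations with the child's trait. Below is a minimal simulation of that idea, not the authors' pipeline: it assumes both parents are genotyped and phased, uses random hypothetical weights in place of the educational-attainment meta-analysis, and plants an assumed nurture effect of 0.3 against a direct effect of 1.0, which ordinary least squares should approximately recover. All function names are hypothetical.

```python
import numpy as np


def split_parental_alleles(parent_haps, transmitted_idx):
    """Split each parent's phased haplotypes into transmitted/non-transmitted.

    parent_haps: int array (n_families, 2, n_snps) of 0/1 allele codes.
    transmitted_idx: int array (n_families,) giving the haplotype (0 or 1)
    that was passed to the child.
    """
    rows = np.arange(parent_haps.shape[0])
    transmitted = parent_haps[rows, transmitted_idx]
    non_transmitted = parent_haps[rows, 1 - transmitted_idx]
    return transmitted, non_transmitted


def nurture_ratio(trait, pgs_t, pgs_nt):
    """OLS of trait on an intercept, PGS_T and PGS_NT; returns theta / delta,
    the non-transmitted coefficient relative to the transmitted one."""
    X = np.column_stack([np.ones_like(pgs_t), pgs_t, pgs_nt])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    _, delta, theta = coef
    return theta / delta


rng = np.random.default_rng(1)
n_fam, n_snps = 10_000, 100
weights = rng.normal(0.0, 1.0, n_snps)  # hypothetical per-SNP GWAS weights
mother = rng.integers(0, 2, size=(n_fam, 2, n_snps))
father = rng.integers(0, 2, size=(n_fam, 2, n_snps))
m_t, m_nt = split_parental_alleles(mother, rng.integers(0, 2, size=n_fam))
f_t, f_nt = split_parental_alleles(father, rng.integers(0, 2, size=n_fam))
pgs_t = (m_t + f_t) @ weights     # score on the child's (transmitted) alleles
pgs_nt = (m_nt + f_nt) @ weights  # score on the parents' non-transmitted alleles
# Simulated trait: assumed direct effect 1.0, assumed nurture effect 0.3, plus noise.
trait = 1.0 * pgs_t + 0.3 * pgs_nt + rng.normal(0.0, 2.0 * pgs_t.std(), n_fam)
print(round(nurture_ratio(trait, pgs_t, pgs_nt), 2))
```

Because transmission is random in this toy, the two scores are independent; in real data, assortative mating and population structure can correlate them, which a real analysis would need to account for.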
biorxiv genetics 200-500-users 2017
Gut microbiota has a widespread and modifiable effect on host gene regulation, bioRxiv, 2017-10-28
Abstract: Variation in the gut microbiome is associated with wellness and disease in humans, yet the molecular mechanisms by which this variation affects the host are not well understood. A likely mechanism is altered gene regulation in the host epithelial cells that interface with the microbiota. Here, we treated colonic epithelial cells with live microbiota from five healthy individuals and quantified the resulting changes in transcriptional regulation and chromatin accessibility in host cells. We identified over 5,000 host genes that change expression, including 588 distinct associations between specific taxa and host genes. The taxa with the strongest influence on gene expression alter the response of genes associated with complex traits. Using ATAC-seq, we show that a subset of these gene expression changes is likely the result of changes in host chromatin accessibility and transcription factor binding induced by exposure to gut microbiota. We then created a manipulated microbial community with titrated doses of Collinsella, demonstrating that both natural and controlled microbiome composition lead to distinct and predictable gene expression profiles in host cells. Together, our results suggest that specific microbes play an important role in regulating the expression of individual host genes involved in human complex traits. The ability to fine-tune the expression of host genes by manipulating the microbiome suggests future therapeutic routes.
biorxiv genomics 200-500-users 2017
Adaptive evolution within the gut microbiome of individual people, bioRxiv, 2017-10-25
Abstract: Individual bacterial lineages stably persist for years in the human gut microbiome [1–3]. However, the potential of these lineages to adapt during colonization of healthy people is not well understood [2,4]. Here, we assess evolution within individual microbiomes by sequencing the genomes of 602 Bacteroides fragilis isolates cultured from 12 healthy subjects. We find that within-subject B. fragilis populations contain substantial de novo nucleotide and mobile-element diversity, which preserves years of within-person evolutionary history. This history contains signatures of within-person adaptation to both subject-specific and common selective forces, including parallel mutations in sixteen genes. These genes are involved in cell-envelope biosynthesis, polysaccharide utilization, and pathways that are not yet well characterized. Notably, one of these genes has been shown to be critical for B. fragilis colonization in mice [5], indicating that key genes have not already been optimized for survival in vivo. This lack of optimization, given historical signatures of purifying selection in these genes, suggests that varying selective forces with discordant solutions act upon B. fragilis in vivo. Remarkably, in one subject, two B. fragilis sublineages coexisted at a stable relative frequency over a 1.5-year period despite rapid adaptive dynamics within one of the sublineages. This stable coexistence suggests that competing selective forces can lead to B. fragilis niche differentiation even within a single person. We conclude that B. fragilis adapts rapidly within the microbiomes of individual healthy people, providing a new route for discovering key genes in the microbiome, with implications for microbiome stability and manipulation.
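Parallel evolution is often detected by asking whether a gene carries more independent mutations than expected if mutations landed uniformly along the genome. The sketch below applies a Poisson test with a Bonferroni correction to hypothetical counts; it is one common way to formalize the idea and is not necessarily the test used in the study. The gene names, lengths, counts, thresholds, and function names are all invented for illustration.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - CDF (fine for small k)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def parallel_genes(mutation_counts, gene_lengths, min_mutations=2, alpha=0.05):
    """Genes with more independent mutations than expected if mutations fell
    uniformly along the total gene length (Bonferroni-corrected)."""
    total_mut = sum(mutation_counts.values())
    total_len = sum(gene_lengths.values())
    n_tests = len(gene_lengths)
    hits = {}
    for gene, length in gene_lengths.items():
        k = mutation_counts.get(gene, 0)
        if k < min_mutations:
            continue
        # Expected count for this gene under a uniform per-base mutation rate.
        expected = total_mut * length / total_len
        p = poisson_sf(k, expected)
        if p * n_tests < alpha:
            hits[gene] = p
    return hits


# Hypothetical toy genome: 1,000 genes of 1 kb each, 50 mutations in total.
lengths = {f"g{i}": 1000 for i in range(1000)}
counts = {"g0": 4, "g1": 2}
counts.update({f"g{i}": 1 for i in range(2, 46)})
print(parallel_genes(counts, lengths))
```

Here `g0` (4 mutations where 0.05 are expected) passes the correction, while `g1` (2 mutations) does not, showing how gene length and the number of genes tested set the bar for calling a gene "parallel".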
biorxiv evolutionary-biology 200-500-users 2017
Distinguishing genetic correlation from causation across 52 diseases and complex traits, bioRxiv, 2017-10-19
Abstract: Mendelian randomization (MR) is widely used to identify causal relationships among heritable traits, but it can be confounded by genetic correlations reflecting shared etiology. We propose a model in which a latent causal variable mediates the genetic correlation between two traits. Under the latent causal variable (LCV) model, trait 1 is fully genetically causal for trait 2 if it is perfectly genetically correlated with the latent causal variable, implying that the entire genetic component of trait 1 is causal for trait 2; it is partially genetically causal for trait 2 if it has a high genetic correlation with the latent variable, implying that part of the genetic component of trait 1 is causal for trait 2. To quantify the degree of partial genetic causality, we define the genetic causality proportion (gcp). We fit this model using mixed fourth moments $E(\alpha_1^2\alpha_1\alpha_2)$ and $E(\alpha_2^2\alpha_1\alpha_2)$ of marginal effect sizes for each trait, exploiting the fact that if trait 1 is causal for trait 2, then SNPs affecting trait 1 (large $\alpha_1^2$) will have correlated effects on trait 2 (large $\alpha_1\alpha_2$), but not vice versa. We performed simulations under a wide range of genetic architectures and determined that LCV, unlike state-of-the-art MR methods, produced well-calibrated false positive rates and reliable gcp estimates in the presence of genetic correlations and asymmetric genetic architectures; we also determined that LCV is well-powered to detect a causal effect.
We applied LCV to GWAS summary statistics for 52 traits (average N=331k), identifying partially or fully genetically causal effects (1% FDR) for 59 pairs of traits, including 30 pairs of traits with high gcp estimates ($\widehat{\mathrm{gcp}} > 0.6$). Results consistent with the published literature included genetically causal effects on myocardial infarction (MI) for LDL, triglycerides and BMI. Novel findings included a genetically causal effect of LDL on bone mineral density, consistent with clinical trials of statins in osteoporosis. These results demonstrate that it is possible to distinguish between genetic correlation and causation using genetic data.
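The asymmetry LCV exploits can be seen in a toy simulation. In this assumption-laden illustration (not the LCV estimator, which fits the latent-variable model and gcp), trait 1 is causal for trait 2 with coefficient $q$, trait-1 effects are sparse and therefore heavy-tailed, and the effects are true effects with no GWAS sampling noise. For this toy model one can work out $E(\alpha_1^2\alpha_1\alpha_2) = q\kappa$ and $E(\alpha_2^2\alpha_1\alpha_2) = q^3\kappa + 3q(1-q^2)$, where $\kappa$ is the kurtosis of the trait-1 effects; the two are equal when $\kappa = 3$ (Gaussian effects), so the asymmetry relies on non-Gaussian effect sizes. Function names and parameter values are hypothetical.

```python
import numpy as np


def mixed_fourth_moments(a1, a2):
    """Return (E[a1^2 * a1 a2], E[a2^2 * a1 a2]) after scaling each vector of
    effect sizes to unit second moment."""
    a1 = a1 / np.sqrt(np.mean(a1 ** 2))
    a2 = a2 / np.sqrt(np.mean(a2 ** 2))
    prod = a1 * a2
    return float(np.mean(a1 ** 2 * prod)), float(np.mean(a2 ** 2 * prod))


def simulate_causal_pair(n_snps, q, p_causal, rng):
    """Toy model in which trait 1 is causal for trait 2.

    Trait-1 effects are sparse (a fraction p_causal of SNPs, scaled to unit
    variance), hence heavy-tailed; trait 2 inherits q times those effects plus
    independent Gaussian effects, so both traits have unit variance.
    """
    is_causal = rng.random(n_snps) < p_causal
    beta1 = np.where(is_causal, rng.normal(0.0, np.sqrt(1.0 / p_causal), n_snps), 0.0)
    beta2 = rng.normal(0.0, 1.0, n_snps)
    a1 = beta1
    a2 = q * beta1 + np.sqrt(1.0 - q ** 2) * beta2
    return a1, a2


rng = np.random.default_rng(0)
a1, a2 = simulate_causal_pair(100_000, q=0.5, p_causal=0.1, rng=rng)
m_12, m_21 = mixed_fourth_moments(a1, a2)
print(round(m_12, 2), round(m_21, 2))
```

With `p_causal = 0.1` the trait-1 kurtosis is $3/0.1 = 30$, so the two printed moments should land near 15 and 4.9; swapping the arguments swaps them, which is the directional signal.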
biorxiv genetics 200-500-users 2017
High-resolution genome-wide functional dissection of transcriptional regulatory regions in human, bioRxiv, 2017-09-28
Abstract: Genome-wide epigenomic maps have revealed millions of regions showing signatures of enhancers, promoters, and other gene-regulatory elements [1]. However, high-throughput experimental validation of their function and high-resolution dissection of their driver nucleotides remain limited in scale and in the length of regions tested. Here, we present a new method, HiDRA (High-Definition Reporter Assay), that overcomes these limitations by combining components of Sharpr-MPRA [2] and STARR-Seq [3] with genome-wide selection of accessible regions from ATAC-Seq [4]. We used HiDRA to test ~7 million DNA fragments preferentially selected from accessible chromatin in the GM12878 lymphoblastoid cell line. By design, accessibility-selected fragments were highly overlapping (up to 370 per region), enabling us to pinpoint driver regulatory nucleotides by exploiting subtle differences in reporter activity between partially overlapping fragments, using a new machine learning model, SHARPR2. Our resulting maps include ~65,000 regions that show significant enhancer function and are enriched for endogenous active histone marks (including H3K9ac and H3K27ac), regulatory sequence motifs, and regions bound by immune regulators. Within them, we discover ~13,000 high-resolution driver elements that are enriched for regulatory motifs and evolutionarily conserved nucleotides and that help predict causal genetic variants underlying disease from genome-wide association studies. Overall, HiDRA provides a general, scalable, high-throughput, and high-resolution approach for experimental dissection of regulatory regions and driver nucleotides in the context of human biology and disease.
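The abstract does not spell out how SHARPR2 works, so the sketch below swaps in a much simpler technique to show why overlapping fragments can localize driver nucleotides: it assumes each fragment's reporter activity is the sum of per-base contributions and solves a ridge-regularized least-squares problem for those contributions. The 300-bp region, the planted 10-bp driver, the fragment lengths, the noise level, the ridge penalty, and the function names are all hypothetical; only the figure of up to 370 fragments per region comes from the abstract.

```python
import numpy as np


def fragment_design(fragments, region_len):
    """Row f has 1.0 over the half-open interval [start, end) of fragment f."""
    X = np.zeros((len(fragments), region_len))
    for row, (start, end) in enumerate(fragments):
        X[row, start:end] = 1.0
    return X


def per_base_scores(fragments, activity, region_len, ridge=1.0):
    """Ridge least squares for per-base contributions, assuming each fragment's
    activity is the sum of the contributions of the bases it covers."""
    X = fragment_design(fragments, region_len)
    lhs = X.T @ X + ridge * np.eye(region_len)
    return np.linalg.solve(lhs, X.T @ np.asarray(activity, dtype=float))


rng = np.random.default_rng(0)
region_len = 300
true_effect = np.zeros(region_len)
true_effect[140:150] = 1.0  # hypothetical 10-bp driver element
starts = rng.integers(0, region_len - 100, size=370)
lengths = rng.integers(100, 201, size=370)
fragments = [(int(s), int(min(s + n, region_len))) for s, n in zip(starts, lengths)]
activity = fragment_design(fragments, region_len) @ true_effect + rng.normal(0.0, 0.1, 370)
scores = per_base_scores(fragments, activity, region_len)
print(int(np.argmax(scores)))
```

Adjacent bases can only be told apart where some fragment starts or ends between them, which is why dense, partially overlapping fragments matter; the ridge term keeps the system solvable where they do not.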
biorxiv genomics 200-500-users 2017