Detecting polygenic adaptation in admixture graphs, bioRxiv, 2017-06-06
Abstract: An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method - which we call PolyGraph - has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different populations during human evolution.
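As a rough illustration of what "a shift in the mean of a multifactorial trait due to allele-frequency changes across many loci" means, the sketch below computes a mean additive genetic score for two populations and the difference between them. It is not the PolyGraph method: the function name, effect sizes, and frequencies are all hypothetical, and the formula assumes a purely additive diploid model.

```python
def mean_polygenic_score(effect_sizes, allele_freqs):
    """Mean additive genetic value of a diploid population.

    Assumes a purely additive model: each locus contributes
    2 * beta * p, where beta is the GWAS effect size of the
    trait-increasing allele and p is its population frequency.
    """
    if len(effect_sizes) != len(allele_freqs):
        raise ValueError("one allele frequency is required per effect size")
    return 2.0 * sum(b * p for b, p in zip(effect_sizes, allele_freqs))


# Hypothetical effect sizes for three trait-associated loci.
effects = [0.10, -0.05, 0.20]

# Hypothetical allele frequencies in two populations. Population B has
# small, coordinated shifts toward the trait-increasing direction.
freqs_a = [0.50, 0.50, 0.50]
freqs_b = [0.55, 0.45, 0.55]

score_a = mean_polygenic_score(effects, freqs_a)
score_b = mean_polygenic_score(effects, freqs_b)
shift = score_b - score_a

print(f"population A: {score_a:.3f}")
print(f"population B: {score_b:.3f}")
print(f"shift in mean score: {shift:.3f}")
```

Each per-locus frequency change here is small, yet the changes add up because they all push the score in the same direction. That coordinated pattern is what tests for polygenic adaptation look for.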
biorxiv evolutionary-biology 100-200-users 2017
Distinct neuronal activity patterns induce different gene expression programs, bioRxiv, 2017-06-06
SUMMARY: Brief and sustained neuronal activity patterns can have opposite effects on synaptic strength that both require activity-regulated gene (ARG) expression. However, whether distinct patterns of activity induce different sets of ARGs is unknown. In genome-scale experiments, we reveal that a neuron’s activity-pattern history can be predicted from the ARGs it expresses. Surprisingly, brief activity selectively induces a small subset of the ARG program that corresponds precisely to the first of three temporal waves of genes induced by sustained activity. These first-wave genes are distinguished by an open chromatin state, proximity to rapidly activated enhancers, and a requirement for MAPK/ERK signaling for their induction. MAPK/ERK mediates rapid RNA polymerase recruitment to promoters, as well as enhancer RNA induction but not histone acetylation at enhancers. Thus, the same mechanisms that establish the multi-wave temporal structure of ARG induction also enable different sets of genes to be induced by distinct activity patterns.
biorxiv neuroscience 0-100-users 2017
Discovery of the first genome-wide significant risk loci for ADHD, bioRxiv, 2017-06-04
Abstract: Attention-Deficit/Hyperactivity Disorder (ADHD) is a highly heritable childhood behavioral disorder affecting 5% of school-age children and 2.5% of adults. Common genetic variants contribute substantially to ADHD susceptibility, but no individual variants have been robustly associated with ADHD. We report a genome-wide association meta-analysis of 20,183 ADHD cases and 35,191 controls that identifies variants surpassing genome-wide significance in 12 independent loci, revealing new and important information on the underlying biology of ADHD. Associations are enriched in evolutionarily constrained genomic regions and loss-of-function intolerant genes, as well as around brain-expressed regulatory marks. These findings, based on clinical interviews and/or medical records, are supported by additional analyses of a self-reported ADHD sample and a study of quantitative measures of ADHD symptoms in the population. Meta-analyzing these data with our primary scan yielded a total of 16 genome-wide significant loci. The results support the hypothesis that clinical diagnosis of ADHD is an extreme expression of one or more continuous heritable traits.
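The abstract does not say which meta-analysis scheme was used. As one common possibility, the sketch below shows a fixed-effect, inverse-variance-weighted combination of per-cohort effect estimates for a single variant, followed by a check against the conventional genome-wide significance threshold of 5e-8. The function name and the cohort numbers are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    Returns the pooled effect, its standard error, the z-score, and the
    two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical log-odds-ratio estimates for one variant in two cohorts.
cohort_betas = [0.10, 0.12]
cohort_ses = [0.02, 0.02]

beta, se, z, p = inverse_variance_meta(cohort_betas, cohort_ses)
print(f"pooled beta = {beta:.3f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
print("genome-wide significant:", p < GENOME_WIDE_ALPHA)
```

Pooling shrinks the standard error relative to either cohort alone. This is how adding further samples to a primary scan can push more loci past the significance threshold.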
biorxiv genetics 200-500-users 2017
Improving the value of public RNA-seq expression data by phenotype prediction, bioRxiv, 2017-06-04
Abstract
Background: Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions.
Results: We develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements, using well-annotated genomic and phenotypic data produced by consortia such as TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples we recently processed on a common pipeline as part of the recount2 project (https://jhubiostatistics.shinyapps.io/recount). We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package (https://github.com/leekgroup/phenopredict) and the predictions for recount2 are available from the recount R package (https://bioconductor.org/packages/release/bioc/html/recount.html).
Conclusion: Having leveraged massive public data sets to generate a well-phenotyped set of expression data for more than 70,000 human samples, we make expression data available for use on a scale that was not previously feasible.
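The general idea of in silico phenotyping (train on well-annotated samples, then predict missing labels from expression alone) can be sketched with a nearest-centroid classifier. This is only an illustration: the abstract does not describe the model used in phenopredict, and the functions, marker genes, and expression values below are hypothetical.

```python
def fit_centroids(samples, labels):
    """Average the expression vectors of the training samples per label."""
    sums, counts = {}, {}
    for vector, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vector):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical log-expression of two marker genes in annotated training
# samples (standing in for a well-labeled resource such as GTEx).
training_expression = [[9.0, 0.1], [8.5, 0.0], [0.2, 7.8], [0.1, 8.1]]
training_sex = ["female", "female", "male", "male"]

centroids = fit_centroids(training_expression, training_sex)

# Hypothetical unannotated public samples to be labeled.
predicted = [predict(centroids, x) for x in ([8.0, 0.3], [0.5, 7.0])]
print(predicted)
```

A real predictor would need feature selection across thousands of genes and held-out evaluation, which the abstract says were carried out for each phenotype.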
biorxiv bioinformatics 100-200-users 2017
Genetic identification of brain cell types underlying schizophrenia, bioRxiv, 2017-06-03
Abstract: With few exceptions, the marked advances in knowledge about the genetic basis for schizophrenia have not converged on findings that can be confidently used for precise experimental modeling. Applying knowledge of the cellular taxonomy of the brain from single-cell RNA-sequencing, we evaluated whether the genomic loci implicated in schizophrenia map onto specific brain cell types. The common variant genomic results consistently mapped to pyramidal cells, medium spiny neurons, and certain interneurons but far less consistently to embryonic, progenitor, or glial cells. These enrichments were due to distinct sets of genes specifically expressed in each of these cell types. Many of the diverse gene sets associated with schizophrenia (including antipsychotic targets) implicate the same brain cell types. Our results provide a parsimonious explanation: the common-variant genetic results for schizophrenia point at a limited set of neurons, and the gene sets point to the same cells. While some of the genetic risk is associated with GABAergic interneurons, this risk largely does not overlap with that from projecting cells.
biorxiv genomics 0-100-users 2017
SCENIC: Single-cell regulatory network inference and clustering, bioRxiv, 2017-06-01
Abstract: Single-cell RNA-seq allows building cell atlases of any given tissue and inferring the dynamics of cellular state transitions during developmental or disease trajectories. Both the maintenance and transitions of cell states are encoded by regulatory programs in the genome sequence. However, this regulatory code has not yet been exploited to guide the identification of cellular states from single-cell RNA-seq data. Here we describe a computational resource, called SCENIC (Single Cell rEgulatory Network Inference and Clustering), for the simultaneous reconstruction of gene regulatory networks (GRNs) and the identification of stable cell states, using single-cell RNA-seq data. SCENIC outperforms existing approaches at the level of cell clustering and transcription factor identification. Importantly, we show that cell state identification based on GRNs is robust towards batch effects and technical biases. We applied SCENIC to a compendium of single-cell data from the mouse and human brain and demonstrate that the proper combinations of transcription factors, target genes, enhancers, and cell types can be identified. Moreover, we used SCENIC to map the cell state landscape in melanoma and identified a gene regulatory network underlying a proliferative melanoma state driven by MITF and STAT and a contrasting network controlling an invasive state governed by NFATC2 and NFIB. We further validated these predictions by showing that two transcription factors are predominantly expressed in early metastatic sentinel lymph nodes. In summary, SCENIC is the first method to analyze scRNA-seq data using a network-centric, rather than cell-centric, approach. SCENIC is generic, easy to use, and flexible, and allows for the simultaneous tracing of genomic regulatory programs and the mapping of cellular identities emerging from these programs.
Availability: SCENIC is available as an R workflow based on three new R/Bioconductor packages: GENIE3, RcisTarget and AUCell. As a scalable alternative to GENIE3, we also provide GRNboost, paving the way towards network analysis across millions of single cells.
biorxiv bioinformatics 0-100-users 2017