Detecting polygenic adaptation in admixture graphs, bioRxiv, 2017-06-06

AbstractAn open question in human evolution is the importance of polygenic adaptation adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method - which we call PolyGraph - has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different populations during human evolution.

biorxiv evolutionary-biology 100-200-users 2017

Improving the value of public RNA-seq expression data by phenotype prediction, bioRxiv, 2017-06-04

Abstract<jatssec id=sa1>BackgroundPublicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions.<jatssec id=sa2>ResultsWe develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements using, well-annotated genomic and phenotypic data produced by consortia like TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples we recently processed on a common pipeline as part of the recount2 project (<jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsjhubiostatistics.shinyapps.iorecount>httpsjhubiostatistics.shinyapps.iorecount<jatsext-link>). We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package (<jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsgithub.comleekgroupphenopredict>httpsgithub.comleekgroupphenopredict<jatsext-link>) and the predictions for recount2 are available from the recount R package (<jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsbioconductor.orgpackagesreleasebiochtmlrecount.html>httpsbioconductor.orgpackagesreleasebiochtmlrecount.html<jatsext-link>)<jatssec id=sa3>ConclusionHaving leveraging massive public data sets to generate a well-phenotyped set of expression data for more than 70,000 human samples, expression data is available for use on a scale that was not previously feasible.

biorxiv bioinformatics 100-200-users 2017

SCENIC Single-cell regulatory network inference and clustering, bioRxiv, 2017-06-01

AbstractSingle-cell RNA-seq allows building cell atlases of any given tissue and infer the dynamics of cellular state transitions during developmental or disease trajectories. Both the maintenance and transitions of cell states are encoded by regulatory programs in the genome sequence. However, this regulatory code has not yet been exploited to guide the identification of cellular states from single-cell RNA-seq data. Here we describe a computational resource, called SCENIC (Single Cell rEgulatory Network Inference and Clustering), for the simultaneous reconstruction of gene regulatory networks (GRNs) and the identification of stable cell states, using single-cell RNA-seq data. SCENIC outperforms existing approaches at the level of cell clustering and transcription factor identification. Importantly, we show that cell state identification based on GRNs is robust towards batch-effects and technical-biases. We applied SCENIC to a compendium of single-cell data from the mouse and human brain and demonstrate that the proper combinations of transcription factors, target genes, enhancers, and cell types can be identified. Moreover, we used SCENIC to map the cell state landscape in melanoma and identified a gene regulatory network underlying a proliferative melanoma state driven by MITF and STAT and a contrasting network controlling an invasive state governed by NFATC2 and NFIB. We further validated these predictions by showing that two transcription factors are predominantly expressed in early metastatic sentinel lymph nodes. In summary, SCENIC is the first method to analyze scRNA-seq data using a network-centric, rather than cell-centric approach. SCENIC is generic, easy to use, and flexible, and allows for the simultaneous tracing of genomic regulatory programs and the mapping of cellular identities emerging from these programs. Availability SCENIC is available as an R workflow based on three new RBioconductor packages GENIE3, RcisTarget and AUCell. As scalable alternative to GENIE3, we also provide GRNboost, paving the way towards the network analysis across millions of single cells.

biorxiv bioinformatics 0-100-users 2017

 

Created with the audiences framework by Jedidiah Carlson

Powered by Hugo