Transcriptome-wide association studies: opportunities and challenges, bioRxiv, 2017-10-23
Transcriptome-wide association studies (TWAS) integrate GWAS and gene expression datasets to find gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes, using simulations and case studies of literature-curated candidate causal genes for schizophrenia, LDL cholesterol and Crohn’s disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene, as well as loci where TWAS prioritizes multiple genes, some of which are unlikely to be causal, because they share the same eQTL variants. We illustrate that TWAS is especially prone to spurious prioritization when using expression data from tissues or cell types that are less related to the trait, due to substantial variation in both expression levels and eQTL strengths across cell types. Nonetheless, TWAS prioritizes candidate causal genes at GWAS loci more accurately than simple baselines based on proximity to the lead GWAS variant and expression in trait-related tissue. We discuss current strategies and future opportunities for improving the performance of TWAS for causal gene prioritization. Our results showcase the strengths and limitations of using expression variation across individuals to determine causal genes at GWAS loci and provide guidelines and best practices when using TWAS to prioritize candidate causal genes.
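The abstract does not spell out the TWAS test itself. One widely used summary-statistic formulation (the one used by the FUSION approach) scores a gene as z = wᵀz_GWAS / sqrt(wᵀRw), where w are the gene's expression-prediction weights, z_GWAS the GWAS z-scores of its cis-SNPs and R their LD matrix. The sketch below assumes that formulation and pre-trained weights; the function name `twas_z` and the toy inputs are hypothetical. It also shows why two genes driven by the same eQTL SNP receive the same score, which is the co-prioritization problem described above.

```python
import numpy as np


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score for one gene (FUSION-style sketch).

    weights: expression-prediction weights for the gene's cis-SNPs
    gwas_z:  GWAS z-scores for the same SNPs, same order and effect allele
    ld:      SNP-by-SNP LD correlation matrix, e.g. from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    r = np.asarray(ld, dtype=float)
    # Numerator: weighted sum of GWAS signal; denominator: its null SD given LD.
    return float(w @ z / np.sqrt(w @ r @ w))


# Toy locus with three independent SNPs; SNP 0 carries the GWAS signal.
gwas = np.array([5.0, 1.0, 0.5])
ld = np.eye(3)

# Two hypothetical genes whose only eQTL is SNP 0: both get the same score,
# so TWAS alone cannot tell which one (if either) is causal.
gene_a = twas_z([0.8, 0.0, 0.0], gwas, ld)
gene_b = twas_z([0.3, 0.0, 0.0], gwas, ld)
print(gene_a, gene_b)
```

Scaling a gene's weights leaves its score unchanged, so genes are distinguished only by which SNPs carry weight and in what proportions.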
biorxiv genetics 100-200-users 2017
Mapping the human brain's cortical-subcortical functional network organization, bioRxiv, 2017-10-20
Understanding complex systems such as the human brain requires characterization of the system's architecture across multiple levels of organization - from neurons, to local circuits, to brain regions, and ultimately large-scale brain networks. Here we focus on characterizing the human brain's large-scale network organization, as it provides an overall framework for the organization of all other levels. We developed a highly principled approach to identify cortical network communities at the level of functional systems, calibrating our community detection algorithm using extremely well-established sensory and motor systems as guides. Building on previous network partitions, we replicated and expanded upon well-known and recently-identified networks, including several higher-order cognitive networks such as a left-lateralized language network. We expanded these cortical networks to subcortex, revealing 358 highly-organized subcortical parcels that take part in forming whole-brain functional networks. Notably, the identified subcortical parcels are similar in number to a recent estimate of the number of cortical parcels (360). This whole-brain network atlas - released as an open resource for the neuroscience community - places all brain structures across both cortex and subcortex into a single large-scale functional framework, with the potential to facilitate a variety of studies investigating large-scale functional networks in health and disease.
biorxiv neuroscience 0-100-users 2017
RNA velocity in single cells, bioRxiv, 2017-10-20
RNA abundance is a powerful indicator of the state of individual cells, but does not directly reveal dynamic processes such as cellular differentiation. Here we show that RNA velocity, the time derivative of RNA abundance, can be estimated by distinguishing unspliced and spliced mRNAs in standard single-cell RNA sequencing protocols. We show that RNA velocity is a vector that predicts the future state of individual cells on a timescale of hours. We validate the accuracy of RNA velocity in the neural crest lineage, demonstrate its use on multiple technical platforms, reconstruct the branching lineage tree of the mouse hippocampus, and measure RNA kinetics in human embryonic brain. We expect RNA velocity to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.
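The abstract states that velocity can be estimated from unspliced (u) and spliced (s) counts. A minimal sketch, assuming the simple kinetic model ds/dt = βu − γs with the splicing rate β fixed to 1, estimates the ratio γ for one gene from cells assumed to be near steady state (here, cells in the extreme 5% quantiles of s) and then takes v = u − γs. The function names, the quantile cutoff and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def fit_gamma(u, s, q=0.05):
    """Steady-state ratio gamma for one gene: slope of u on s through the origin,
    fitted only on cells in the lowest and highest q-quantiles of s."""
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    lo, hi = np.quantile(s, [q, 1 - q])
    near_steady = (s <= lo) | (s >= hi)  # cells assumed close to steady state
    s_fit, u_fit = s[near_steady], u[near_steady]
    return float(s_fit @ u_fit / (s_fit @ s_fit))


def rna_velocity(u, s, q=0.05):
    """Per-cell velocity ds/dt = u - gamma * s for one gene (beta fixed to 1).
    Positive: more unspliced RNA than steady state predicts (induction);
    negative: less (repression)."""
    gamma = fit_gamma(u, s, q)
    return np.asarray(u, dtype=float) - gamma * np.asarray(s, dtype=float)


# Toy gene: 100 cells on the steady-state line u = 0.5 * s ...
s = np.linspace(1.0, 10.0, 100)
u = 0.5 * s
# ... except one mid-range cell with an excess of unspliced RNA.
u[50] += 1.0
v = rna_velocity(u, s)
print(round(fit_gamma(u, s), 3), round(v[50], 3))
```

A per-cell velocity vector, as described in the abstract, would stack such per-gene values across many genes.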
biorxiv genomics 100-200-users 2017
Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain by scGESTALT, bioRxiv, 2017-10-20
Hundreds of cell types are generated during development, but their lineage relationships are largely elusive. Here we report a technology, scGESTALT, which combines cell type identification by single-cell RNA sequencing with lineage recording by cumulative barcode editing. We sequenced ~60,000 transcriptomes from the juvenile zebrafish brain and identified more than 100 cell types and marker genes. We engineered an inducible system that combines early and late barcode editing and isolated thousands of single-cell transcriptomes and their associated barcodes. The large diversity of edited barcodes and cell types enabled the generation of lineage trees with hundreds of branches. Inspection of lineage trajectories identified restrictions at the level of cell types and brain regions and helped uncover gene expression cascades during differentiation. These results establish scGESTALT as a new and widely applicable tool to simultaneously characterize the molecular identities and lineage histories of thousands of cells during development and disease.
biorxiv developmental-biology 100-200-users 2017
Distinguishing genetic correlation from causation across 52 diseases and complex traits, bioRxiv, 2017-10-19
Mendelian randomization (MR) is widely used to identify causal relationships among heritable traits, but it can be confounded by genetic correlations reflecting shared etiology. We propose a model in which a latent causal variable mediates the genetic correlation between two traits. Under the latent causal variable (LCV) model, trait 1 is fully genetically causal for trait 2 if it is perfectly genetically correlated with the latent causal variable, implying that the entire genetic component of trait 1 is causal for trait 2; it is partially genetically causal for trait 2 if it has a high genetic correlation with the latent variable, implying that part of the genetic component of trait 1 is causal for trait 2. To quantify the degree of partial genetic causality, we define the genetic causality proportion (gcp). We fit this model using mixed fourth moments $E(\alpha_1^2 \alpha_1 \alpha_2)$ and $E(\alpha_2^2 \alpha_1 \alpha_2)$ of marginal effect sizes for each trait, exploiting the fact that if trait 1 is causal for trait 2 then SNPs affecting trait 1 (large $\alpha_1^2$) will have correlated effects on trait 2 (large $\alpha_1 \alpha_2$), but not vice versa. We performed simulations under a wide range of genetic architectures and determined that LCV, unlike state-of-the-art MR methods, produced well-calibrated false positive rates and reliable gcp estimates in the presence of genetic correlations and asymmetric genetic architectures; we also determined that LCV is well-powered to detect a causal effect.
We applied LCV to GWAS summary statistics for 52 traits (average N=331k), identifying partially or fully genetically causal effects (1% FDR) for 59 pairs of traits, including 30 pairs of traits with high gcp estimates ($\widehat{\mathrm{gcp}} > 0.6$). Results consistent with the published literature included genetically causal effects on myocardial infarction (MI) for LDL, triglycerides and BMI. Novel findings included a genetically causal effect of LDL on bone mineral density, consistent with clinical trials of statins in osteoporosis. These results demonstrate that it is possible to distinguish between genetic correlation and causation using genetic data.
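The intuition behind the mixed fourth moments can be checked by simulation. The sketch below assumes standardized per-SNP effects, a sparse latent causal variable L, trait 1 equal to L (fully genetically causal) and trait 2 equal to qL plus independent Gaussian effects, with q = 0.5 as a hypothetical genetic correlation. It ignores LD, sampling noise and the normalization a real LCV fit needs, so it illustrates the asymmetry only and is not the LCV estimator. In expectation, $E(\alpha_1^2 \alpha_1 \alpha_2) = qE(L^4) = 15$ and $E(\alpha_2^2 \alpha_1 \alpha_2) = q^3 E(L^4) + 3q(1 - q^2) \approx 4.9$ for these settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 200_000
q = 0.5  # hypothetical genetic correlation between the two traits

# Sparse latent causal variable: ~10% of SNPs have effects, scaled so Var(L) = 1.
latent = rng.normal(0.0, np.sqrt(10.0), n_snps) * (rng.random(n_snps) < 0.1)

# Trait 1 is perfectly correlated with L (fully genetically causal for trait 2);
# trait 2 gets q * L plus independent, non-sparse effects, so Var(alpha2) = 1.
alpha1 = latent
alpha2 = q * latent + rng.normal(0.0, np.sqrt(1.0 - q**2), n_snps)

# Mixed fourth moments named in the abstract, estimated as sample means.
m1 = np.mean(alpha1**2 * alpha1 * alpha2)  # E(alpha1^2 alpha1 alpha2)
m2 = np.mean(alpha2**2 * alpha1 * alpha2)  # E(alpha2^2 alpha1 alpha2)
print(f"E(a1^2 a1 a2) = {m1:.2f}, E(a2^2 a1 a2) = {m2:.2f}")
```

Swapping the roles of the traits (making trait 2 the fully causal one) reverses which moment is larger, which is the asymmetry that distinguishes causation from symmetric genetic correlation.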
biorxiv genetics 200-500-users 2017
Strong binding activity of few transcription factors is a major determinant of open chromatin, bioRxiv, 2017-10-18
It is well established that transcription factors (TFs) play crucial roles in determining cell identity, and that a large fraction of all TFs are expressed in most cell types. In order to globally characterize activities of TFs in cells, we have developed a novel massively parallel protein activity assay, Active TF Identification (ATI), that measures DNA-binding activity of all TFs from any species or tissue type. In contrast to previous studies based on mRNA expression or protein abundance, we found that a set of TFs binding to only around ten distinct motifs displays strong DNA-binding activity in any given cell or tissue type. Mass spectrometric identification of TFs revealed that these highly active TFs included both housekeeping TFs, which were universally found in all cell types, and specific TFs, which were highly enriched in known factors that determine the fate of the analyzed tissue or cell type. The importance of a small subset of TFs for determining the overall accessible chromatin landscape of a cell suggests that gene regulatory logic may be simpler than previously appreciated.
biorxiv cell-biology 100-200-users 2017