Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming, bioRxiv, 2017-09-28
Abstract
Understanding the molecular programs that guide cellular differentiation during development is a major goal of modern biology. Here, we introduce an approach, WADDINGTON-OT, based on the mathematics of optimal transport, for inferring developmental landscapes, probabilistic cellular fates and dynamic trajectories from large-scale single-cell RNA-seq (scRNA-seq) data collected along a time course. We demonstrate the power of WADDINGTON-OT by applying the approach to study 65,781 scRNA-seq profiles collected at 10 time points over 16 days during reprogramming of fibroblasts to iPSCs. We construct a high-resolution map of reprogramming that rediscovers known features; uncovers new alternative cell fates including neural- and placental-like cells; predicts the origin and fate of any cell class; highlights senescent-like cells that may support reprogramming through paracrine signaling; and implicates regulatory models in particular trajectories. Of these findings, we highlight Obox6, which we experimentally show enhances reprogramming efficiency. Our approach provides a general framework for investigating cellular differentiation.
biorxiv bioinformatics 200-500-users 2017
Modified penetrance of coding variants by cis-regulatory variation shapes human traits, bioRxiv, 2017-09-19
Summary
Coding variants represent many of the strongest associations between genotype and phenotype; however, they exhibit inter-individual differences in effect, known as variable penetrance. In this work, we study how cis-regulatory variation modifies the penetrance of coding variants in their target gene. Using functional genomic and genetic data from GTEx, we observed that in the general population, purifying selection has depleted haplotype combinations that lead to higher penetrance of pathogenic coding variants. Conversely, in cancer and autism patients, we observed an enrichment of haplotype combinations that lead to higher penetrance of pathogenic coding variants in disease-implicated genes, which provides direct evidence that the regulatory haplotype configuration of causal coding variants affects disease risk. Finally, we experimentally demonstrated that a regulatory variant can modify the penetrance of a coding variant by using CRISPR/Cas9 to introduce a Mendelian SNP on distinct expression haplotypes, with the transcriptome as a phenotypic readout. Our results demonstrate that joint effects of regulatory and coding variants are an important part of the genetic architecture of human traits and contribute to modified penetrance of disease-causing variants.
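A small sketch can make the idea of a haplotype configuration concrete. It assumes phased genotypes for one cis-eQTL and one coding variant in the same gene. A double heterozygote is labeled "higher" expected penetrance when the coding allele sits on the haplotype that carries the expression-increasing eQTL allele, and "lower" otherwise. The `Haplotype` type, the `configuration` function and the individuals are hypothetical and not taken from the study.

```python
from collections import Counter
from typing import NamedTuple, Optional

class Haplotype(NamedTuple):
    high_expression: bool  # carries the expression-increasing cis-eQTL allele
    coding_alt: bool       # carries the coding (e.g. putatively pathogenic) allele

def configuration(h1: Haplotype, h2: Haplotype) -> Optional[str]:
    """Return 'higher' or 'lower' expected penetrance for a double heterozygote, else None."""
    if h1.high_expression == h2.high_expression or h1.coding_alt == h2.coding_alt:
        return None  # not heterozygous for both variants, so no configuration to assign
    carrier = h1 if h1.coding_alt else h2  # the haplotype carrying the coding allele
    return "higher" if carrier.high_expression else "lower"

# Hypothetical phased individuals: (haplotype 1, haplotype 2).
individuals = [
    (Haplotype(True, True), Haplotype(False, False)),   # coding allele on high-expression haplotype
    (Haplotype(True, False), Haplotype(False, True)),   # coding allele on low-expression haplotype
    (Haplotype(False, False), Haplotype(True, True)),   # coding allele on high-expression haplotype
    (Haplotype(True, True), Haplotype(True, False)),    # homozygous for the eQTL allele
]
counts = Counter(configuration(h1, h2) for h1, h2 in individuals)
print(counts)
```

Under the hypothesis stated above, purifying selection in the general population would appear as fewer "higher" configurations than "lower" ones among double heterozygotes. The patient cohorts would show the opposite pattern in disease-implicated genes.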
biorxiv genetics 200-500-users 2017
The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of Complex Traits, bioRxiv, 2017-09-16
Abstract
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development, and clinical guidelines. However, the dominance of European-ancestry populations in GWAS creates a biased view of the role of human variation in disease and hinders the equitable translation of genetic associations into clinical and public health applications. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioral phenotypes in 49,839 non-European individuals. Using strategies designed for analysis of multi-ethnic and admixed populations, we confirm 574 GWAS catalog variants across these traits and find 38 secondary signals in known loci and 27 novel loci. Our data show strong evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts, and insights into clinical implications. We strongly advocate for continued, large genome-wide efforts in diverse populations to reduce health disparities.
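Effect-size heterogeneity across ancestry groups is often quantified with Cochran's Q and the derived I² statistic. The abstract does not say which test the study used, so this is an assumed, illustrative choice. The sketch takes per-ancestry effect estimates and standard errors for one variant (hypothetical numbers) and returns the inverse-variance pooled effect, Q, and I². The function name `heterogeneity` is illustrative.

```python
def heterogeneity(betas, ses):
    """Inverse-variance pooled effect, Cochran's Q and I^2 for one variant."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Q: weighted sum of squared deviations of each group's estimate from the pooled effect
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-group heterogeneity (floored at 0)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical effect estimates for one variant in two ancestry groups.
pooled, q, i2 = heterogeneity([0.2, 0.0], [0.05, 0.05])
print(round(pooled, 3), round(q, 3), round(i2, 3))
```

Under homogeneous effects, Q approximately follows a chi-square distribution with k − 1 degrees of freedom for k groups. That distribution is how a p-value for heterogeneity is usually obtained.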
biorxiv genetics 200-500-users 2017
Global determinants of navigation ability, bioRxiv, 2017-09-15
Summary
Countries vary in their geographical and cultural properties. Only a few studies have explored how such variations influence how humans navigate or reason about space [1–7]. We predicted that these variations impact human cognition, resulting in an organized spatial distribution of cognition at a planetary scale. To test this hypothesis, we developed a mobile-app-based cognitive task measuring non-verbal spatial navigation ability in more than 2.5 million people, sampling populations in every nation state. We focused on spatial navigation because it is universally required across cultures. Using a clustering approach, we find that navigation ability is clustered into five distinct, yet geographically related, groups of countries. Specifically, the economic wealth of a nation was predictive of the average navigation ability of its inhabitants, and gender inequality was predictive of the size of the performance difference between males and females. Thus, cognitive abilities, at least for spatial navigation, are clustered according to economic wealth and gender inequality globally, which has significant implications for cross-cultural studies and multi-centre clinical trials using cognitive testing.
biorxiv neuroscience 200-500-users 2017
Massive Mining of Publicly Available RNA-seq Data from Human and Mouse, bioRxiv, 2017-09-15
RNA-sequencing (RNA-seq) is currently the leading technology for genome-wide transcript quantification. While the volume of RNA-seq data is rapidly increasing, the publicly available RNA-seq data is currently provided mostly in raw form, with small portions processed non-uniformly. This is mainly because the computational demand, particularly for the alignment step, is a significant barrier to global and integrative retrospective analyses. To address this challenge, we developed All RNA-seq and ChIP-seq Sample and Signature Search (ARCHS4), a web resource that makes the majority of previously published RNA-seq data from human and mouse freely available at the gene count level. Such uniformly processed data enables easy integration for downstream analyses. To develop the ARCHS4 resource, all available FASTQ files from RNA-seq experiments were retrieved from the Gene Expression Omnibus (GEO) and aligned using a cloud-based infrastructure. In total, 137,792 samples (72,363 mouse and 65,429 human) are accessible through ARCHS4. Through efficient use of cloud resources and dockerized deployment of the sequencing pipeline, the alignment cost per sample is reduced to less than one cent. ARCHS4 is updated automatically by adding newly published samples to the database as they become available. Additionally, the ARCHS4 web interface provides intuitive exploration of the processed data through querying tools, interactive visualization, and gene landing pages that provide average expression across cell lines and tissues, top co-expressed genes, and predicted biological functions and protein-protein interactions for each gene based on prior knowledge combined with co-expression.
In a benchmark of the quality of these predictions, co-expression correlation data created from ARCHS4 outperform co-expression data created from other major gene expression data repositories such as GTEx and CCLE. ARCHS4 is freely accessible at http://amp.pharm.mssm.edu/archs4.
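Once samples are available as uniform gene-level counts, a "top co-expressed genes" list can be sketched as Pearson correlation of log-transformed counts across samples. The log2(count + 1) transform, the `top_coexpressed` helper and the toy matrix are assumptions for illustration. The abstract does not specify ARCHS4's normalization or correlation procedure.

```python
import numpy as np

def top_coexpressed(counts, genes, query, k=2):
    """Rank genes by Pearson correlation with `query` across samples.

    counts: genes x samples matrix of raw gene-level counts.
    """
    log_expr = np.log2(counts + 1.0)  # variance-stabilizing log transform (assumed)
    corr = np.corrcoef(log_expr)      # gene-by-gene correlation across samples
    i = genes.index(query)
    order = np.argsort(-corr[i])      # most positively correlated first
    return [(genes[j], float(corr[i, j])) for j in order if j != i][:k]

# Hypothetical counts for 3 genes in 4 samples.
genes = ["A", "B", "C"]
counts = np.array([
    [1, 3, 7, 15],     # log2(x + 1) = 1, 2, 3, 4
    [3, 15, 63, 255],  # log2(x + 1) = 2, 4, 6, 8  (tracks A)
    [15, 7, 3, 1],     # log2(x + 1) = 4, 3, 2, 1  (anti-correlated with A)
], dtype=float)
print(top_coexpressed(counts, genes, "A", k=2))
```

Genes with constant expression across samples yield undefined (NaN) correlations and would need to be filtered out before ranking.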
biorxiv bioinformatics 200-500-users 2017
GWAS meta-analysis (N=279,930) identifies new genes and functional links to intelligence, bioRxiv, 2017-09-07
Intelligence is highly heritable [1] and a major determinant of human health and well-being [2]. Recent genome-wide meta-analyses have identified 24 genomic loci linked to intelligence [3–7], but much about its genetic underpinnings remains to be discovered. Here, we present the largest genetic association study of intelligence to date (N=279,930), identifying 206 genomic loci (191 novel) and implicating 1,041 genes (963 novel) via positional mapping, expression quantitative trait locus (eQTL) mapping, chromatin interaction mapping, and gene-based association analysis. We find enrichment of genetic effects in conserved and coding regions and identify 89 nonsynonymous exonic variants. Associated genes are strongly expressed in the brain and specifically in striatal medium spiny neurons and cortical and hippocampal pyramidal neurons. Gene-set analyses implicate pathways related to neurogenesis, neuron differentiation and synaptic structure. We confirm previous strong genetic correlations with several neuropsychiatric disorders, and Mendelian randomization results suggest protective effects of intelligence for Alzheimer's dementia and ADHD, and bidirectional causation with strong pleiotropy for schizophrenia. These results are a major step forward in understanding the neurobiology of intelligence as well as genetically associated neuropsychiatric traits.
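Mendelian randomization uses genetic variants as instruments for an exposure, here intelligence, to estimate its effect on an outcome. The abstract does not name the estimator used. As an assumed example, the sketch shows one widely used estimator, the fixed-effect inverse-variance weighted (IVW) estimate, computed from per-SNP summary statistics. The function name and the instrument values are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Each argument is a list over instrument SNPs: SNP-exposure effects,
    SNP-outcome effects, and standard errors of the SNP-outcome effects.
    """
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, (1.0 / den) ** 0.5

# Hypothetical instruments whose outcome effects are exactly half their exposure effects.
estimate, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
print(round(estimate, 3), round(se, 4))
```

The IVW estimate assumes that instruments affect the outcome only through the exposure. Where strong pleiotropy is present, as reported here for schizophrenia, pleiotropy-robust methods and tests in both directions would normally be used alongside it.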
biorxiv genetics 200-500-users 2017