Insular Celtic population structure and genomic footprints of migration, bioRxiv, 2017-12-09
Abstract: Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.
Author summary: A recent genetic study of the UK (People of the British Isles; PoBI) expanded our understanding of population history of the islands, using newly-developed, powerful techniques that harness the rich information embedded in chunks of genetic code called haplotypes. These methods revealed subtle regional diversity across the UK, and, using genetic data alone, timed key migration events into southeast England and Orkney. We have extended these methods to Ireland, identifying regional differences in genetics across the island that adhere to geography at a resolution not previously reported. Our study reveals relative western diversity and eastern homogeneity in Ireland owing to a history of settlement concentrated on the east coast and longstanding Celtic diversity in the west. We show that Irish Celtic diversity enriches the findings of PoBI; haplotypes mirror geography across Britain and Ireland, with relic Celtic populations contributing greatly to haplotypic diversity. Finally, we used genetic information to date migrations into Ireland from Europe and Britain consistent with historical records of Viking and Norman invasions, demonstrating the signatures of these migrations on the modern Irish genome. Our findings demonstrate that genetic structure exists in even small isolated populations, which has important implications for population-based genetic association studies.
biorxiv genetics 100-200-users 2017
Common risk variants identified in autism spectrum disorder, bioRxiv, 2017-11-26
Abstract: Autism spectrum disorder (ASD) is a highly heritable and heterogeneous group of neurodevelopmental phenotypes diagnosed in more than 1% of children. Common genetic variants contribute substantially to ASD susceptibility, but to date no individual variants have been robustly associated with ASD. With a marked sample size increase from a unique Danish population resource, we report a genome-wide association meta-analysis of 18,381 ASD cases and 27,969 controls that identifies five genome-wide significant loci. Leveraging GWAS results from three phenotypes with significantly overlapping genetic architectures (schizophrenia, major depression, and educational attainment), seven additional loci shared with other traits are identified at equally strict significance levels. Dissecting the polygenic architecture we find both quantitative and qualitative polygenic heterogeneity across ASD subtypes, in contrast to what is typically seen in other complex disorders. These results highlight biological insights, particularly relating to neuronal function and corticogenesis, and establish that GWAS performed at scale will be much more productive in the near term in ASD, just as it has been in a broad range of important psychiatric and diverse medical phenotypes.
biorxiv genetics 200-500-users 2017
Dynamics of the upper airway microbiome in the pathogenesis of asthma-associated persistent wheeze in preschool children, bioRxiv, 2017-11-21
Abstract: Repeated cycles of infection-associated lower airway inflammation drive the pathogenesis of persistent wheezing disease in children. Tracking these events across a birth cohort during their first five years, we demonstrate that >80% of infectious events indeed involve viral pathogens, but are accompanied by a shift in the nasopharyngeal microbiome (NPM) towards dominance by a small range of pathogenic bacterial genera. Unexpectedly, this change in NPM frequently precedes the appearance of viral pathogens and acute symptoms. In non-sensitized children these events are associated only with “transient wheeze” that resolves after age three. In contrast, in children developing early allergic sensitization, they are associated with ensuing development of persistent wheeze, which is the hallmark of the asthma phenotype. This suggests underlying pathogenic interactions between allergic sensitization and antibacterial mechanisms.
biorxiv genetics 0-100-users 2017
The nature of nurture: effects of parental genotypes, bioRxiv, 2017-11-15
Abstract: Sequence variants in the parental genomes that are not transmitted to a child/proband are often ignored in genetic studies. Here we show that non-transmitted alleles can impact a child through their effects on the parents and other relatives, a phenomenon we call genetic nurture. Using results from a meta-analysis of educational attainment, the polygenic score computed for the non-transmitted alleles of 21,637 probands with at least one parent genotyped has an estimated effect on the educational attainment of the proband that is 29.9% (P = 1.6 × 10⁻¹⁴) of that of the transmitted polygenic score. Genetic nurturing effects of this polygenic score extend to other traits. Paternal and maternal polygenic scores have similar effects on educational attainment, but mothers contribute more than fathers to nutrition/health-related traits.
One Sentence Summary: Nurture has a genetic component, i.e. alleles in the parents affect the parents’ phenotypes and through that influence the outcomes of the child.
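To make the transmitted versus non-transmitted comparison concrete, here is a minimal toy simulation in Python. It is a sketch under stated assumptions, not the authors' pipeline: the sample sizes, allele frequencies, score weights, and the effect parameters `delta` (direct effect) and `eta` (nurture effect) are all hypothetical, and parents are assumed to mate at random. The phenotype is generated so that the transmitted score carries both effects while the non-transmitted score carries only the nurture effect, and a joint regression then recovers the ratio of the two coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5_000, 100                    # probands and SNPs (toy sizes, assumed)
freq = rng.uniform(0.1, 0.9, m)      # hypothetical allele frequencies
w = rng.normal(0.0, 1.0, m)          # hypothetical per-allele score weights

# Four parental haplotypes per proband: 0,1 = father; 2,3 = mother.
haps = (rng.random((n, 4, m)) < freq).astype(np.int8)

# Haplotypes are exchangeable here, so treating index 0 and 2 as the
# transmitted copies is equivalent to choosing one at random per parent.
T = (haps[:, 0] + haps[:, 2]) @ w    # transmitted polygenic score
NT = (haps[:, 1] + haps[:, 3]) @ w   # non-transmitted polygenic score

delta, eta = 1.0, 0.25               # assumed direct and nurture effects
# Direct effect acts only through transmitted alleles; nurture acts through
# all parental alleles (transmitted and non-transmitted alike).
y = delta * T + eta * (T + NT) + rng.normal(0.0, 1.0, n)

# Joint ordinary least squares of the phenotype on both scores.
X = np.column_stack([np.ones(n), T, NT])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ratio = beta[2] / beta[1]            # NT effect as a fraction of T effect
print(f"estimated NT/T effect ratio: {ratio:.3f}")
```

In this toy model the ratio estimates `eta / (delta + eta)`, which is the kind of quantity the 29.9% figure in the abstract summarizes; the real analysis additionally has to deal with phasing, imputation of missing parents, and assortative mating, none of which are modeled here.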
biorxiv genetics 200-500-users 2017
Transcriptome-wide association studies: opportunities and challenges, bioRxiv, 2017-10-23
Transcriptome-wide association studies (TWAS) integrate GWAS and gene expression datasets to find gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes, using simulations and case studies of literature-curated candidate causal genes for schizophrenia, LDL cholesterol and Crohn’s disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene, as well as loci where TWAS prioritizes multiple genes, some of which are unlikely to be causal, because they share the same variants as eQTLs. We illustrate that TWAS is especially prone to spurious prioritization when using expression data from tissues or cell types that are less related to the trait, due to substantial variation in both expression levels and eQTL strengths across cell types. Nonetheless, TWAS prioritizes candidate causal genes at GWAS loci more accurately than simple baselines based on proximity to lead GWAS variant and expression in trait-related tissue. We discuss current strategies and future opportunities for improving the performance of TWAS for causal gene prioritization. Our results showcase the strengths and limitations of using expression variation across individuals to determine causal genes at GWAS loci and provide guidelines and best practices when using TWAS to prioritize candidate causal genes.
biorxiv genetics 100-200-users 2017
Distinguishing genetic correlation from causation across 52 diseases and complex traits, bioRxiv, 2017-10-19
Abstract: Mendelian randomization (MR) is widely used to identify causal relationships among heritable traits, but it can be confounded by genetic correlations reflecting shared etiology. We propose a model in which a latent causal variable mediates the genetic correlation between two traits. Under the latent causal variable (LCV) model, trait 1 is fully genetically causal for trait 2 if it is perfectly genetically correlated with the latent causal variable, implying that the entire genetic component of trait 1 is causal for trait 2; it is partially genetically causal for trait 2 if it has a high genetic correlation with the latent variable, implying that part of the genetic component of trait 1 is causal for trait 2. To quantify the degree of partial genetic causality, we define the genetic causality proportion (gcp). We fit this model using mixed fourth moments $E(\alpha_1^2 \alpha_1 \alpha_2)$ and $E(\alpha_2^2 \alpha_1 \alpha_2)$ of marginal effect sizes for each trait, exploiting the fact that if trait 1 is causal for trait 2 then SNPs affecting trait 1 (large $\alpha_1^2$) will have correlated effects on trait 2 (large $\alpha_1 \alpha_2$), but not vice versa. We performed simulations under a wide range of genetic architectures and determined that LCV, unlike state-of-the-art MR methods, produced well-calibrated false positive rates and reliable gcp estimates in the presence of genetic correlations and asymmetric genetic architectures; we also determined that LCV is well-powered to detect a causal effect.
We applied LCV to GWAS summary statistics for 52 traits (average N=331k), identifying partially or fully genetically causal effects (1% FDR) for 59 pairs of traits, including 30 pairs of traits with high gcp estimates (gĉp > 0.6). Results consistent with the published literature included genetically causal effects on myocardial infarction (MI) for LDL, triglycerides and BMI. Novel findings included a genetically causal effect of LDL on bone mineral density, consistent with clinical trials of statins in osteoporosis. These results demonstrate that it is possible to distinguish between genetic correlation and causation using genetic data.
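The asymmetry in mixed fourth moments that the abstract describes can be illustrated with a small toy simulation. This is only a sketch under simplifying assumptions and not the LCV estimator itself: it uses true per-SNP effects rather than noisy marginal estimates, ignores linkage disequilibrium, and the number of SNPs `M`, the sparsity `p`, and the loading `q2` are hypothetical values chosen for illustration. Effects are drawn from a sparse, heavy-tailed distribution, since the asymmetry relies on some SNPs having much larger effects than others.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 200_000   # number of SNPs (toy value, assumed)
p = 0.05      # fraction of SNPs with nonzero effect (assumed sparsity)
q2 = 0.5      # assumed correlation of trait 2 with the latent variable


def sparse_effects(size):
    """Unit-variance, heavy-tailed effects: mostly zero, occasionally large."""
    return rng.normal(0.0, 1.0, size) * (rng.random(size) < p) / np.sqrt(p)


pi = sparse_effects(M)        # per-SNP effects on the latent causal variable
gamma2 = sparse_effects(M)    # effects specific to trait 2

alpha1 = pi                                      # trait 1 fully causal (q1 = 1)
alpha2 = q2 * pi + np.sqrt(1 - q2**2) * gamma2   # trait 2 only partly shared

k1 = np.mean(alpha1**2 * alpha1 * alpha2)   # sample version of E(a1^2 a1 a2)
k2 = np.mean(alpha2**2 * alpha1 * alpha2)   # sample version of E(a2^2 a1 a2)
print(f"k1 = {k1:.2f}, k2 = {k2:.2f}, genetic correlation ~ "
      f"{np.mean(alpha1 * alpha2):.2f}")
```

When trait 1 is the causal one, as in this setup, the first moment comes out clearly larger than the second: SNPs with large effects on trait 1 reliably carry proportional effects on trait 2, whereas many SNPs with large effects on trait 2 act through its trait-specific component and have no effect on trait 1.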
biorxiv genetics 200-500-users 2017