Insular Celtic population structure and genomic footprints of migration, bioRxiv, 2017-12-09

Abstract

Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.

Author summary

A recent genetic study of the UK (People of the British Isles; PoBI) expanded our understanding of population history of the islands, using newly-developed, powerful techniques that harness the rich information embedded in chunks of genetic code called haplotypes. These methods revealed subtle regional diversity across the UK, and, using genetic data alone, timed key migration events into southeast England and Orkney. We have extended these methods to Ireland, identifying regional differences in genetics across the island that adhere to geography at a resolution not previously reported. Our study reveals relative western diversity and eastern homogeneity in Ireland owing to a history of settlement concentrated on the east coast and longstanding Celtic diversity in the west. We show that Irish Celtic diversity enriches the findings of PoBI; haplotypes mirror geography across Britain and Ireland, with relic Celtic populations contributing greatly to haplotypic diversity. Finally, we used genetic information to date migrations into Ireland from Europe and Britain consistent with historical records of Viking and Norman invasions, demonstrating the signatures of these migrations on the modern Irish genome. Our findings demonstrate that genetic structure exists in even small isolated populations, which has important implications for population-based genetic association studies.

biorxiv genetics 100-200-users 2017

Transcriptome-wide association studies: opportunities and challenges, bioRxiv, 2017-10-23

Transcriptome-wide association studies (TWAS) integrate GWAS and gene expression datasets to find gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes, using simulations and case studies of literature-curated candidate causal genes for schizophrenia, LDL cholesterol and Crohn’s disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene, as well as loci where TWAS prioritizes multiple genes, some of which are unlikely to be causal, because they share the same variants as eQTLs. We illustrate that TWAS is especially prone to spurious prioritization when using expression data from tissues or cell types that are less related to the trait, due to substantial variation in both expression levels and eQTL strengths across cell types. Nonetheless, TWAS prioritizes candidate causal genes at GWAS loci more accurately than simple baselines based on proximity to lead GWAS variant and expression in trait-related tissue. We discuss current strategies and future opportunities for improving the performance of TWAS for causal gene prioritization. Our results showcase the strengths and limitations of using expression variation across individuals to determine causal genes at GWAS loci and provide guidelines and best practices when using TWAS to prioritize candidate causal genes.

biorxiv genetics 100-200-users 2017

Distinguishing genetic correlation from causation across 52 diseases and complex traits, bioRxiv, 2017-10-19

Abstract

Mendelian randomization (MR) is widely used to identify causal relationships among heritable traits, but it can be confounded by genetic correlations reflecting shared etiology. We propose a model in which a latent causal variable mediates the genetic correlation between two traits. Under the latent causal variable (LCV) model, trait 1 is fully genetically causal for trait 2 if it is perfectly genetically correlated with the latent causal variable, implying that the entire genetic component of trait 1 is causal for trait 2; it is partially genetically causal for trait 2 if it has a high genetic correlation with the latent variable, implying that part of the genetic component of trait 1 is causal for trait 2. To quantify the degree of partial genetic causality, we define the genetic causality proportion (gcp). We fit this model using mixed fourth moments $E(\alpha_1^2 \alpha_1 \alpha_2)$ and $E(\alpha_2^2 \alpha_1 \alpha_2)$ of marginal effect sizes for each trait, exploiting the fact that if trait 1 is causal for trait 2 then SNPs affecting trait 1 (large $\alpha_1^2$) will have correlated effects on trait 2 (large $\alpha_1 \alpha_2$), but not vice versa. We performed simulations under a wide range of genetic architectures and determined that LCV, unlike state-of-the-art MR methods, produced well-calibrated false positive rates and reliable gcp estimates in the presence of genetic correlations and asymmetric genetic architectures; we also determined that LCV is well-powered to detect a causal effect.
We applied LCV to GWAS summary statistics for 52 traits (average N=331k), identifying partially or fully genetically causal effects (1% FDR) for 59 pairs of traits, including 30 pairs of traits with high gcp estimates (gĉp > 0.6). Results consistent with the published literature included genetically causal effects on myocardial infarction (MI) for LDL, triglycerides and BMI. Novel findings included a genetically causal effect of LDL on bone mineral density, consistent with clinical trials of statins in osteoporosis. These results demonstrate that it is possible to distinguish between genetic correlation and causation using genetic data.
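
The asymmetry that the abstract exploits can be illustrated with a small simulation. The sketch below is a toy under stated assumptions, not the authors' estimator: it takes the two mixed fourth moments to be E(α1² · α1α2) and E(α2² · α1α2), uses true rather than marginal effect sizes (no LD, no sampling noise), and makes trait 1 fully causal for trait 2 with a hypothetical causal effect `q`. The function names and the parameters `p_causal` and `q` are illustrative choices and do not come from the paper.

```python
import random


def simulate_effects(n_snps=100_000, p_causal=0.1, q=0.5, seed=0):
    """Toy per-SNP effects where trait 1 is fully causal for trait 2.

    Assumptions (illustrative only): SNPs are independent, trait-1 effects
    are sparse, and trait 2 receives q times the trait-1 effect plus an
    independent Gaussian component. Both effects have unit variance.
    """
    rng = random.Random(seed)
    scale = (1.0 / p_causal) ** 0.5   # standard deviation of nonzero trait-1 effects
    resid = (1.0 - q * q) ** 0.5      # standard deviation of the independent part
    a1, a2 = [], []
    for _ in range(n_snps):
        # Sparse trait-1 effect: nonzero for a fraction p_causal of SNPs.
        x = rng.gauss(0.0, scale) if rng.random() < p_causal else 0.0
        # Trait-2 effect: causal path through trait 1 plus independent effects.
        y = q * x + resid * rng.gauss(0.0, 1.0)
        a1.append(x)
        a2.append(y)
    return a1, a2


def mixed_fourth_moments(a1, a2):
    """Return sample estimates of E(a1^2 * a1*a2) and E(a2^2 * a1*a2)."""
    n = len(a1)
    m1 = sum(x * x * x * y for x, y in zip(a1, a2)) / n
    m2 = sum(y * y * x * y for x, y in zip(a1, a2)) / n
    return m1, m2


if __name__ == "__main__":
    a1, a2 = simulate_effects()
    m1, m2 = mixed_fourth_moments(a1, a2)
    print(f"E(a1^2 a1a2) ~ {m1:.2f}, E(a2^2 a1a2) ~ {m2:.2f}")
```

In this toy setting the first moment should come out larger than the second, because SNPs with large trait-1 effects also carry proportional trait-2 effects, while many SNPs with large trait-2 effects have no trait-1 effect. The gap depends on the trait-1 effects being non-Gaussian (here, sparse); if both effect distributions were Gaussian, the two moments would have the same expectation.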

biorxiv genetics 200-500-users 2017

 

