Insular Celtic population structure and genomic footprints of migration, bioRxiv, 2017-12-09

Abstract

Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.

Author summary

A recent genetic study of the UK (People of the British Isles; PoBI) expanded our understanding of population history of the islands, using newly-developed, powerful techniques that harness the rich information embedded in chunks of genetic code called haplotypes. These methods revealed subtle regional diversity across the UK, and, using genetic data alone, timed key migration events into southeast England and Orkney. We have extended these methods to Ireland, identifying regional differences in genetics across the island that adhere to geography at a resolution not previously reported. Our study reveals relative western diversity and eastern homogeneity in Ireland owing to a history of settlement concentrated on the east coast and longstanding Celtic diversity in the west. We show that Irish Celtic diversity enriches the findings of PoBI; haplotypes mirror geography across Britain and Ireland, with relic Celtic populations contributing greatly to haplotypic diversity. Finally, we used genetic information to date migrations into Ireland from Europe and Britain consistent with historical records of Viking and Norman invasions, demonstrating the signatures of these migrations on the modern Irish genome. Our findings demonstrate that genetic structure exists in even small isolated populations, which has important implications for population-based genetic association studies.

biorxiv genetics 100-200-users 2017

Resolving the Full Spectrum of Human Genome Variation using Linked-Reads, bioRxiv, 2017-12-09

Abstract

Large-scale population-based analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole genome sequencing. However, standard short-read approaches, used primarily due to accuracy, throughput and costs, fail to give a complete picture of a genome. They struggle to identify large, balanced structural events, cannot access repetitive regions of the genome and fail to resolve the human genome into its two haplotypes. Here we describe an approach that retains long-range information while harnessing the advantages of short reads. Starting from only ∼1 ng of DNA, we produce barcoded short-read libraries. The use of novel informatic approaches allows for the barcoded short reads to be associated with the long molecules of origin, producing a novel datatype known as ‘Linked-Reads’. This approach allows for simultaneous detection of small and large variants from a single Linked-Read library. We have previously demonstrated the utility of whole genome Linked-Reads (lrWGS) for performing diploid, de novo assembly of individual genomes (Weisenfeld et al. 2017). In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. We demonstrate the ability of Linked-Reads to reconstruct megabase-scale haplotypes and to recover parts of the genome that are typically inaccessible to short reads, including phenotypically important genes such as STRC, SMN1 and SMN2. We demonstrate the ability of both lrWGS and Linked-Read Whole Exome Sequencing (lrWES) to identify complex structural variations, including balanced events, single exon deletions, and single exon duplications. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.

biorxiv genomics 0-100-users 2017

Created with the audiences framework by Jedidiah Carlson
