Identification of hidden population structure in time-scaled phylogenies, bioRxiv, 2019-07-16
Abstract: Population structure influences genealogical patterns; however, data pertaining to how populations are structured are often unavailable or not directly observable. Inference of population structure is highly important in molecular epidemiology, where pathogen phylogenetics is increasingly used to infer transmission patterns and detect outbreaks. Discrepancies between observed and idealised genealogies, such as those generated by the coalescent process, can be quantified, and where significant differences occur, may reveal the action of natural selection, host population structure, or other demographic and epidemiological heterogeneities. We have developed a fast non-parametric statistical test for detection of cryptic population structure in time-scaled phylogenetic trees. The test is based on contrasting estimated phylogenies with the theoretically expected phylodynamic ordering of common ancestors in two clades within a coalescent framework. These statistical tests have also motivated the development of algorithms which can be used to quickly screen a phylogenetic tree for clades which are likely to share a distinct demographic or epidemiological history. Epidemiological applications include identification of outbreaks in vulnerable host populations or rapid expansion of genotypes with a fitness advantage. To demonstrate the utility of these methods for outbreak detection, we applied the new methods to large phylogenies reconstructed from thousands of HIV-1 partial pol sequences. This revealed the presence of clades which had grown rapidly in the recent past and were significantly concentrated in young men, suggesting recent and rapid transmission in that group. Furthermore, to demonstrate the utility of these methods for the study of antimicrobial resistance, we applied the new methods to a large phylogeny reconstructed from whole genome Neisseria gonorrhoeae sequences.
We find that population structure detected using these methods closely overlaps with the appearance and expansion of mutations conferring antimicrobial resistance.
biorxiv evolutionary-biology 100-200-users 2019
Migratory divides coincide with species barriers across replicated avian hybrid zones above the Tibetan Plateau, bioRxiv, 2019-07-12
Abstract: Migratory divides are proposed to be catalysts for speciation across a diversity of taxa. However, the relative contribution of migratory behavior to reproductive isolation is difficult to test. Comparing reproductive isolation in hybrid zones with and without migratory divides offers a rare opportunity to directly examine the contribution of divergent migratory behavior to reproductive barriers. We show that across replicate sampling transects of two pairs of barn swallow (Hirundo rustica) subspecies, strong reproductive isolation coincided with an apparent migratory divide spanning 20 degrees of latitude. A third subspecies pair exhibited no evidence for a migratory divide and hybridized extensively. Within migratory divides, migratory phenotype was associated with assortative mating, implicating a central contribution of divergent migratory behavior to reproductive barriers. The remarkable geographic coincidence between migratory divides and genetic breaks supports a longstanding hypothesis that the Tibetan Plateau is a substantial barrier contributing to the diversity of Siberian avifauna.
biorxiv evolutionary-biology 100-200-users 2019
VolcanoFinder genomic scans for adaptive introgression, bioRxiv, 2019-07-12
Abstract: Recent research shows that introgression between closely related species is an important source of adaptive alleles for a wide range of taxa. Typically, detection of adaptive introgression from genomic data relies on comparative analyses that require sequence data from both the recipient and the donor species. However, in many cases, the donor is unknown or the data is not currently available. Here, we introduce a genome-scan method, VolcanoFinder, to detect recent events of adaptive introgression using polymorphism data from the recipient species only. VolcanoFinder detects adaptive introgression sweeps from the pattern of excess intermediate-frequency polymorphism they produce in the flanking region of the genome, a pattern which appears as a volcano shape in pairwise genetic diversity. Using coalescent theory, we derive analytical predictions for these patterns. Based on these results, we develop a composite-likelihood test to detect signatures of adaptive introgression relative to the genomic background. Simulation results show that VolcanoFinder has high statistical power to detect these signatures, even for older sweeps and for soft sweeps initiated by multiple migrant haplotypes. Finally, we apply VolcanoFinder to detect archaic introgression in European and sub-Saharan African human populations, and uncover interesting candidates in both populations, such as TSHR in Europeans and TCHH-RPTN in Africans. We discuss their biological implications and provide guidelines for identifying and circumventing artifactual signals during empirical applications of VolcanoFinder. Author summary: The process by which beneficial alleles are introduced into a species from a closely related species is termed adaptive introgression. We present an analytically tractable model for the effects of adaptive introgression on non-adaptive genetic variation in the genomic region surrounding the beneficial allele.
The result we describe is a characteristic volcano-shaped pattern of increased variability that arises around the positively-selected site, and we introduce an open-source method VolcanoFinder to detect this signal in genomic data. Importantly, VolcanoFinder is a population-genetic likelihood-based approach, rather than a comparative-genomic approach, and can therefore probe genomic variation data from a single population for footprints of adaptive introgression, even from a priori unknown and possibly extinct donor species.
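The "volcano" pattern is described in terms of pairwise genetic diversity along the genome. As a rough illustration of that summary statistic only (not of VolcanoFinder's composite-likelihood test), the sketch below computes per-site pairwise diversity in non-overlapping windows. The function names, the 0/1 string encoding of haplotypes, and the toy haplotypes are all assumptions made for this example.

```python
from itertools import combinations


def pairwise_diversity(haplotypes):
    """Mean number of differences over all pairs of equal-length haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def windowed_diversity(haplotypes, window):
    """Per-site pairwise diversity in consecutive non-overlapping windows."""
    length = len(haplotypes[0])
    result = []
    for start in range(0, length, window):
        # Slice the same window out of every haplotype.
        chunk = [h[start:start + window] for h in haplotypes]
        # Divide by the actual window length (the last window may be shorter).
        result.append(pairwise_diversity(chunk) / len(chunk[0]))
    return result


# Hypothetical toy haplotypes (0 = ancestral, 1 = derived), not real data.
toy_haplotypes = ["000000", "010001", "010011"]
print(windowed_diversity(toy_haplotypes, window=3))
```

Plotting such window values against position is one simple way to look for a dip in diversity flanked by elevated shoulders; deciding whether a pattern is significant would require the likelihood framework the abstract describes.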
biorxiv evolutionary-biology 100-200-users 2019
A near-full-length HIV-1 genome from 1966 recovered from formalin-fixed paraffin-embedded tissue, bioRxiv, 2019-07-01
Abstract: Although estimated to have emerged in humans in Central Africa in the early 1900s, HIV-1, the main causative agent of AIDS, was only discovered in 1983. With very little direct biological data of HIV-1 from before the 1980s, far-reaching evolutionary and epidemiological inferences regarding the long pre-discovery phase of this pandemic are based on extrapolations by phylodynamic models of HIV-1 genomic sequences gathered mostly over recent decades. Here, using a very sensitive multiplex RT-PCR assay, we screened 1,652 formalin-fixed paraffin-embedded tissue specimens collected for pathology diagnostics in Kinshasa, Democratic Republic of Congo (DRC), between 1959 and 1967. We report the near-complete genome of one positive from 1966 (“DRC66”), a non-recombinant sister lineage to subtype C that constitutes the oldest HIV-1 near-full-length genome recovered to date. Root-to-tip plots showed the DRC66 sequence is not an outlier as would be expected if dating estimates from more recent genomes were systematically biased; and inclusion of DRC66 sequence in tip-dated BEAST analyses did not significantly alter root and internal node age estimates based on post-1978 HIV-1 sequences. There was larger variation in divergence time estimates among datasets that were subsamples of the available HIV-1 genomes from 1978-2015, showing the inherent phylogenetic stochasticity across subsets of the real HIV-1 diversity. In conclusion, this unique archival HIV-1 sequence provides direct genomic insight into HIV-1 in 1960s DRC, and, as an ancient-DNA calibrator, it validates our understanding of HIV-1 evolutionary history. Significance: Inferring the precise timing of the origin of the HIV/AIDS pandemic is of great importance because it offers insights into which factors did, or did not, facilitate the emergence of the causal virus.
Previous estimates have implicated rapid development during the early 20th century in Central Africa, which wove once-isolated populations into a more continuous fabric. We recovered the first HIV-1 genome from the 1960s, and it provides direct evidence that HIV-1 molecular clock estimates spanning the last half-century are remarkably reliable. And, because this genome itself was sampled only about a half-century after the estimated origin of the pandemic, it empirically anchors this crucial inference with high confidence.
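The root-to-tip check mentioned in the abstract can be pictured as a linear regression of genetic distance from the root on sampling year, followed by asking whether an older sample falls near the fitted line. The sketch below is a minimal illustration under that assumption; it is not the authors' analysis. The function names, the crude standardised-residual measure, and the toy numbers are all invented for this example and are not data from the study.

```python
from statistics import mean


def fit_line(years, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling year."""
    mx, my = mean(years), mean(distances)
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, distances))
    slope = sxy / sxx            # substitutions per site per year
    intercept = my - slope * mx
    return slope, intercept


def standardised_residual(years, distances, year_new, distance_new):
    """Residual of a held-out sample, scaled by the residual standard error.

    A crude outlier measure: it ignores the extra uncertainty that comes
    from extrapolating beyond the range of the fitted data.
    """
    slope, intercept = fit_line(years, distances)
    residuals = [y - (slope * x + intercept) for x, y in zip(years, distances)]
    sigma = (sum(r * r for r in residuals) / (len(years) - 2)) ** 0.5
    return (distance_new - (slope * year_new + intercept)) / sigma


# Hypothetical toy values: a clock of 0.002 subs/site/year with small noise.
toy_years = [1980, 1985, 1990, 1995, 2000, 2005, 2010]
toy_noise = [0.001, -0.001, 0.001, -0.001, 0.001, -0.001, 0.001]
toy_distances = [0.002 * (y - 1920) + e for y, e in zip(toy_years, toy_noise)]

slope, intercept = fit_line(toy_years, toy_distances)
root_year = -intercept / slope   # year at which the fitted distance is zero
print(slope, root_year)
print(standardised_residual(toy_years, toy_distances, 1966, 0.092))
```

A held-out sample whose standardised residual is small in absolute value is consistent with the clock fitted to the more recent sequences; a large value would flag it as an outlier.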
biorxiv evolutionary-biology 200-500-users 2019
Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph, bioRxiv, 2019-07-01
Abstract: The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present an extended version of the ARGweaver algorithm, ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that not only specify tree topology and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200 and 300 kya. We find no convincing evidence that negative selection acted against these regions. We also identify 1% of the Denisovan genome which was likely introgressed from an unsequenced hominin ancestor, and note that 15% of these regions have been passed on to modern humans through subsequent gene flow.
biorxiv evolutionary-biology 200-500-users 2019
Direct evidence for transport of RNA from the mouse brain to the germline and offspring, bioRxiv, 2019-06-29
Abstract: The traditional concept that heritability occurs exclusively from the transfer of germline-restricted genetics is being challenged by the increasing accumulation of evidence confirming the existence of experience-dependent transgenerational inheritance. Transgenerational inheritance is emerging as a powerful mechanism for robustly transmitting phenotypic adaptations to offspring. However, questions remain unanswered as to how this heritable information is passed from somatic cells. Previous studies have implicated the critical involvement of RNA in heritable transgenerational effects, and the high degree of mobility and genomic impact of RNAs in all organisms is an attractive model for the efficient transfer of genetic information. Here we show, for the first time, robust transport of RNA from the brain of an adult male mouse to sperm, and subsequently to offspring. Our observation of heritable genetic information originating from a somatic tissue may reveal a mechanism for how transgenerational effects are transmitted to offspring.
biorxiv evolutionary-biology 0-100-users 2019