Genome-wide signals of drift and local adaptation during rapid lineage divergence in a songbird, bioRxiv, 2018-01-08
Abstract: The formation of independent evolutionary lineages involves neutral and selective factors, and understanding their relative roles in population divergence is a fundamental goal of speciation research. Correlations between allele frequencies and environmental variability can reveal the role of selection, yet the relative contribution of drift can be difficult to establish. Recently diversified systems such as that of the Oregon junco (Aves: Emberizidae) of western North America provide ideal scenarios to apply genetic-environment association analyses (GEA) while controlling for population structure. Genome-wide SNP loci analyses revealed marked genetic structure consisting of differentiated populations in isolated, dry southern mountain ranges, and more admixed, recently expanded populations in humid northern latitudes. We used correlations between genomic and environmental variance to test for three specific modes of evolutionary divergence: (i) drift in geographic isolation, (ii) differentiation along continuous selective gradients, and (iii) isolation by adaptation. We found evidence of strong drift in southern mountains, but also signals of local adaptation in several populations, driven by temperature, precipitation, elevation and vegetation, especially when controlling for population history. We identified numerous variants under selection scattered across the genome, suggesting that local adaptation can promote rapid differentiation over short periods when acting over multiple independent loci.
biorxiv evolutionary-biology 0-100-users 2018
Inferring Species Trees Using Integrative Models of Species Evolution, bioRxiv, 2018-01-08
Abstract: Bayesian methods can be used to accurately estimate species tree topologies, times and other parameters, but only when the models of evolution which are available and utilized sufficiently account for the underlying evolutionary processes. Multispecies coalescent (MSC) models have been shown to accurately account for the evolution of genes within species in the absence of strong gene flow between lineages, and fossilized birth-death (FBD) models have been shown to estimate divergence times from fossil data in good agreement with expert opinion. Until now dating analyses using the MSC have been based on a fixed clock or informally derived node priors instead of the FBD. On the other hand, dating analyses using an FBD process have concatenated all gene sequences and ignored coalescence processes. To address these mirror-image deficiencies in evolutionary models, we have developed an integrative model of evolution which combines both the FBD and MSC models. By applying concatenation and the MSC (without employing the FBD process) to an exemplar data set consisting of molecular sequence data and morphological characters from the dog and fox subfamily Caninae, we show that concatenation causes predictable biases in estimated branch lengths. We then applied concatenation using the FBD process and the combined FBD-MSC model to show that the same biases are still observed when the FBD process is employed. These biases can be avoided by using the FBD-MSC model, which coherently models fossilization and gene evolution, and does not require an a priori substitution rate estimate to calibrate the molecular clock. We have implemented the FBD-MSC in a new version of StarBEAST2, a package developed for the BEAST2 phylogenetic software.
biorxiv evolutionary-biology 0-100-users 2018
Protein-coding variation and introgression of regulatory alleles drive plumage pattern diversity in the rock pigeon, bioRxiv, 2018-01-06
Abstract: Birds and other vertebrates display stunning variation in pigmentation patterning, yet the genes controlling this diversity remain largely unknown. Rock pigeons (Columba livia) are fundamentally one of four color pattern phenotypes, in decreasing order of melanism: T-check, checker, bar (ancestral), or barless. Using whole-genome scans, we identified NDP as a candidate gene for this variation. Allele-specific expression differences in NDP indicate cis-regulatory differences between ancestral and melanistic alleles. Sequence comparisons suggest that derived alleles originated in the speckled pigeon (Columba guinea), providing a striking example of introgression of alleles that are favored by breeders and are potentially advantageous in the wild. In contrast, barless rock pigeons have an increased incidence of vision defects and, like two human families with hereditary blindness, carry start-codon mutations in NDP. In summary, we find unexpected links between color pattern, introgression, and vision defects associated with regulatory and coding variation at a single locus.
biorxiv evolutionary-biology 0-100-users 2018
Carriers of mitochondrial DNA macrohaplogroup L3 basic lineages migrated back to Africa from Asia around 70,000 years ago, bioRxiv, 2017-12-14
Abstract
Background: After three decades of mtDNA studies on human evolution, the only incontrovertible main result is the African origin of all extant modern humans. In addition, a southern coastal route has been relentlessly imposed to explain the Eurasian colonization of these African pioneers. Based on the age of macrohaplogroup L3, from which all maternal Eurasian and the majority of African lineages originated, that out-of-Africa event has been dated around 60-70 kya. On the opposite side, we have proposed a northern route through Central Asia across the Levant for that expansion. Consistent with the fossil record, we have dated it around 125 kya. To help bridge differences between the molecular and fossil record ages, in this article we assess the possibility that mtDNA macrohaplogroup L3 matured in Eurasia and returned to Africa as basic L3 lineages around 70 kya.
Results: The coalescence ages of all Eurasian (M, N) and African L3 lineages, both around 71 kya, are not significantly different. The oldest M and N Eurasian clades are found in southeastern Asia instead of near Africa, as would be expected under the southern route hypothesis. The age of the split of the Y-chromosome composite DE haplogroup is very similar to the age of mtDNA L3. A Eurasian origin and back migration to Africa has been proposed for the African Y-chromosome haplogroup E. Inside Africa, frequency distributions of maternal L3 and paternal E lineages are positively correlated. This correlation is not fully explained by geographic or ethnic affinities. It seems better explained as the result of a joint and global replacement of the old autochthonous male and female African lineages by the new Eurasian incomers.
Conclusions: These results are congruent with a model proposing an out-of-Africa migration of early anatomically modern humans around 125 kya, a return to Africa of Eurasian fully modern humans around 70 kya, and a second Eurasian global expansion by 60 kya. Climatic conditions and the presence of Neanderthals played key roles in these human movements.
biorxiv evolutionary-biology 0-100-users 2017
Genetic landscapes reveal how human genetic diversity aligns with geography, bioRxiv, 2017-12-14
Summarizing spatial patterns in human genetic diversity to understand population history has been a persistent goal for human geneticists. Here, we use a recently developed spatially explicit method that estimates effective migration surfaces (the EEMS method) to visualize how human genetic diversity is geographically structured. The resulting surfaces are rugged, which indicates that the relationship between genetic and geographic distance is heterogeneous and distorted as a rule. Most prominently, topographic and marine features regularly align with increased genetic differentiation (e.g. the Sahara desert, Mediterranean Sea or Himalaya at large scales; the Adriatic and inter-island straits in Near Oceania at smaller scales). We also see traces of historical migrations and boundaries of language families. These results provide visualizations of human genetic diversity that reveal local patterns of differentiation in detail and emphasize that while genetic similarity generally decays with geographic distance, there have regularly been factors that subtly distort the underlying relationship across space observed today. The fine-scale population structure depicted here is relevant to understanding complex processes of human population history and may provide insights for geographic patterning in rare variants and heritable disease risk.
biorxiv evolutionary-biology 100-200-users 2017
Geometry of the sample frequency spectrum and the perils of demographic inference, bioRxiv, 2017-12-14
Abstract: The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to 0 or diverge to infinity, and show undesirable sensitivity of the inferred demography to perturbations in the data. The goal of this paper is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant demographic histories and use our results to show that the aforementioned pathological behavior of popular inference methods is intrinsic to the geometry of the expected SFS. We provide explicit descriptions and visualizations for a toy model with sample size 4, and generalize our intuition to arbitrary sample sizes n using tools from convex and algebraic geometry. We also develop a universal characterization result which shows that the expected SFS of a sample of size n under an arbitrary population history can be recapitulated by a piecewise-constant demography with only κ_n epochs, where κ_n is between n/2 and 2n − 1. The set of expected SFS for piecewise-constant demographies with fewer than κ_n epochs is open and non-convex, which causes the above phenomena for inference from data.
biorxiv evolutionary-biology 0-100-users 2017
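
For readers unfamiliar with the sample frequency spectrum discussed in the last abstract above, the following is a minimal sketch of how an SFS is tabulated from a sample, and of the textbook neutral expectation E[ξ_i] = θ/i for a constant-size population under the infinite-sites model. It is not code from the paper: the function names, the 0/1 haplotype encoding (1 = derived allele), the toy data, and the value of θ are all assumptions made for illustration, and the piecewise-constant demographies analyzed in the paper are not modeled here.

```python
from collections import Counter


def sample_frequency_spectrum(haplotypes):
    """Tabulate the SFS of a sample of n sequences.

    haplotypes: list of n equal-length lists of 0/1 (1 = derived allele).
    Returns [xi_1, ..., xi_{n-1}], where xi_i is the number of sites at
    which exactly i of the n sequences carry the derived allele.
    Sites with 0 or n derived copies are not segregating and are ignored.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    derived_counts = Counter(
        sum(h[site] for h in haplotypes) for site in range(num_sites)
    )
    return [derived_counts.get(i, 0) for i in range(1, n)]


def expected_sfs_constant_size(n, theta):
    """Neutral expectation for a constant-size population: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


# Hypothetical toy sample: n = 4 sequences, 6 sites.
haplotypes = [
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
]

observed = sample_frequency_spectrum(haplotypes)
expected = expected_sfs_constant_size(n=4, theta=3.0)
print("observed SFS:", observed)
print("expected SFS (constant size, theta = 3):", expected)
```

Demographic inference methods of the kind the abstract critiques fit a population-size history by comparing an observed vector like `observed` with the expected SFS that each candidate history implies; the constant-size case shown here is only the simplest such history.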