The user’s guide to comparative genomics with EnteroBase. Three case studies: micro-clades within Salmonella enterica serovar Agama, ancient and modern populations of Yersinia pestis, and core genomic diversity of all Escherichia, bioRxiv, 2019-04-19

Abstract: EnteroBase is an integrated software environment that supports the identification of global population structures within several bacterial genera that include pathogens. It currently contains more than 300,000 genomes that have been assembled from Illumina short reads from the genera Salmonella, Escherichia, Yersinia, Clostridioides, Helicobacter, Vibrio, and Moraxella. With the recent introduction of hierarchical clustering of core genome MLST sequence types, EnteroBase now facilitates the identification of close relatives of bacteria within those genera within a few hours of uploading their short reads. It also supports private collaborations between groups of users, and the comparison of genomic data that were assembled from short reads with SNP calls that were extracted from metagenomic sequences. Here we provide an overview for its users of how EnteroBase works, what it can do, and its future prospects. This user’s guide is illustrated by three case studies ranging in scale from the minuscule (local transmission of Salmonella between neighboring social groups of badgers) through pandemic transmission of plague and microevolution of Yersinia pestis over the last 5,000 years to a novel, global overview of the population structure of all of Escherichia.

biorxiv microbiology 100-200-users 2019

Extensive impact of low-frequency variants on the phenotypic landscape at population-scale, bioRxiv, 2019-04-16

Abstract: Genome-wide association studies (GWAS) allow the genetic basis of complex traits to be dissected at the population level [1]. However, despite the large number of trait-associated loci found, they often fail to explain a large part of the observed phenotypic variance [2–4]. One potential source of this discrepancy could be the preponderance of undetected low-frequency genetic variants in natural populations [5,6]. To increase the allele frequency of those variants and assess their phenotypic effects at the population level, we generated a diallel panel consisting of 3,025 hybrids, derived from pairwise crosses between a subset of natural isolates from a completely sequenced population of 1,011 Saccharomyces cerevisiae isolates. We examined each hybrid across a large number of growth traits, resulting in a total of 148,225 cross-trait combinations. Parental versus hybrid regression analysis showed that while most phenotypic variance is explained by additivity, a significant proportion (29%) is governed by non-additive effects. This is confirmed by the observation that complete dominance is the majority pattern in 25% of the traits. By performing GWAS on the diallel panel, we detected 1,723 significantly associated genetic variants, 16.3% of which are low-frequency variants in the initial population. These variants, which would not be detected using classical GWAS, explain 21% of the phenotypic variance on average. Altogether, our results demonstrate that low-frequency variants should be accounted for, as they contribute to a large part of the phenotypic variation observed in a population.

biorxiv genomics 100-200-users 2019

Created with the audiences framework by Jedidiah Carlson