GWAS of brain volume on 54,407 individuals and cross-trait analysis with intelligence identifies shared genomic loci and genes, bioRxiv, 2019-04-19
Abstract: The phenotypic correlation between human intelligence and brain volume (BV) is considerable (r≈0.40), and has been shown to be due to shared genetic factors [1]. To further examine specific genetic factors driving this correlation, we present genomic analyses of the genetic overlap between intelligence and BV using genome-wide association study (GWAS) results. First, we conducted the largest BV GWAS meta-analysis to date (N=54,407 individuals), followed by functional annotation and gene-mapping. We identified 35 genomic loci (27 novel), implicating 362 genes (346 novel) and 23 biological pathways for BV. Second, we used an existing GWAS for intelligence (N=269,867 individuals [2]), and estimated the genetic correlation (rg) between BV and intelligence to be 0.23. We show that the rg is driven by physical overlap of GWAS hits in 5 genomic loci. We identified 67 shared genes between BV and intelligence, which are mainly involved in important signaling pathways regulating cell growth. Out of these 67 we prioritized 32 that are most likely to have functional impact. These results provide new information on the genetics of BV and provide biological insight into BV’s shared genetic etiology with intelligence.
biorxiv genetics 100-200-users 2019
The user’s guide to comparative genomics with EnteroBase. Three case studies: micro-clades within Salmonella enterica serovar Agama, ancient and modern populations of Yersinia pestis, and core genomic diversity of all Escherichia, bioRxiv, 2019-04-19
Abstract: EnteroBase is an integrated software environment which supports the identification of global population structures within several bacterial genera including pathogens. It currently contains more than 300,000 genomes that have been assembled from Illumina short reads from the genera Salmonella, Escherichia, Yersinia, Clostridioides, Helicobacter, Vibrio, and Moraxella. With the recent introduction of hierarchical clustering of core genome MLST sequence types, EnteroBase now facilitates the identification of close relatives of bacteria within those genera within a few hours of uploading their short reads. It also supports private collaborations between groups of users, and the comparison of genomic data that were assembled from short reads with SNP calls that were extracted from metagenomic sequences. Here we provide an overview for its users on how EnteroBase works, what it can do, and its future prospects. This user’s guide is illustrated by three case studies ranging in scale from the minuscule (local transmission of Salmonella between neighboring social groups of badgers) through pandemic transmission of plague and microevolution of Yersinia pestis over the last 5,000 years to a novel, global overview of the population structure of all of Escherichia.
biorxiv microbiology 100-200-users 2019
A revised model for promoter competition based on multi-way chromatin interactions, bioRxiv, 2019-04-18
Abstract: Specific communication between gene promoters and enhancers is critical for accurate regulation of gene expression. However, it remains unclear how specific interactions between multiple regulatory elements and genes contained within a single chromatin domain are coordinated. Recent technological advances allow for the investigation of multi-way chromatin interactions at single alleles in individual nuclei. This can provide insights into how multiple regulatory elements cooperate or compete for transcriptional activation. We have used these techniques in a mouse model in which the α-globin domain is extended to include several additional genes. This allows us to determine how the interactions of the α-globin super-enhancer are distributed between multiple promoters in a single domain. Our data show that gene promoters do not form mutually exclusive interactions with the super-enhancer, but all interact simultaneously in a single complex. These findings show that promoters within the same domain do not structurally compete for interactions with enhancers, but form a regulatory hub structure, consistent with the recent model of transcriptional activation in phase-separated nuclear condensates.
biorxiv genomics 100-200-users 2019
deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index, bioRxiv, 2019-04-18
Abstract: Long-read RNA sequencing (RNA-seq) is a promising approach in transcriptomics studies; however, aligning long reads remains a fundamental but non-trivial task because of sequencing errors and complicated gene structures. We propose de Bruijn graph-based Spliced Aligner for Long Transcriptome read (deSALT), a tailored two-pass long RNA-seq read alignment approach, which constructs graph-based alignment skeletons to sensitively infer exons and uses them to generate high-quality spliced reference sequences to produce refined alignments. deSALT addresses several difficult technical issues, such as small exons and serious sequencing errors, and thereby breaks through the bottlenecks of long RNA-seq read alignment. Benchmarks demonstrate that this approach has a greater ability to produce accurate and homogeneous full-length alignments and thus has enormous potential in transcriptomics studies.
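For readers unfamiliar with the data structure named in the title, the following is a minimal sketch of a generic de Bruijn graph built from a single sequence. It is not deSALT's actual index; the function name `build_de_bruijn` and the choice of k are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_de_bruijn(seq, k):
    """Build a generic de Bruijn graph (illustrative only, not deSALT's index).

    Nodes are (k-1)-mers; each k-mer in `seq` adds an edge from its
    (k-1)-length prefix to its (k-1)-length suffix.
    """
    if k < 2 or len(seq) < k:
        raise ValueError("need k >= 2 and a sequence of at least k bases")
    graph = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Toy sequence; a real index would be built from a reference genome.
    for node, successors in sorted(build_de_bruijn("ACGTACG", 3).items()):
        print(node, "->", sorted(successors))
```

Repeated k-mers collapse onto the same edge, which is why such graphs can represent repetitive reference sequence compactly.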
biorxiv bioinformatics 100-200-users 2019
Benchmarking of alignment-free sequence comparison methods, bioRxiv, 2019-04-16
Abstract: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference and reconstruction of species trees under horizontal gene transfer and recombination events. The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.
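As a rough illustration of what an alignment-free comparison can look like, the sketch below computes a cosine distance between the k-mer count profiles of two sequences. This is one common word-frequency approach chosen for illustration; it is not claimed to be one of the 74 benchmarked methods, and the function names and the default k=3 are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a, b, k=3):
    """Cosine distance between the k-mer count vectors of two sequences.

    Returns a value near 0.0 for identical k-mer profiles and 1.0 when
    the sequences share no k-mers. No alignment is performed.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("both sequences must contain at least one k-mer")
    dot = sum(ca[kmer] * cb[kmer] for kmer in ca.keys() & cb.keys())
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance("ACGTACGTAC", "ACGTTCGTAC"))
```

Because the distance depends only on word counts, it can be computed in time linear in the sequence lengths, which is the main appeal of this family of methods for large data sets.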
biorxiv bioinformatics 100-200-users 2019
Extensive impact of low-frequency variants on the phenotypic landscape at population-scale, bioRxiv, 2019-04-16
Abstract: Genome-wide association studies (GWAS) make it possible to dissect the genetic basis of complex traits at the population level [1]. However, despite the extensive number of trait-associated loci found, they often fail to explain a large part of the observed phenotypic variance [2–4]. One potential source of this discrepancy could be the preponderance of undetected low-frequency genetic variants in natural populations [5,6]. To increase the allele frequency of those variants and assess their phenotypic effects at the population level, we generated a diallel panel consisting of 3,025 hybrids, derived from pairwise crosses between a subset of natural isolates from a completely sequenced population of 1,011 Saccharomyces cerevisiae isolates. We examined each hybrid across a large number of growth traits, resulting in a total of 148,225 cross/trait combinations. Parental versus hybrid regression analysis showed that while most phenotypic variance is explained by additivity, a significant proportion (29%) is governed by non-additive effects. This is confirmed by the fact that a majority of complete dominance is observed in 25% of the traits. By performing GWAS on the diallel panel, we detected 1,723 significantly associated genetic variants, with 16.3% of them being low-frequency variants in the initial population. These variants, which would not be detected using classical GWAS, explain 21% of the phenotypic variance on average. Altogether, our results demonstrate that low-frequency variants should be accounted for as they contribute to a large part of the phenotypic variation observed in a population.
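The panel sizes quoted above can be sanity-checked with simple arithmetic. The sketch below assumes, as an interpretation not stated in the abstract, that the 3,025 hybrids form a full n × n crossing matrix and that every hybrid was scored on the same number of traits; the function name is hypothetical.

```python
from math import isqrt


def diallel_dimensions(n_hybrids, n_combinations):
    """Infer panel dimensions under two labeled assumptions.

    Assumption 1: the hybrids form a full n x n crossing matrix.
    Assumption 2: every hybrid was measured on the same number of traits.
    Returns (parents, traits) or raises ValueError if either fails.
    """
    parents = isqrt(n_hybrids)
    if parents * parents != n_hybrids:
        raise ValueError("hybrid count is not a perfect square")
    traits, remainder = divmod(n_combinations, n_hybrids)
    if remainder != 0:
        raise ValueError("combinations are not a multiple of the hybrid count")
    return parents, traits


if __name__ == "__main__":
    parents, traits = diallel_dimensions(3025, 148225)
    print(f"{parents} x {parents} crosses, {traits} traits per hybrid")
```

Under these assumptions the numbers are consistent with 55 parental isolates and 49 traits per hybrid, since 55 × 55 = 3,025 and 3,025 × 49 = 148,225.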
biorxiv genomics 100-200-users 2019