Unraveling the polygenic architecture of complex traits using blood eQTL meta-analysis, bioRxiv, 2018-10-19
While many disease-associated variants have been identified through genome-wide association studies, their downstream molecular consequences remain unclear. To identify these effects, we performed cis- and trans-expression quantitative trait locus (eQTL) analysis in blood from 31,684 individuals through the eQTLGen Consortium. We observed that cis-eQTLs can be detected for 88% of the studied genes, but that they have a different genetic architecture compared to disease-associated variants, limiting our ability to use cis-eQTLs to pinpoint causal genes within susceptibility loci. In contrast, trans-eQTLs (detected for 37% of 10,317 studied trait-associated variants) were more informative. Multiple unlinked variants, associated to the same complex trait, often converged on trans-genes that are known to play central roles in disease etiology. We observed the same when ascertaining the effect of polygenic scores calculated for 1,263 genome-wide association study (GWAS) traits. Expression levels of 13% of the studied genes correlated with polygenic scores, and many resulting genes are known to drive these traits.
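As an illustration of the polygenic-score analysis mentioned above, a polygenic score can be sketched as a weighted sum of risk-allele dosages, which is then correlated with a gene's expression level across individuals. The sketch below is a minimal toy version, not the consortium's pipeline: the function names, the weights, the genotypes, the expression values, and the choice of Pearson correlation are all assumptions made for illustration.

```python
from math import sqrt

def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical GWAS effect sizes for three variants (made-up values).
weights = [0.2, -0.1, 0.05]

# Hypothetical genotype dosages: one row per individual, one column per variant.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [2, 2, 2],
]

# Hypothetical expression levels of one gene in the same four individuals.
expression = [5.0, 5.3, 5.9, 5.5]

scores = [polygenic_score(g, weights) for g in genotypes]
r = pearson_r(scores, expression)
print("polygenic scores:", scores)
print("correlation with expression:", round(r, 3))
```

In a real analysis the same correlation would be tested for every gene and every trait, with correction for multiple testing and covariates, none of which is shown here.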
biorxiv genomics 200-500-users 2018
Using long-read sequencing to detect imprinted DNA methylation, bioRxiv, 2018-10-17
Systematic variation in the methylation of cytosines at CpG sites plays a critical role in early development of humans and other mammals. Of particular interest are regions of differential methylation between parental alleles, as these often dictate monoallelic gene expression, resulting in parent-of-origin-specific control of the embryonic transcriptome and subsequent development, a phenomenon known as genomic imprinting. Using long-read nanopore sequencing we show that, with an average genomic coverage of approximately ten, it is possible to determine both the level of methylation of CpG sites and the haplotype from which each read arises. The long-read property is exploited to characterise, using novel methods, both methylation and haplotype for reads that have reduced basecalling precision compared to Sanger sequencing. We validate the analysis both through comparison of nanopore-derived methylation patterns with those from Reduced Representation Bisulfite Sequencing data and through comparison with previously reported data. Our analysis successfully identifies known imprinting control regions as well as some novel differentially methylated regions which, due to their proximity to hitherto unknown monoallelically expressed genes, may represent new imprinting control regions.
biorxiv genomics 0-100-users 2018
Comparison of single-cell whole-genome amplification strategies, bioRxiv, 2018-10-16
Single-cell genomics is an alluring area that holds the potential to change the way we understand cell populations. Due to the small amount of DNA within a single cell, whole-genome amplification becomes a mandatory step in many single-cell applications. Unfortunately, single-cell whole-genome amplification (scWGA) strategies suffer from several technical biases that complicate the subsequent interpretation of the data. Here we compared the performance of six different scWGA methods (GenomiPhi, REPLI-g, TruePrime, Ampli1, MALBAC, and PicoPLEX) after amplifying and low-pass sequencing the complete genome of 230 healthy/tumoral human cells. Overall, REPLI-g outperformed competing methods regarding DNA yield, amplicon size, amplification breadth, amplification uniformity (being the only method with a random amplification bias), and false single-nucleotide variant calls. On the other hand, non-MDA methods, and in particular Ampli1, showed less allelic imbalance and allelic dropout (ADO), more reliable copy-number profiles, and fewer chimeric amplicons. While no single scWGA method showed optimal performance for every aspect, they clearly have distinct advantages. Our results provide a convenient guide for selecting a scWGA method depending on the question of interest while revealing relevant weaknesses that should be considered during the analysis and interpretation of single-cell sequencing data.
biorxiv genomics 100-200-users 2018
The whale shark genome reveals how genomic and physiological properties scale with body size, bioRxiv, 2018-10-14
The endangered whale shark (Rhincodon typus) is the largest fish on Earth and is a long-lived member of the ancient Elasmobranchii clade. To characterize the relationship between genome features and biological traits, we sequenced and assembled the genome of the whale shark and compared its genomic and physiological features to those of 81 animals and yeast. We examined scaling relationships between body size, temperature, metabolic rates, and genomic features and found both general correlations across the animal kingdom and features specific to the whale shark genome. Among animals, increased lifespan is positively correlated to body size and metabolic rate. Several genomic features also significantly correlated with body size, including intron and gene length. Our large-scale comparative genomic analysis uncovered general features of metazoan genome architecture: GC content and codon adaptation index are negatively correlated, and neural connectivity genes are longer than average genes in most genomes. Focusing on the whale shark genome, we identified multiple features that significantly correlate with lifespan. Among these were very long gene length, due to large introns highly enriched in repetitive elements such as CR1-like LINEs, and considerably longer neural genes of several types, including connectivity, activity, and neurodegeneration genes. The whale shark’s genome had an expansion of gene families related to fatty acid metabolism and neurogenesis, with the slowest evolutionary rate observed in vertebrates to date. Our comparative genomics approach uncovered multiple genetic features associated with body size, metabolic rate, and lifespan, and showed that the whale shark is a promising model for studies of neural architecture and lifespan.
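One common way to quantify a scaling relationship of the kind described above is to fit a power law, y = a * x^b, by ordinary least squares on log-transformed values; the slope b is the scaling exponent. The sketch below assumes that approach purely for illustration (the abstract does not state which model was used), and the function name and the body-mass and gene-length values are made up.

```python
from math import log, exp

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); return (a, b)."""
    log_x = [log(x) for x in xs]
    log_y = [log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    b = sxy / sxx                  # scaling exponent (slope in log-log space)
    a = exp(mean_y - b * mean_x)   # prefactor (back-transformed intercept)
    return a, b

# Hypothetical body masses in kg (made-up values spanning several orders of magnitude).
body_mass = [0.02, 1.0, 50.0, 4000.0, 20000.0]

# Hypothetical mean gene lengths in kb, generated from an exact power law
# (a = 3.0, b = 0.25) so that the fit can be checked against known values.
gene_length = [3.0 * m ** 0.25 for m in body_mass]

a, b = fit_power_law(body_mass, gene_length)
print("prefactor a:", round(a, 3))
print("scaling exponent b:", round(b, 3))
```

Real comparative data would also need a correction for shared ancestry between species (for example phylogenetic regression), which this sketch deliberately leaves out.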
biorxiv genomics 0-100-users 2018
Estimation of allele-specific fitness effects across human protein-coding sequences and implications for disease, bioRxiv, 2018-10-13
A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features, and constructed a map of allele-specific selection coefficients across all protein-coding sequences in the human genome. We show that this map is informative about both human evolution and disease.
biorxiv genomics 100-200-users 2018