Bayesian analysis of GWAS summary data reveals differential signatures of natural selection across human complex traits and functional genomic categories, bioRxiv, 2019-09-01
Abstract: Understanding how natural selection has shaped the genetic architecture of complex traits and diseases is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level data to estimate multiple features of genetic architecture, including signatures of natural selection. Here, we present an enhanced method (SBayesS) that only requires GWAS summary statistics and incorporates functional genomic annotations. We analysed GWAS data with large sample sizes for 155 complex traits and detected pervasive signatures of negative selection with diverse estimates of SNP-based heritability and polygenicity. Projecting these estimates onto a map of genetic architecture obtained from evolutionary simulations revealed relatively strong natural selection on genetic variants associated with cardiorespiratory and cognitive traits and a relatively small number of mutational targets for diseases. Averaging across traits, the joint distribution of SNP effect size and MAF varied across functional genomic regions (likely to be a consequence of natural selection), with enrichment in both the number of associated variants and the magnitude of effect sizes in regions such as transcriptional start sites, coding regions and 5’- and 3’-UTRs.
biorxiv genetics 100-200-users 2019
Comparison of bibliographic data sources: Implications for the robustness of university rankings, bioRxiv, 2019-09-01
Abstract: Universities are increasingly evaluated, both internally and externally, on the basis of their outputs. Often these are converted to simple, and frequently contested, rankings based on quantitative analysis of those outputs. These rankings can have substantial implications for student and staff recruitment, research income and perceived prestige of a university. Both internal and external analyses usually rely on a single data source to define the set of outputs assigned to a specific university. Although some differences between such databases are documented, few studies have explored them at the institutional scale and examined the implications of these differences for the metrics and rankings that are derived from them. We address this gap by performing detailed bibliographic comparisons between three key databases: Web of Science (WoS), Scopus and the recently relaunched Microsoft Academic (MSA). We analyse the differences between outputs with DOIs identified from each source for a sample of 155 universities and supplement this with a detailed manual analysis of the differences for fifteen universities. We find significant differences between the sources at the university level. Sources differ in the publication year of specific objects, the completeness of metadata, as well as in their coverage of disciplines, outlets, and publication type. We construct two simple rankings based on citation counts and open access status of the outputs for these universities and show dramatic changes in position based on the choice of bibliographic data sources. Those universities that experience the largest changes are frequently those from non-English speaking countries and those that are outside the top positions in international university rankings. Overall, MSA has greater coverage than Scopus or WoS, but has less complete affiliation metadata. We suggest that robust evaluation measures need to consider the effect of choice of data sources and recommend an approach where data from multiple sources is integrated to provide a more robust dataset.
biorxiv scientific-communication-and-education 0-100-users 2019
Interspecies transcriptome analyses identify genes that control the development and evolution of limb skeletal proportion, bioRxiv, 2019-09-01
Abstract: Despite the great diversity of vertebrate limb proportion and our deep understanding of the genetic mechanisms that drive skeletal elongation, little is known about how individual bones reach different lengths in any species. Here, we directly compare the transcriptomes of homologous growth cartilages of the mouse (Mus musculus) and bipedal jerboa (Jaculus jaculus), which has extremely long metatarsals of the feet and ‘mouse-like’ arms. When we intersected gene expression differences in metatarsals of the two species with expression differences in forearms, we found that about 10% of all orthologous genes are associated with disproportionate elongation of jerboa feet. Among these, Shox2 has gained expression in jerboa metatarsals where it is not expressed in other vertebrates that have been assessed. This transcription factor is necessary for proximal limb elongation, and we show that it is sufficient to increase mouse distal limb length. Unexpectedly, we also found evidence that jerboa foot elongation occurs in part by releasing latent growth potential that is repressed in mouse feet. In jerboa metatarsals, we observed higher expression of Crabp1, an antagonist of growth inhibitory retinoic acid, lower expression of Gdf10, an inhibitory TGFβ ligand, and lower expression of Mab21L2, a BMP signaling inhibitor that we show is sufficient to reduce limb bone elongation. By intersecting our data with prior expression analyses in other systems, we identify mechanisms that may both establish limb proportion during development and diversify proportion during evolution. The genes we identified here therefore provide a framework to understand the modular genetic control of skeletal growth and the remarkable malleability of vertebrate limb proportion.
biorxiv developmental-biology 0-100-users 2019
Robustness and applicability of functional genomics tools on scRNA-seq data, bioRxiv, 2019-09-01
Abstract: Many tools have been developed to extract functional and mechanistic insight from bulk transcriptome profiling data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to do such an analysis for single cells. However, scRNA-seq data has characteristics such as drop-out events, low library sizes and a comparatively large number of samples/cells. It is thus not clear if functional genomics tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way. To address this question, we performed benchmark studies on in silico and in vitro single-cell RNA-seq data. We included the bulk-RNA tools PROGENy, GO enrichment and DoRothEA that estimate pathway and transcription factor (TF) activities, respectively, and compared them against the tools AUCell and metaVIPER, designed for scRNA-seq. For the in silico study we simulated single cells from TF/pathway perturbation bulk RNA-seq experiments. Our simulation strategy guarantees that the information of the original perturbation is preserved while resembling the characteristics of scRNA-seq data. We complemented the in silico data with in vitro scRNA-seq data upon CRISPR-mediated knock-out. Our benchmarks on both the simulated and real data revealed comparable performance to the original bulk data. Additionally, we showed that the TF and pathway activities preserve cell-type specific variability by analysing a mixture sample sequenced with 13 different scRNA-seq protocols. Our analyses suggest that bulk functional genomics tools can be applied to scRNA-seq data, outperforming dedicated single cell tools. Furthermore, we provide a benchmark for further methods development by the community.
biorxiv bioinformatics 100-200-users 2019
Undulating changes in human plasma proteome across lifespan are linked to disease, bioRxiv, 2019-09-01
Aging is the predominant risk factor for numerous chronic diseases that limit healthspan. Mechanisms of aging are thus increasingly recognized as therapeutic targets. Blood from young mice reverses aspects of aging and disease across multiple tissues, pointing to the intriguing possibility that age-related molecular changes in blood can provide novel insight into disease biology. We measured 2,925 plasma proteins from 4,331 young adults to nonagenarians and developed a novel bioinformatics approach which uncovered profound non-linear alterations in the human plasma proteome with age. Waves of changes in the proteome in the fourth, seventh, and eighth decades of life reflected distinct biological pathways, and revealed differential associations with the genome and proteome of age-related diseases and phenotypic traits. This new approach to the study of aging led to the identification of unexpected signatures and pathways of aging and disease and offers potential pathways for aging interventions.
biorxiv systems-biology 100-200-users 2019
Assessing the potential of environmental DNA metabarcoding for monitoring Neotropical mammals: a case study in the Amazon and Atlantic Forest, Brazil, bioRxiv, 2019-08-31
Abstract: The application of environmental DNA (eDNA) metabarcoding as a biomonitoring tool has greatly increased in the last decade. However, most studies have focused on aquatic macro-organisms in temperate areas (e.g., fishes). We apply eDNA metabarcoding to detect the mammalian community in two high-biodiversity regions of Brazil, the Amazon and Atlantic Forest. We identified critically endangered and endangered mammalian species in the Atlantic Forest and Amazon, respectively, and found congruence with species identified via camera trapping in the Atlantic Forest. In light of our results, we highlight the potential and challenges of eDNA monitoring for mammals in these highly biodiverse areas.
biorxiv ecology 0-100-users 2019