eQTL Catalogue: a compendium of uniformly processed human gene expression and splicing QTLs, bioRxiv, 2020-01-30
Abstract: An increasing number of gene expression quantitative trait locus (QTL) studies have made summary statistics publicly available, which can be used to gain insight into human complex traits through downstream analyses such as fine-mapping and colocalisation. However, differences between these datasets in the variants tested, the allele codings, and the transcriptional features quantified are a barrier to their widespread use. Here, we present the eQTL Catalogue, a resource that contains quality-controlled, uniformly re-computed QTLs from 19 eQTL publications. In addition to gene expression QTLs, we have also identified QTLs at the level of exon expression, transcript usage, and promoter, splice junction and 3ʹ end usage. Our summary statistics can be downloaded by FTP or accessed via a REST API and are also accessible via the Open Targets Genetics Portal. We demonstrate how the eQTL Catalogue and GWAS Catalog APIs can be used to perform colocalisation analysis between GWAS and QTL results without downloading and reformatting summary statistics. New datasets will continuously be added to the eQTL Catalogue, enabling systematic interpretation of human GWAS associations across a large number of cell types and tissues. The eQTL Catalogue is available at https://www.ebi.ac.uk/eqtl.
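The allele-coding differences mentioned in this abstract can be illustrated with a minimal sketch. It is not the eQTL Catalogue's own pipeline: the `Assoc` record and the `align_to` helper are hypothetical names introduced here, and the sketch handles only swapped effect/other alleles, not strand flips.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assoc:
    """One association record (hypothetical layout, for illustration only)."""
    variant: str        # position label, e.g. "chr1:1000"
    effect_allele: str
    other_allele: str
    beta: float         # effect size with respect to effect_allele


def align_to(reference: Assoc, other: Assoc) -> Optional[Assoc]:
    """Express `other` on the same effect allele as `reference`.

    Returns None when the variants or allele pairs do not match.
    Strand flips (A/G reported as T/C) are deliberately not handled.
    """
    if reference.variant != other.variant:
        return None
    ref_pair = (reference.effect_allele, reference.other_allele)
    if (other.effect_allele, other.other_allele) == ref_pair:
        return other  # already on the same coding
    if (other.other_allele, other.effect_allele) == ref_pair:
        # Alleles are swapped: swap them back and flip the sign of beta.
        return Assoc(other.variant, other.other_allele,
                     other.effect_allele, -other.beta)
    return None


# Usage: the QTL record reports the opposite effect allele to the GWAS record.
gwas = Assoc("chr1:1000", "A", "G", 0.10)
eqtl = Assoc("chr1:1000", "G", "A", 0.20)
aligned = align_to(gwas, eqtl)
print(aligned)  # effect allele is now "A" and the sign of beta is flipped
```

A step of this kind is what uniformly processed summary statistics are meant to make unnecessary, since every dataset already shares one allele coding.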
biorxiv genomics 0-100-users 2020
HASLR: Fast Hybrid Assembly of Long Reads, bioRxiv, 2020-01-28
Abstract: Third generation sequencing technologies from platforms such as Oxford Nanopore Technologies and Pacific Biosciences have paved the way for building more contiguous assemblies and complete reconstruction of genomes. The larger effective length of the reads generated with these technologies has provided a means to overcome the challenges of short to mid-range repeats. Currently, accurate long read assemblers are computationally expensive, while faster methods are not as accurate. Therefore, there is still an unmet need for tools that are both fast and accurate for reconstructing small and large genomes. Despite the recent advances in third generation sequencing, researchers tend to generate second generation reads for many analysis tasks. Here, we present HASLR, a hybrid assembler which uses both second and third generation sequencing reads to efficiently generate accurate genome assemblies. Our experiments show that HASLR is not only the fastest assembler but also the one with the lowest number of misassemblies on all the samples compared to the other tested assemblers. Furthermore, the generated assemblies are on par with those of the other tools in contiguity and accuracy on most of the samples.
Availability: HASLR is an open source tool available at https://github.com/vpc-ccg/haslr.
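Assembly contiguity is commonly summarised with the N50 statistic. The abstract does not say which contiguity metrics were used, so the sketch below only illustrates that general measure; `n50` is a helper name introduced here and is not part of HASLR.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L of the contig at which, going from longest to
    shortest, the running total first reaches at least half of the
    total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Usage: total length is 32, and the two longest contigs (8 + 8) reach 16.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why the abstract reports misassemblies separately.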
biorxiv bioinformatics 0-100-users 2020
Mapping heritability of obesity by brain cell types, bioRxiv, 2020-01-28
The underlying cell types mediating predisposition to obesity remain largely obscure. Here we first integrated recently published single-cell RNA-sequencing (scRNA-seq) data from >380 peripheral and nervous system cell types spanning 19 mouse organs with body mass index (BMI) genome-wide association study (GWAS) data from >450,000 individuals. Leveraging a novel strategy for integrating scRNA-seq data with GWAS data, we identified 22 exclusively neuronal cell types from the subthalamus, midbrain, hippocampus, thalamus, cortex, pons, medulla and pallidum that were significantly enriched for BMI heritability (P<1.6×10⁻⁴). Using genes harboring coding mutations leading to syndromic forms of obesity, we replicate four midbrain cell types from the anterior pretectal nucleus, superior nucleus, periaqueductal gray and pallidum (P<1.7×10⁻⁴). Testing an additional set of 347 hypothalamic cell types, we found that ventromedial hypothalamic steroidogenic-factor 1 (SF1) and cholecystokinin b receptor (CCKBR)-expressing neurons (P=4.9×10⁻⁵), previously implicated in energy homeostasis and glucose control, and three cell types from the preoptic area of the hypothalamus and the lateral hypothalamus were enriched for BMI GWAS associations (P<4.9×10⁻⁵). Together, our results suggest that brain nuclei regulating integration of sensory stimuli, learning and memory are likely to play a key role in obesity, and they provide testable hypotheses for mechanistic follow-up studies.
biorxiv genetics 0-100-users 2020
A comprehensive atlas of white matter tracts in the chimpanzee, bioRxiv, 2020-01-26
Abstract: Chimpanzees (Pan troglodytes) are, along with bonobos, humans’ closest living relatives. The advent of diffusion tractography in recent years has allowed a resurgence of comparative neuroanatomical studies in humans and other primate species. Here, we offer, in comparative perspective, the first chimpanzee white matter atlas, coupled with surface projection maps of these major white matter tracts, constructed from in vivo chimpanzee diffusion-weighted scans. Comparative white matter atlases provide a useful tool for identifying neuroanatomical differences and similarities between humans and other primate species. Until now, comprehensive fascicular atlases have been created for humans (Homo sapiens), rhesus macaques (Macaca mulatta), and several other nonhuman primate species, but never in a nonhuman ape. Information on chimpanzee neuroanatomy is essential for understanding the anatomical specializations of white matter organization that are unique to the human lineage.
biorxiv neuroscience 0-100-users 2020
Comparison of visualisation tools for single-cell RNAseq data, bioRxiv, 2020-01-26
In the last decade, single cell RNAseq (scRNAseq) datasets have grown from a single cell to millions of cells. Owing to its high dimensionality, scRNAseq data contains a lot of valuable information; however, it is not always feasible to visualise and share it in a scientific report or article format. Recently, many interactive analysis and visualisation tools have been developed to address this issue and facilitate knowledge transfer in the scientific community. In this study, we review and compare several of the currently available analysis and visualisation tools and benchmark those that allow users to visualise scRNAseq data on the web and share it with others. To address the problem of format compatibility for most visualisation tools, we have also developed a user-friendly R package, sceasy, which allows users to convert their own scRNAseq datasets into a specific data format for visualisation.
biorxiv bioinformatics 0-100-users 2020
Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina: Native American, African and European genetic ancestries in Argentina, bioRxiv, 2020-01-25
Abstract: We are at the dawn of the efforts to describe and understand the origins of genetic diversity in Argentina from high-throughput data. This knowledge is a primary step towards deciphering the specific genetic bases of diseases and drug response in the country. As in other populations across the Americas, genetic ancestry in Argentinean populations traces back to African, European and Native American ancestors, reflecting a complex demographic history with multiple migration and admixture events in pre- and post-colonial times. However, little is known about the sub-continental origins of these three main ancestries. We present new high-throughput genotyping data for 87 admixed individuals across Argentina. These data were combined with previously published data for admixed individuals in the region and then compared to different reference panels specifically built to run population structure analyses at a sub-continental level. Concerning the European and African ancestries, we confirmed previous results about their main origins, and we provide new insights into the presence of other origins that reflect historical records. As for the Native American ancestry, by leveraging genotype data for archaeological samples in the region to gain temporal depth in our analyses, we could identify four Native American components segregating in modern Argentinean populations. Three of them are also found in modern South American populations and are specifically represented in the Central Chile/Patagonia, Lowlands and Central Andes geographic areas. The fourth one may be specific to the Central Western region of Argentina. Identifying such a component has not been straightforward, since it is not well represented in any genomic data from the literature. Altogether, we provide useful insights into the multiple population groups from different continents that have contributed to present-day genetic diversity in Argentina.
We encourage the generation of massive genotype data locally to further describe the genetic structure in Argentina.
Author Summary: The human genetic diversity in Argentina reflects a demographic history in which European colonists invaded a territory where Native American populations were settled. During the colonial period, the slave trade also forced many African people to move to Argentina. Little is known about the origins of the Native American and African components in present-day Argentinean populations. Genotyping data for 87 admixed individuals throughout Argentina were generated, and data from the literature were re-analyzed to shed light on this question. We confirmed that most of the European genetic ancestry comes from Southern Europe, although several individuals are related to Northern Europeans. We found that African origins in Argentina trace back to different regions. As for the Native American ancestry, we identified that it can be divided into four main components that correspond to the Central Chile/Patagonia, Lowlands, Central Andes and Central Western regions of Argentina. In order to understand the specificity of the genetic diversity in Argentina, we should not rely on knowledge generated in other populations. Instead, more effort is required to generate specific massive genomic knowledge at the local level.
biorxiv genetics 0-100-users 2020