biorxiv | audiences

Widespread divergent transcription from prokaryotic promoters, bioRxiv, 2020-02-03

Promoters are DNA sequences that stimulate the initiation of transcription. In all prokaryotes, promoters are believed to drive transcription in a single direction. Here we show that prokaryotic promoters are frequently bidirectional and drive divergent transcription. Mechanistically, this occurs because key promoter elements have inherent symmetry and often coincide on opposite DNA strands. Reciprocal stimulation between divergent transcription start sites also contributes. Horizontally acquired DNA is enriched for bidirectional promoters, suggesting that they represent an early step in prokaryotic promoter evolution.

biorxiv molecular-biology 0-100-users 2020

A Systematic Evaluation of Single-cell RNA-sequencing Imputation Methods, bioRxiv, 2020-01-30

The rapid development of single-cell RNA-sequencing (scRNA-seq) technology, with increased sparsity compared to bulk RNA-sequencing (RNA-seq), has led to the emergence of many methods for preprocessing, including imputation methods. Here, we systematically evaluate the performance of 18 state-of-the-art scRNA-seq imputation methods using cell line and tissue data measured across experimental protocols. Specifically, we assess the similarity of imputed cell profiles to bulk samples and investigate whether methods recover relevant biological signals or introduce spurious noise in three downstream analyses: differential expression, unsupervised clustering, and inference of pseudotemporal trajectories. Broadly, we found significant variability in the performance of the methods across evaluation settings. While most scRNA-seq imputation methods recover biological expression observed in bulk RNA-seq data, the majority of the methods do not improve performance in downstream analyses compared to no imputation, in particular for clustering and trajectory analysis, and thus should be used with caution. Furthermore, we find that the performance of scRNA-seq imputation methods depends on many factors, including the experimental protocol, the sparsity of the data, the number of cells in the dataset, and the magnitude of the effect sizes. We summarize our results and provide a key set of recommendations for users and investigators to navigate the current space of scRNA-seq imputation methods.

biorxiv genomics 0-100-users 2020

Copy number variants outperform SNPs to reveal genotype-temperature association in a marine species, bioRxiv, 2020-01-30

Copy number variants (CNVs) are a major component of genotypic and phenotypic variation in genomes. Yet our knowledge of genotypic variation and evolution is often limited to single nucleotide polymorphisms (SNPs), and the role of CNVs has been overlooked in non-model species, partly because they were difficult to identify until recently. Here, we document the usefulness of reduced-representation sequencing data (RAD-seq) to detect and investigate CNVs alongside SNPs in American lobster (Homarus americanus) populations. We conducted a comparative study to examine the potential role of SNPs and CNVs in local adaptation by sequencing 1141 lobsters from 21 sampling sites within the southern Gulf of St. Lawrence, which experiences the highest yearly thermal variance of the Canadian marine coastal waters. Our results demonstrated that CNVs account for higher genetic differentiation than SNP markers. In contrast to SNPs, for which no association was found, genetic-environment association analysis revealed that 48 CNV candidates were significantly associated with the annual variance of sea surface temperature, leading to the genetic clustering of sampling locations despite their geographic separation. Altogether, we provide a strong empirical case that CNVs putatively contribute to local adaptation in marine species and reveal a stronger spatial signal than SNPs. Our study provides the means to study CNVs in non-model species and underlines the importance of considering structural variants alongside SNPs to enhance our understanding of the ecological and evolutionary processes shaping adaptive population structure.

biorxiv genomics 0-100-users 2020

eQTL Catalogue: a compendium of uniformly processed human gene expression and splicing QTLs, bioRxiv, 2020-01-30

An increasing number of gene expression quantitative trait locus (QTL) studies have made summary statistics publicly available, which can be used to gain insight into human complex traits by downstream analyses such as fine-mapping and colocalisation. However, differences between these datasets in their variants tested, allele codings, and in the transcriptional features quantified are a barrier to their widespread use. Here, we present the eQTL Catalogue, a resource which contains quality controlled, uniformly re-computed QTLs from 19 eQTL publications. In addition to gene expression QTLs, we have also identified QTLs at the level of exon expression, transcript usage, and promoter, splice junction and 3ʹ end usage. Our summary statistics can be downloaded by FTP or accessed via a REST API and are also accessible via the Open Targets Genetics Portal. We demonstrate how the eQTL Catalogue and GWAS Catalog APIs can be used to perform colocalisation analysis between GWAS and QTL results without downloading and reformatting summary statistics. New datasets will continuously be added to the eQTL Catalogue, enabling systematic interpretation of human GWAS associations across a large number of cell types and tissues. The eQTL Catalogue is available at https://www.ebi.ac.uk/eqtl.

biorxiv genomics 0-100-users 2020

HASLR: Fast Hybrid Assembly of Long Reads, bioRxiv, 2020-01-28

Third generation sequencing technologies from platforms such as Oxford Nanopore Technologies and Pacific Biosciences have paved the way for building more contiguous assemblies and complete reconstruction of genomes. The larger effective length of the reads generated with these technologies has provided a means to overcome the challenges of short to mid-range repeats. Currently, accurate long read assemblers are computationally expensive while faster methods are not as accurate. Therefore, there is still an unmet need for tools that are both fast and accurate for reconstructing small and large genomes. Despite the recent advances in third generation sequencing, researchers tend to generate second generation reads for many of the analysis tasks. Here, we present HASLR, a hybrid assembler which uses both second and third generation sequencing reads to efficiently generate accurate genome assemblies. Our experiments show that HASLR is not only the fastest assembler but also the one with the lowest number of misassemblies on all the samples compared to other tested assemblers. Furthermore, the generated assemblies are on par with the other tools in terms of contiguity and accuracy on most of the samples. Availability: HASLR is an open source tool available at https://github.com/vpc-ccg/haslr.

biorxiv bioinformatics 0-100-users 2020

Mapping heritability of obesity by brain cell types, bioRxiv, 2020-01-28

The underlying cell types mediating predisposition to obesity remain largely obscure. Here we first integrated recently published single-cell RNA-sequencing (scRNA-seq) data from >380 peripheral and nervous system cell types spanning 19 mouse organs with body mass index (BMI) genome-wide association study (GWAS) data from >450,000 individuals. Leveraging a novel strategy for integrating scRNA-seq data with GWAS data, we identified 22 exclusively neuronal cell types from the subthalamus, midbrain, hippocampus, thalamus, cortex, pons, medulla and pallidum that were significantly enriched for BMI heritability (P<1.6×10⁻⁴). Using genes harboring coding mutations leading to syndromic forms of obesity, we replicate four midbrain cell types from the anterior pretectal nucleus, superior nucleus, periaqueductal gray and pallidum (P<1.7×10⁻⁴). Testing an additional set of 347 hypothalamic cell types, ventromedial hypothalamic steroidogenic-factor 1 (SF1) and cholecystokinin b receptor (CCKBR)-expressing neurons (P=4.9×10⁻⁵), previously implicated in energy homeostasis and glucose control, and three cell types from the preoptic area of the hypothalamus and the lateral hypothalamus were enriched for BMI GWAS associations (P<4.9×10⁻⁵). Together, our results suggest that brain nuclei regulating integration of sensory stimuli, learning and memory are likely to play a key role in obesity and provide testable hypotheses for mechanistic follow-up studies.

biorxiv genetics 0-100-users 2020