BOSS-RUNS: a flexible and practical dynamic read sampling framework for nanopore sequencing, bioRxiv, 2020-02-08
Abstract
Real-time selective sequencing of individual DNA fragments, or ‘Read Until’, allows Oxford Nanopore Technologies sequencing to be focused on pre-selected genomic regions. This can lead to large improvements in DNA sequencing performance in many scenarios where only part of the DNA content of a sample is of interest. The approach decides whether to sequence a fragment completely after having sequenced only a small initial part of it. If, based on this small part, the fragment is not deemed of (sufficient) interest, it is rejected and sequencing continues on a new fragment. To date, only simple decision strategies based on location within a genome have been proposed to determine which fragments are of interest. We present a new mathematical model and algorithm for the real-time assessment of the value of prospective fragments. Our decision framework is based not only on which genomic regions are a priori interesting, but also on which fragments have been sequenced so far, and hence on the current information available about the genome being sequenced. As such, our strategy can adapt dynamically during each run, focusing sequencing effort on areas of highest uncertainty (typically areas of currently low coverage). We show that our approach can lead to considerable savings of time and materials, providing high-confidence genome reconstruction sooner than a standard sequencing run and resulting in more homogeneous coverage across the genome, even when entire genomes are of interest.
Author Summary
An existing technique called ‘Read Until’ allows selective sequencing of DNA fragments with an Oxford Nanopore Technologies (ONT) sequencer. With Read Until it is possible to enrich coverage of areas of interest within a sequenced genome. We propose a new use of this technique that combines a mathematical model of read utility with an algorithm to select an optimal dynamic decision strategy (i.e. one that can be updated in real time, and so react to the data generated so far in an experiment). We show that it is possible to improve the efficiency of a sequencing run by focusing effort on areas of highest uncertainty.
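The abstract describes the dynamic decision only in words. As a toy illustration, and explicitly not the BOSS-RUNS model, the sketch below assumes the genome is a single coordinate array with a running coverage count, uses a hypothetical uncertainty proxy of 1 / (1 + coverage) per position, and keeps a read only if the summed proxy over the positions its remainder is expected to cover reaches a threshold. The names `position_benefit`, `expected_span`, `should_continue`, `record_read`, `expected_len` and `threshold` are all hypothetical.

```python
import numpy as np


def position_benefit(coverage: np.ndarray) -> np.ndarray:
    """Hypothetical uncertainty proxy: value of one more read at each position.

    Uncovered positions score 1; well-covered positions approach 0.
    """
    return 1.0 / (1.0 + coverage)


def expected_span(n: int, start: int, forward: bool, expected_len: int) -> tuple[int, int]:
    """Half-open interval [lo, hi) the rest of the read is assumed to cover."""
    if forward:
        return start, min(n, start + expected_len)
    return max(0, start - expected_len), start


def should_continue(coverage: np.ndarray, start: int, forward: bool,
                    expected_len: int, threshold: float) -> bool:
    """Decide after the initial chunk of a read has been mapped to `start`.

    Returns True to keep sequencing and False to eject the fragment.
    Because `coverage` changes as reads finish, the same fragment can be
    accepted early in a run and rejected later: the rule is dynamic.
    """
    lo, hi = expected_span(len(coverage), start, forward, expected_len)
    return float(position_benefit(coverage[lo:hi]).sum()) >= threshold


def record_read(coverage: np.ndarray, lo: int, hi: int) -> None:
    """Add one read's worth of coverage over [lo, hi)."""
    coverage[lo:hi] += 1
```

With coverage 30 on positions 0 to 499 and 0 elsewhere, a read mapping forward at position 100 with `expected_len=200` scores about 6.5 and is ejected at `threshold=50`, while a read mapping at position 600 scores 200 and is kept. In BOSS-RUNS the value of a fragment comes from the authors' own model rather than this fixed proxy.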
biorxiv genomics 0-100-users 2020
A Systematic Evaluation of Single-cell RNA-sequencing Imputation Methods, bioRxiv, 2020-01-30
ABSTRACT
The rapid development of single-cell RNA-sequencing (scRNA-seq) technology, whose data are sparser than those of bulk RNA-sequencing (RNA-seq), has led to the emergence of many preprocessing methods, including imputation methods. Here, we systematically evaluate the performance of 18 state-of-the-art scRNA-seq imputation methods using cell line and tissue data measured across experimental protocols. Specifically, we assess the similarity of imputed cell profiles to bulk samples and investigate whether methods recover relevant biological signals or introduce spurious noise in three downstream analyses: differential expression, unsupervised clustering, and inference of pseudotemporal trajectories. Broadly, we found significant variability in the performance of the methods across evaluation settings. While most scRNA-seq imputation methods recover biological expression observed in bulk RNA-seq data, the majority do not improve performance in downstream analyses compared to no imputation, in particular for clustering and trajectory analysis, and thus should be used with caution. Furthermore, we find that the performance of scRNA-seq imputation methods depends on many factors, including the experimental protocol, the sparsity of the data, the number of cells in the dataset, and the magnitude of the effect sizes. We summarize our results and provide a key set of recommendations for users and investigators to navigate the current space of scRNA-seq imputation methods.
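One way to quantify the similarity of imputed cell profiles to bulk samples is to average the cells into a pseudo-bulk profile and rank-correlate it with a matched bulk RNA-seq profile. The sketch below is an assumption for illustration, not necessarily the metric used in the study: it takes a cells × genes matrix and reports its sparsity (fraction of zeros) and the Spearman correlation of its pseudo-bulk with bulk, computed in plain NumPy with average ranks for ties. The function names are hypothetical.

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """Ranks starting at 1, with tied values given their mean rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    mean_rank = np.bincount(inverse, weights=ranks) / np.bincount(inverse)
    return mean_rank[inverse]


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def compare_to_bulk(cells_by_genes: np.ndarray, bulk: np.ndarray) -> dict:
    """Sparsity of a single-cell matrix and rank agreement of its pseudo-bulk with bulk."""
    sparsity = float(np.mean(cells_by_genes == 0))
    pseudo_bulk = cells_by_genes.mean(axis=0)  # average over cells, one value per gene
    return {"sparsity": sparsity, "spearman": spearman(pseudo_bulk, bulk)}
```

In a toy 2-cell, 4-gene example where imputation fills in zeros so that the pseudo-bulk gene ordering matches bulk exactly, the Spearman value rises from 0.8 to 1.0 while sparsity drops from 0.5 to 0. As the abstract stresses, agreement with bulk does not by itself imply better downstream clustering or trajectory inference.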
biorxiv genomics 0-100-users 2020
Copy number variants outperform SNPs to reveal genotype-temperature association in a marine species, bioRxiv, 2020-01-30
Abstract
Copy number variants (CNVs) are a major component of genotypic and phenotypic variation in genomes. Yet our knowledge of genotypic variation and evolution is often limited to single nucleotide polymorphisms (SNPs), and the role of CNVs has been overlooked in non-model species, partly because they were difficult to identify until recently. Here, we document the usefulness of reduced-representation sequencing data (RAD-seq) for detecting and investigating CNVs alongside SNPs in American lobster (Homarus americanus) populations. We conducted a comparative study to examine the potential role of SNPs and CNVs in local adaptation by sequencing 1141 lobsters from 21 sampling sites within the southern Gulf of St. Lawrence, which experiences the highest yearly thermal variance of the Canadian marine coastal waters. Our results demonstrated that CNVs account for higher genetic differentiation than SNP markers. In contrast to SNPs, for which no association was found, genetic-environment association analysis revealed that 48 CNV candidates were significantly associated with the annual variance of sea surface temperature, leading to the genetic clustering of sampling locations despite their geographic separation. Altogether, we provide a strong empirical case that CNVs putatively contribute to local adaptation in marine species and reveal a stronger spatial signal than SNPs. Our study provides the means to study CNVs in non-model species and underlines the importance of considering structural variants alongside SNPs to enhance our understanding of the ecological and evolutionary processes shaping adaptive population structure.
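The genetic-environment association step can be illustrated with a deliberately simple test: correlate each marker's per-site value (for a CNV, for example a per-site mean normalised read depth; this choice is an assumption) with the environmental variable across sampling sites, derive permutation p-values, and apply a Bonferroni correction. This is a generic sketch, not the association method used in the study, and the function names are hypothetical.

```python
import numpy as np


def pearson_per_marker(freqs: np.ndarray, env: np.ndarray) -> np.ndarray:
    """Pearson r between each marker column (sites x markers) and `env` (one value per site).

    Markers must vary across sites; a constant column would divide by zero.
    """
    f = freqs - freqs.mean(axis=0)
    e = env - env.mean()
    return (e @ f) / np.sqrt((e ** 2).sum() * (f ** 2).sum(axis=0))


def permutation_pvalues(freqs: np.ndarray, env: np.ndarray,
                        n_perm: int = 999, seed: int = 0) -> np.ndarray:
    """Two-sided permutation p-values: shuffle env across sites, keep markers fixed."""
    rng = np.random.default_rng(seed)
    observed = np.abs(pearson_per_marker(freqs, env))
    exceed = np.ones_like(observed)  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        permuted = np.abs(pearson_per_marker(freqs, rng.permutation(env)))
        exceed += permuted >= observed
    return exceed / (n_perm + 1)


def associated_markers(freqs: np.ndarray, env: np.ndarray, alpha: float = 0.05,
                       n_perm: int = 999, seed: int = 0) -> np.ndarray:
    """Indices of markers passing a Bonferroni-corrected permutation test."""
    p = permutation_pvalues(freqs, env, n_perm=n_perm, seed=seed)
    return np.flatnonzero(p <= alpha / freqs.shape[1])
```

The smallest attainable p-value is 1 / (n_perm + 1), so `n_perm` has to exceed the number of markers divided by `alpha` before any marker can pass the Bonferroni threshold; with thousands of markers, a parametric test or a false discovery rate procedure is the more practical choice.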
biorxiv genomics 0-100-users 2020
eQTL Catalogue: a compendium of uniformly processed human gene expression and splicing QTLs, bioRxiv, 2020-01-30
Abstract
An increasing number of gene expression quantitative trait locus (QTL) studies have made summary statistics publicly available, which can be used to gain insight into human complex traits through downstream analyses such as fine-mapping and colocalisation. However, differences between these datasets in the variants tested, allele coding, and the transcriptional features quantified are a barrier to their widespread use. Here, we present the eQTL Catalogue, a resource that contains quality-controlled, uniformly re-computed QTLs from 19 eQTL publications. In addition to gene expression QTLs, we have also identified QTLs at the level of exon expression, transcript usage, and promoter, splice junction and 3ʹ end usage. Our summary statistics can be downloaded by FTP or accessed via a REST API, and are also accessible via the Open Targets Genetics Portal. We demonstrate how the eQTL Catalogue and GWAS Catalog APIs can be used to perform colocalisation analysis between GWAS and QTL results without downloading and reformatting summary statistics. New datasets will continuously be added to the eQTL Catalogue, enabling systematic interpretation of human GWAS associations across a large number of cell types and tissues. The eQTL Catalogue is available at https://www.ebi.ac.uk/eqtl.
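Colocalisation between GWAS and QTL results is commonly computed with the approximate Bayes factor approach of the coloc R package. The sketch below re-implements that calculation in NumPy under stated assumptions: one causal variant per trait, the same variants in both studies, effect sizes and standard errors on a standardised scale, a prior effect SD of 0.15, and coloc's default priors p1 = p2 = 1e-4 and p12 = 1e-5. It does not call the eQTL Catalogue or GWAS Catalog APIs; retrieving and aligning the summary statistics is left out. As a design choice, H3 is computed as a sum over pairs of distinct variants using leave-one-out sums, which is mathematically equivalent to coloc's formula but avoids subtracting two nearly equal log-sums.

```python
import numpy as np


def log_abf(beta: np.ndarray, se: np.ndarray, prior_sd: float = 0.15) -> np.ndarray:
    """Wakefield log approximate Bayes factor per variant (prior effect SD = prior_sd)."""
    z2 = (beta / se) ** 2
    r = prior_sd ** 2 / (se ** 2 + prior_sd ** 2)
    return 0.5 * np.log1p(-r) + 0.5 * r * z2


def logsumexp(x: np.ndarray) -> float:
    m = np.max(x)
    return float(m + np.log(np.sum(np.exp(x - m))))


def leave_one_out_logsumexp(x: np.ndarray) -> np.ndarray:
    """Element j holds log(sum over k != j of exp(x_k)), computed without subtraction."""
    neg_inf = np.array([-np.inf])
    before = np.concatenate((neg_inf, np.logaddexp.accumulate(x)[:-1]))
    after = np.concatenate((np.logaddexp.accumulate(x[::-1])[::-1][1:], neg_inf))
    return np.logaddexp(before, after)


def coloc_posteriors(l1: np.ndarray, l2: np.ndarray,
                     p1: float = 1e-4, p2: float = 1e-4, p12: float = 1e-5) -> np.ndarray:
    """Posterior probabilities of H0..H4 for two traits over the same >= 2 variants.

    H0: no association; H1 / H2: only trait 1 / trait 2;
    H3: both traits, distinct causal variants; H4: both traits, one shared variant.
    """
    lh = np.array([
        0.0,
        np.log(p1) + logsumexp(l1),
        np.log(p2) + logsumexp(l2),
        # p1 * p2 * sum over pairs j != k of ABF1_j * ABF2_k
        np.log(p1) + np.log(p2) + logsumexp(l1 + leave_one_out_logsumexp(l2)),
        np.log(p12) + logsumexp(l1 + l2),
    ])
    return np.exp(lh - logsumexp(lh))
```

When the GWAS and QTL signals peak at the same strongly associated variant, almost all posterior mass falls on H4; when they peak at different variants, it moves to H3.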
biorxiv genomics 0-100-users 2020
Genetic associations at regulatory phenotypes improve fine-mapping of causal variants for twelve immune-mediated diseases, bioRxiv, 2020-01-15
Abstract
The identification of causal genetic variants for common diseases improves understanding of disease biology. Here we use data from the BLUEPRINT project to identify regulatory quantitative trait loci (QTLs) for three primary human immune cell types and use these to fine-map putative causal variants for twelve immune-mediated diseases. We identify 340 unique disease loci outside the major histocompatibility complex (MHC) that colocalise with regulatory QTLs with high (>98%) posterior probability, and apply Bayesian frameworks to fine-map associations at each locus. We show that fine-mapping applied to regulatory QTLs yields smaller credible sets and higher posterior probabilities for candidate causal variants than fine-mapping applied to disease summary statistics. We also describe a systematic under-representation of insertion/deletion (INDEL) polymorphisms in credible sets derived from publicly available disease meta-analyses when compared to QTLs based on genome-sequencing data. Overall, our findings suggest that fine-mapping applied to disease-colocalising regulatory QTLs can enhance the discovery of putative causal disease variants and provide insights into the underlying causal genes and molecular mechanisms.
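Under the common single-causal-variant simplification, fine-mapping a locus from summary statistics reduces to normalising per-variant approximate Bayes factors into posterior probabilities and taking the smallest set of top-ranked variants whose probabilities sum to at least 95%. The sketch below assumes that simplification, a uniform prior over variants, and a prior effect SD of 0.15. The study applied Bayesian fine-mapping frameworks that may relax these assumptions, so this only illustrates why a sharper association signal yields a smaller credible set; `credible_set` is a hypothetical name.

```python
import numpy as np


def log_abf(beta: np.ndarray, se: np.ndarray, prior_sd: float = 0.15) -> np.ndarray:
    """Wakefield log approximate Bayes factor per variant."""
    z2 = (beta / se) ** 2
    r = prior_sd ** 2 / (se ** 2 + prior_sd ** 2)
    return 0.5 * np.log1p(-r) + 0.5 * r * z2


def credible_set(beta, se, coverage: float = 0.95, prior_sd: float = 0.15):
    """Return (indices in the credible set, posterior probability per variant).

    Assumes exactly one causal variant in the locus and a uniform prior,
    so posteriors are the Bayes factors normalised to sum to 1.
    """
    labf = log_abf(np.asarray(beta, float), np.asarray(se, float), prior_sd)
    pp = np.exp(labf - labf.max())  # subtract the max for numerical stability
    pp /= pp.sum()
    order = np.argsort(-pp, kind="stable")  # most probable variants first
    cumulative = np.cumsum(pp[order])
    n_keep = min(len(pp), int(np.searchsorted(cumulative, coverage)) + 1)
    return order[:n_keep], pp
```

A locus with one dominant variant gives a one-variant credible set, while two equally strong variants in perfect linkage would both have to be kept, which is the kind of difference summarised by "smaller credible set sizes" above.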
biorxiv genomics 0-100-users 2020
Sampling artifacts in single-cell genomics cohort studies, bioRxiv, 2020-01-15
Abstract
Robust protocols and automation now enable large-scale single-cell RNA and ATAC sequencing experiments and their application to biobank and clinical cohorts. However, technical biases introduced during sample acquisition can hinder solid, reproducible results, so systematic benchmarking is required before entering large-scale data production. Here, we report the existence and extent of gene expression and chromatin accessibility artifacts introduced during sampling and identify experimental and computational solutions for their prevention.
biorxiv genomics 100-200-users 2020