A comprehensive atlas of immunological differences between humans, mice and non-human primates, bioRxiv, 2019-03-12
Animal models are an integral part of the drug development and evaluation process. However, they are unsurprisingly imperfect reflections of humans, and the extent and nature of many immunological differences are unknown. With the rise of targeted and biological therapeutics, it is increasingly important that we understand the molecular differences in immunological behavior of humans and model organisms. Thus, we profiled a large number of healthy humans, along with three of the model organisms most similar to humans: rhesus and cynomolgus macaques and African green monkeys; and the most widely used mammalian model: mice. Using cross-species, universal phenotyping and signaling panels, we measured immune cell signaling responses to an array of 15 stimuli using CyTOF mass cytometry. We found numerous instances of different cellular phenotypes and immune signaling events occurring within and between species with likely effects on evaluation of therapeutics, and detail three examples (double-positive T cell frequency and signaling; granulocyte response to Bacillus anthracis antigen; and B cell subsets). We also explore the correlation of herpes simian B virus serostatus with the immune profile. The full dataset is available online at https://flowrepository.org (accession FR-FCM-Z2ZY) and https://immuneatlas.org.
biorxiv immunology 100-200-users 2019
A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes, bioRxiv, 2019-03-12
A platform for highly parallel direct sequencing of native RNA strands was recently described by Oxford Nanopore Technologies (ONT); to assess overall performance in transcript-level investigations, the technology was applied to sequencing sets of synthetic transcripts as well as a yeast transcriptome. However, despite these initial efforts, it remains crucial to further investigate the characteristics of ONT native RNA sequencing when applied to much more complex transcriptomes. Here we thus undertook extensive native RNA sequencing of polyA+ RNA from two human cell lines, and thereby analysed ~5.2 million aligned native RNA reads comprising a total of ~4.6 billion bases. To enable informative comparisons, we also performed relevant ONT direct cDNA and Illumina sequencing. We find that while native RNA sequencing does enable some of the anticipated advantages, key unexpected aspects hamper its performance, most notably the quite frequent inability to obtain full-length transcripts from single reads, as well as difficulty in unambiguously inferring their true transcript of origin. While characterising issues that need to be addressed when investigating more complex transcriptomes, our study highlights that with some defined improvements, native RNA sequencing could be an important addition to the mammalian transcriptomics toolbox.
biorxiv bioinformatics 100-200-users 2019
Best practices for making reliable inferences from citizen science data: case study using eBird to estimate species distributions, bioRxiv, 2019-03-12
Citizen science data are valuable for addressing a wide range of ecological research questions, and there has been a rapid increase in the scope and volume of data available. However, data from large-scale citizen science projects typically present a number of challenges that can inhibit robust ecological inferences. These challenges include species bias, spatial bias, variation in effort, and variation in observer skill. To demonstrate key challenges in analysing citizen science data, we use the example of estimating species distributions with data from eBird, a large semi-structured citizen science project. We estimate three widely applied metrics for describing species distributions: encounter rate, occupancy probability, and relative abundance. For each method, we outline approaches for data processing and modelling that are suitable for using citizen science data for estimating species distributions. Model performance improved when data processing and analytical methods addressed the challenges arising from citizen science data. The largest gains in model performance were achieved with two key processes: 1) the use of complete checklists rather than presence-only data, and 2) the use of covariates describing variation in effort and detectability for each checklist. Including these covariates accounted for heterogeneity in detectability and reporting, and resulted in substantial differences in predicted distributions. The data processing and analytical steps we outlined led to improved model performance across a range of sample sizes. When using citizen science data it is imperative to carefully consider the appropriate data processing and analytical procedures required to address the bias and variation. Here, we describe the consequences and utility of applying our suggested approach to semi-structured citizen science data to estimate species distributions.
The methods we have outlined are also likely to improve other forms of inference and will enable researchers to conduct robust analyses and harness the vast ecological knowledge that exists within citizen science data.
biorxiv ecology 100-200-users 2019
Deep learning of representations for transcriptomics-based phenotype prediction, bioRxiv, 2019-03-12
The ability to predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. This task is complicated because expression data are high dimensional whereas each experiment is usually small (e.g., ∼20,000 genes may be measured for ∼100 subjects). However, thousands of transcriptomics experiments with hundreds of thousands of samples are available in public repositories. Can representation learning techniques leverage these public data to improve predictive performance on other tasks? Here, we report a comprehensive analysis using different gene sets, normalization schemes, and machine learning methods on a set of 24 binary and multiclass prediction problems and 26 survival analysis tasks. Methods that combine large numbers of genes outperformed single gene methods, but neither unsupervised nor semi-supervised representation learning techniques yielded consistent improvements in out-of-sample performance across datasets. Our findings suggest that using l2-regularized regression methods applied to centered log-ratio transformed transcript abundances provides the best predictive analyses.
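To make the closing recommendation concrete, the following is a minimal sketch of a centered log-ratio (CLR) transform followed by an l2-regularized fit. It is an illustration only, not the authors' pipeline: the function names, the pseudocount, the penalty value, the toy data, and the use of a linear (ridge) model in place of whichever regularized regression the study used are all assumptions.

```python
import numpy as np

def clr_transform(abundances, pseudocount=1.0):
    """Centered log-ratio transform of a samples x genes abundance matrix.

    The pseudocount (an assumption here) avoids log(0) for unobserved genes.
    """
    logged = np.log(np.asarray(abundances, dtype=float) + pseudocount)
    # Subtracting each sample's mean log abundance divides by its geometric mean.
    return logged - logged.mean(axis=1, keepdims=True)

def fit_ridge(features, targets, penalty=1.0):
    """Closed-form l2-regularized least squares with an unpenalized intercept."""
    feature_means = features.mean(axis=0)
    target_mean = targets.mean()
    centered = features - feature_means
    gram = centered.T @ centered + penalty * np.eye(features.shape[1])
    weights = np.linalg.solve(gram, centered.T @ (targets - target_mean))
    intercept = target_mean - feature_means @ weights
    return weights, intercept

# Hypothetical toy data: 30 samples, 5 genes, a continuous outcome.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=20.0, size=(30, 5))
toy_outcome = rng.normal(size=30)

clr_features = clr_transform(toy_counts)
weights, intercept = fit_ridge(clr_features, toy_outcome, penalty=10.0)
predictions = clr_features @ weights + intercept
```

Each CLR-transformed sample sums to zero, so the features are collinear by construction; the l2 penalty keeps the linear system solvable despite that.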
biorxiv bioinformatics 0-100-users 2019
Dopamine transients delivered in learning contexts do not act as model-free prediction errors, bioRxiv, 2019-03-12
Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrates the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.
biorxiv neuroscience 100-200-users 2019
Feature Selection and Dimension Reduction for Single Cell RNA-Seq based on a Multinomial Model, bioRxiv, 2019-03-12
Single cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero-inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.
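As a rough illustration of deviance-based feature selection, the sketch below scores each gene by its binomial deviance against a null model in which the gene takes up the same fraction of UMIs in every cell. The abstract does not spell out the exact statistic, so the binomial form (a common per-gene approximation to the multinomial), the function names, and the toy count matrix are assumptions rather than the authors' implementation.

```python
import numpy as np

def xlogy_ratio(observed, expected):
    """Elementwise observed * log(observed / expected), treating 0 * log(0) as 0."""
    result = np.zeros_like(observed, dtype=float)
    positive = observed > 0
    result[positive] = observed[positive] * np.log(observed[positive] / expected[positive])
    return result

def binomial_deviance_per_gene(counts):
    """Deviance of each gene under a constant-fraction null (cells x genes input)."""
    counts = np.asarray(counts, dtype=float)
    cell_totals = counts.sum(axis=1, keepdims=True)          # total UMIs per cell
    gene_fraction = counts.sum(axis=0, keepdims=True) / cell_totals.sum()
    expected = cell_totals * gene_fraction                   # null expected counts
    remainder = cell_totals - counts                         # UMIs from other genes
    expected_remainder = cell_totals - expected
    per_entry = 2.0 * (xlogy_ratio(counts, expected)
                       + xlogy_ratio(remainder, expected_remainder))
    return per_entry.sum(axis=0)

# Hypothetical toy matrix: 3 cells x 3 genes. The third gene is exactly
# proportional to each cell's total, so it carries no information under the null.
toy_counts = np.array([[10, 0, 5],
                       [20, 0, 10],
                       [0, 30, 15]])
gene_deviance = binomial_deviance_per_gene(toy_counts)
ranked_genes = np.argsort(gene_deviance)[::-1]  # most informative genes first
```

Genes with the largest deviance depart most from the constant-fraction null and would be retained; a gene whose counts simply track sequencing depth scores near zero.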
biorxiv genomics 200-500-users 2019