Exponential fluorescent amplification of individual RNAs using clampFISH probes, bioRxiv, 2017-12-06
Non-enzymatic, high-gain signal amplification methods with single-cell, single-molecule resolution are greatly needed. We present click-amplifying FISH (clampFISH) for the fluorescent detection of RNA, which combines the specificity of oligonucleotides with bioorthogonal click chemistry to achieve high specificity and extremely high-gain (>400x) signal amplification. We show that clampFISH signal enables detection with low-magnification microscopy and separation of cells by RNA levels via flow cytometry. Additionally, we show that the modular design of clampFISH probes enables multiplexing, that the locking mechanism prevents probe detachment in expansion microscopy, and that clampFISH works in tissue samples.
biorxiv bioengineering 200-500-users 2017
k-mer grammar uncovers maize regulatory architecture, bioRxiv, 2017-12-06
Only a small percentage of the genome sequence is involved in regulation of gene expression, but biochemically identifying this portion is expensive and laborious. In species like maize, with diverse intergenic regions and many repetitive elements, this is an especially challenging problem. While regulatory regions are rare, they do have characteristic chromatin contexts and sequence organization (the grammar) by which they can be identified. We developed a computational framework to exploit this sequence arrangement. The models learn to classify regulatory regions based on sequence features (k-mers). To do this, we borrowed two approaches from the field of natural language processing: (1) “bag-of-words”, which is commonly used for differentially weighting key words in tasks like sentiment analysis, and (2) a vector-space model using word2vec (vector-k-mers) that captures semantic and linguistic relationships between words. We built “bag-of-k-mers” and “vector-k-mers” models that distinguish between regulatory and non-regulatory regions with an accuracy above 90%. Our “bag-of-k-mers” models achieved higher overall accuracy, while the “vector-k-mers” models were more useful in highlighting key groups of sequences within the regulatory regions. These models now provide powerful tools to annotate regulatory regions in other maize lines beyond the reference, at low cost and with high accuracy.
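As a rough illustration of the “bag-of-k-mers” idea described above, the sketch below turns a DNA sequence into a fixed-length vector of k-mer frequencies. It is not the authors' code: the function names `kmer_counts` and `bag_of_kmers`, the choice of relative frequencies as the weighting, and the restriction to the `ACGT` alphabet are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq using a sliding window."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def bag_of_kmers(seq, k, alphabet="ACGT"):
    """Return relative k-mer frequencies over a fixed vocabulary.

    The vocabulary is every possible k-mer over the alphabet, in
    lexicographic order, so all sequences map to vectors of equal length.
    K-mers containing other characters (for example N) are ignored.
    """
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = kmer_counts(seq, k)
    # Guard against division by zero for sequences shorter than k.
    total = max(sum(counts[w] for w in vocab), 1)
    return [counts[w] / total for w in vocab]


if __name__ == "__main__":
    vector = bag_of_kmers("ACGTACGTTTGACA", k=2)
    print(len(vector))  # one entry per possible 2-mer
```

Vectors like these could then be passed to any standard classifier; which classifier and weighting the paper actually used is not stated in the abstract.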
biorxiv plant-biology 0-100-users 2017
K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data, bioRxiv, 2017-12-06
High-throughput single-cell RNA-Seq (scRNA-Seq) is a powerful approach for studying heterogeneous tissues and dynamic cellular processes. However, compared to bulk RNA-Seq, single-cell expression profiles are extremely noisy, as they only capture a fraction of the transcripts present in the cell. Here, we propose the k-nearest neighbor smoothing (kNN-smoothing) algorithm, designed to reduce noise by aggregating information from similar cells (neighbors) in a computationally efficient and statistically tractable manner. The algorithm is based on the observation that across protocols, the technical noise exhibited by UMI-filtered scRNA-Seq data closely follows Poisson statistics. Smoothing is performed by first identifying the nearest neighbors of each cell in a step-wise fashion, based on partially smoothed and variance-stabilized expression profiles, and then aggregating their transcript counts. We show that kNN-smoothing greatly improves the detection of clusters of cells and co-expressed genes, and clearly outperforms other smoothing methods on simulated data. To accurately perform smoothing for datasets containing highly similar cell populations, we propose the kNN-smoothing 2 algorithm, in which neighbors are determined after projecting the partially smoothed data onto the first few principal components. We show that unlike its predecessor, kNN-smoothing 2 can accurately distinguish between cells from different T cell subsets, and enables their identification in peripheral blood using unsupervised methods. Our work facilitates the analysis of scRNA-Seq data across a broad range of applications, including the identification of cell populations in heterogeneous tissues and the characterization of dynamic processes such as cellular differentiation. Reference implementations of our algorithms can be found at https://github.com/yanailab/knn-smoothing.
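The core aggregation step can be sketched in a few lines. The example below is a deliberately simplified, single-pass version and not the reference implementation: it omits the step-wise refinement and the principal-component projection of kNN-smoothing 2. The function name `knn_smooth_once`, the median-based scaling, and the use of the Freeman-Tukey transform as the variance-stabilizing step are assumptions for this illustration, and every cell is assumed to contain at least one count.

```python
import math
from statistics import median


def knn_smooth_once(counts, k):
    """Single-pass neighbor aggregation (simplified, hypothetical sketch).

    counts: list of cells, each a list of per-gene UMI counts.
    k: neighborhood size, counting the cell itself.
    Returns, for each cell, the summed raw counts of its k nearest cells.
    """
    totals = [sum(cell) for cell in counts]
    med = median(totals)
    # Scale each cell to the median total, then apply a variance-stabilizing
    # transform (Freeman-Tukey: sqrt(x) + sqrt(x + 1)) before measuring distance.
    stabilized = [
        [math.sqrt(x * med / t) + math.sqrt(x * med / t + 1) for x in cell]
        for cell, t in zip(counts, totals)
    ]
    smoothed = []
    for i, profile in enumerate(stabilized):
        # Rank all cells by Euclidean distance, forcing the cell itself first.
        order = sorted(
            range(len(counts)),
            key=lambda j: (j != i, math.dist(profile, stabilized[j])),
        )
        neighbors = order[:k]
        # Aggregate the raw (untransformed) counts of the selected neighbors.
        smoothed.append(
            [sum(counts[j][g] for j in neighbors) for g in range(len(profile))]
        )
    return smoothed


if __name__ == "__main__":
    cells = [[10, 0], [12, 1], [0, 9], [1, 11]]
    print(knn_smooth_once(cells, k=2))
```

Summing raw counts, rather than averaging transformed values, keeps the output on the count scale, which fits the abstract's point that the technical noise closely follows Poisson statistics.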
biorxiv bioinformatics 0-100-users 2017
Rethinking phylogenetic comparative methods, bioRxiv, 2017-12-06
As a result of the process of descent with modification, closely related species tend to be similar to one another in a myriad different ways. In statistical terms, this means that traits measured on one species will not be independent of traits measured on others. Since their introduction in the 1980s, phylogenetic comparative methods (PCMs) have been framed as a solution to this problem. In this paper, we argue that this way of thinking about PCMs is deeply misleading. Not only has this sowed widespread confusion in the literature about what PCMs are doing, but it has also led us to develop methods that are susceptible to the very thing we sought to build defenses against: unreplicated evolutionary events. Through three Case Studies, we demonstrate that the susceptibility to singular events is indeed a recurring problem in comparative biology that links several seemingly unrelated controversies. In each Case Study we propose a potential solution to the problem. While the details of our proposed solutions differ, they share a common theme: unifying hypothesis testing with data-driven approaches (which we term phylogenetic natural history) to disentangle the impact of singular evolutionary events from that of the factors we are investigating. More broadly, we argue that our field has, at times, been sloppy when weighing evidence in support of causal hypotheses. We suggest that one way to refine our inferences is to re-imagine phylogenies as probabilistic graphical models; adopting this way of thinking will help clarify precisely what we are testing and what evidence supports our claims.
biorxiv evolutionary-biology 100-200-users 2017
Testing the parasite mass burden effect on host behaviour alteration in the Schistocephalus-stickleback system, bioRxiv, 2017-12-06
Many parasites with complex life cycles modify their intermediate host’s behaviour, which has been proposed to increase transmission to their definitive host. This behavioural change could result from the parasite actively manipulating its host, but could also be explained by a mechanical effect, where the parasite’s physical presence affects host behaviour. We created an artificial internal parasite using silicone injections in the body cavity to test this mechanical effect hypothesis. We used the Schistocephalus solidus - threespine stickleback (Gasterosteus aculeatus) system, as this cestode can reach up to 92% of its fish host mass. Our results suggest that the mass burden brought by this macroparasite alone is not sufficient to cause behavioural changes in its host. Furthermore, our results show that wall-hugging (thigmotaxis), a measure of anxiety in vertebrates, is significantly reduced in Schistocephalus-infected sticklebacks, unveiling a new altered component of behaviour that may result from manipulation by this macroparasite.
biorxiv animal-behavior-and-cognition 0-100-users 2017
A quantitative model for characterizing the evolutionary history of mammalian gene expression, bioRxiv, 2017-12-05
Characterizing the evolutionary history of a gene’s expression profile is a critical component for understanding the relationship between genotype, expression, and phenotype. However, it is not well-established how best to distinguish the different evolutionary forces acting on gene expression. Here, we use RNA-seq across 7 tissues from 17 mammalian species to show that expression evolution across mammals is accurately modeled by the Ornstein-Uhlenbeck (OU) process. This stochastic process models expression trajectories across time as Gaussian distributions whose variance is parameterized by the rate of genetic drift and strength of stabilizing selection. We use these mathematical properties to identify expression pathways under neutral, stabilizing, and directional selection, and quantify the extent of selective pressure on a gene’s expression. We further detect deleterious expression levels outside expected evolutionary distributions in expression data from individual patients. Our work provides a statistical framework for interpreting expression data across species and in disease.
One Sentence Summary: We demonstrate the power of a stochastic model for quantifying selective pressure on expression and estimating evolutionary distributions of optimal gene expression.
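To make the Gaussian description concrete, the sketch below evaluates the standard closed-form mean and variance of an OU process, dX = alpha (theta - X) dt + sigma dW, started at a known value. The symbol names are assumptions for this example (`theta` for the optimal expression level, `alpha` for the strength of stabilizing selection, `sigma` for the drift rate), and the paper's own parameterization may differ. The `ou_zscore` helper is a hypothetical illustration of flagging a value that falls outside the stationary distribution, not the authors' test.

```python
import math


def ou_mean(x0, theta, alpha, t):
    """Expected value at time t: relaxes from x0 toward the optimum theta."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_variance(sigma, alpha, t):
    """Variance at time t: grows with drift (sigma) and is bounded by
    selection (alpha), approaching sigma**2 / (2 * alpha) as t grows."""
    return sigma ** 2 / (2 * alpha) * (1 - math.exp(-2 * alpha * t))


def ou_zscore(x, theta, sigma, alpha):
    """Standardized distance of x from the optimum under the stationary
    distribution, which is Normal(theta, sigma**2 / (2 * alpha))."""
    return (x - theta) / math.sqrt(sigma ** 2 / (2 * alpha))


if __name__ == "__main__":
    # Stronger selection (larger alpha) narrows the stationary distribution.
    for alpha in (0.5, 2.0):
        print(alpha, ou_variance(sigma=1.0, alpha=alpha, t=100.0))
```

In this form, weak selection (small `alpha`) lets variance keep accumulating over evolutionary time, close to neutral drift, while strong selection caps it quickly.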
biorxiv genomics 0-100-users 2017