K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data, bioRxiv, 2017-12-06

High-throughput single-cell RNA-Seq (scRNA-Seq) is a powerful approach for studying heterogeneous tissues and dynamic cellular processes. However, compared to bulk RNA-Seq, single-cell expression profiles are extremely noisy, as they only capture a fraction of the transcripts present in the cell. Here, we propose the k-nearest neighbor smoothing (kNN-smoothing) algorithm, designed to reduce noise by aggregating information from similar cells (neighbors) in a computationally efficient and statistically tractable manner. The algorithm is based on the observation that across protocols, the technical noise exhibited by UMI-filtered scRNA-Seq data closely follows Poisson statistics. Smoothing is performed by first identifying the nearest neighbors of each cell in a step-wise fashion, based on partially smoothed and variance-stabilized expression profiles, and then aggregating their transcript counts. We show that kNN-smoothing greatly improves the detection of clusters of cells and co-expressed genes, and clearly outperforms other smoothing methods on simulated data. To accurately perform smoothing for datasets containing highly similar cell populations, we propose the kNN-smoothing 2 algorithm, in which neighbors are determined after projecting the partially smoothed data onto the first few principal components. We show that unlike its predecessor, kNN-smoothing 2 can accurately distinguish between cells from different T cell subsets, and enables their identification in peripheral blood using unsupervised methods. Our work facilitates the analysis of scRNA-Seq data across a broad range of applications, including the identification of cell populations in heterogeneous tissues and the characterization of dynamic processes such as cellular differentiation. Reference implementations of our algorithms can be found at https://github.com/yanailab/knn-smoothing.
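The step-wise procedure described in this abstract can be illustrated with a minimal sketch. This is not the authors' reference implementation. The Freeman-Tukey transform as the variance-stabilizing step, the scaling of each cell to the median total count, the doubling schedule for the neighbor count (1, 3, 7, ...), and all function names are assumptions made here for illustration. The principal-component projection used by kNN-smoothing 2 is not covered.

```python
import math


def freeman_tukey(x):
    # Assumed variance-stabilizing transform for Poisson-like counts.
    return math.sqrt(x) + math.sqrt(x + 1.0)


def normalize(cells):
    # Scale every cell to the median total count across cells (assumption).
    totals = sorted(sum(c) for c in cells)
    n = len(totals)
    if n % 2:
        median = totals[n // 2]
    else:
        median = 0.5 * (totals[n // 2 - 1] + totals[n // 2])
    return [[v * median / max(sum(c), 1e-12) for v in c] for c in cells]


def knn_smooth(counts, k):
    """Aggregate each cell's raw counts with those of its k nearest neighbors.

    counts: list of cells, each a list of raw UMI counts (one per gene).
    k: number of neighbors to add to each cell (the cell itself is extra).
    """
    n = len(counts)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < number of cells")
    smoothed = [list(c) for c in counts]
    k_step = 0
    while k_step < k:
        # Grow the neighborhood step-wise: 1, 3, 7, ... capped at k.
        k_step = min(2 * k_step + 1, k)
        # Neighbors are found on the partially smoothed, stabilized profiles.
        profiles = [[freeman_tukey(v) for v in c] for c in normalize(smoothed)]
        updated = []
        for i in range(n):
            others = sorted(
                (sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])), j)
                for j in range(n)
                if j != i
            )
            members = [i] + [j for _, j in others[:k_step]]
            # Aggregation always sums the original raw counts.
            updated.append(
                [sum(counts[j][g] for j in members) for g in range(len(counts[i]))]
            )
        smoothed = updated
    return smoothed


if __name__ == "__main__":
    # Toy data: two cells expressing gene 0, two cells expressing gene 1.
    toy = [[10, 0], [12, 0], [0, 9], [0, 11]]
    print(knn_smooth(toy, k=1))
```

Summing raw counts, rather than averaging transformed values, keeps the result on the count scale. Under the Poisson noise model mentioned in the abstract, a sum of independent Poisson counts is again Poisson-distributed, which is one plausible reading of "statistically tractable".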

biorxiv bioinformatics 0-100-users 2017

Rethinking phylogenetic comparative methods, bioRxiv, 2017-12-06

As a result of the process of descent with modification, closely related species tend to be similar to one another in a myriad different ways. In statistical terms, this means that traits measured on one species will not be independent of traits measured on others. Since their introduction in the 1980s, phylogenetic comparative methods (PCMs) have been framed as a solution to this problem. In this paper, we argue that this way of thinking about PCMs is deeply misleading. Not only has this sown widespread confusion in the literature about what PCMs are doing, but it has also led us to develop methods that are susceptible to the very thing we sought to build defenses against: unreplicated evolutionary events. Through three Case Studies, we demonstrate that the susceptibility to singular events is indeed a recurring problem in comparative biology that links several seemingly unrelated controversies. In each Case Study we propose a potential solution to the problem. While the details of our proposed solutions differ, they share a common theme: unifying hypothesis testing with data-driven approaches (which we term phylogenetic natural history) to disentangle the impact of singular evolutionary events from that of the factors we are investigating. More broadly, we argue that our field has, at times, been sloppy when weighing evidence in support of causal hypotheses. We suggest that one way to refine our inferences is to re-imagine phylogenies as probabilistic graphical models; adopting this way of thinking will help clarify precisely what we are testing and what evidence supports our claims.

biorxiv evolutionary-biology 100-200-users 2017

Assessing the Landscape of U.S. Postdoctoral Salaries, bioRxiv, 2017-12-04

Purpose: Postdocs make up a significant portion of the biomedical workforce. However, data about the postdoctoral position are generally scarce, including salary data. The purpose of this study was to request, obtain and interpret actual salaries, and the associated job titles, for postdocs at U.S. public institutions. Methodology: Freedom of Information Act Requests were submitted to U.S. public institutions estimated to have at least 300 postdocs according to the National Science Foundation’s Survey of Graduate Students and Postdocs. Salaries and job titles of postdoctoral employees as of December 1st, 2016 were requested. Findings: Salaries and job titles for over 13,000 postdocs at 52 public U.S. institutions and 1 private institution around the date of December 1st, 2016 were received, and individual postdoc names were also received for approximately 7,000 postdocs. This study shows evidence of gender-related salary discrepancies, a significant influence of job title description on postdoc salary, and a complex relationship between salaries and the level of institutional NIH funding. Value: These results provide insights into the ability of institutions to collate actual payroll-type data related to their postdocs, highlighting difficulties faced in tracking and reporting data on this population. Ultimately, these types of efforts, aimed at increasing transparency, may lead to improved tracking and support for postdocs at all U.S. institutions.

biorxiv scientific-communication-and-education 100-200-users 2017

 

Created with the audiences framework by Jedidiah Carlson
