Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay, bioRxiv, 2018-11-09
ABSTRACT: Rapid technological progress in recent years has allowed the high-throughput interrogation of different types of biomolecules from single cells. Combining several of these readouts into integrated multi-omic assays is essential to comprehensively understand and model cellular processes. Here, we report the development of Expanded CRISPR-compatible Cellular Indexing of Transcriptomes and Epitopes by sequencing (ECCITE-seq) for the high-throughput characterization of at least five modalities of information from each single cell: transcriptome, immune receptor clonotypes, surface markers, sample identity and sgRNAs. We demonstrate the use of ECCITE-seq to directly and efficiently capture sgRNA molecules and measure their effects on gene expression and protein levels, opening the possibility of performing high-throughput single-cell CRISPR screens with multimodal readout using existing libraries and commonly used vectors. Finally, by utilizing the combined phenotyping of clonotype and cell surface markers in immune cells, we apply ECCITE-seq to study a lymphoma sample to discriminate cells and define molecular signatures of malignant cells within a heterogeneous population.
biorxiv genomics 100-200-users 2018
AnnoTree: visualization and exploration of a functionally annotated microbial tree of life, bioRxiv, 2018-11-06
Abstract: Bacterial genomics has revolutionized our understanding of the microbial tree of life; however, mapping and visualizing the distribution of functional traits across bacteria remains a challenge. Here, we introduce AnnoTree - an interactive, functionally annotated bacterial tree of life that integrates taxonomic, phylogenetic, and functional annotation data from nearly 24,000 bacterial genomes. AnnoTree enables visualization of millions of precomputed genome annotations across the bacterial phylogeny, thereby allowing users to explore gene distributions as well as patterns of gene gain and loss across bacteria. Using AnnoTree, we examined the phylogenomic distributions of 28,311 gene/protein families, and measured their phylogenetic conservation, patchiness, and lineage-specificity. Our analyses revealed widespread phylogenetic patchiness among bacterial gene families, reflecting the dynamic evolution of prokaryotic genomes. Genes involved in phage infection/defense, mobile elements, and antibiotic resistance dominated the list of most patchy traits, as well as numerous intriguing metabolic enzymes that appear to have undergone frequent horizontal transfer. We anticipate that AnnoTree will be a valuable resource for exploring gene histories across bacteria, and will act as a catalyst for biological and evolutionary hypothesis generation.
biorxiv bioinformatics 100-200-users 2018
Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity, bioRxiv, 2018-11-06
Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.
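The idea of extracting the subgraph surrounding a query can be illustrated with a minimal sketch. This is a plain breadth-first search over an adjacency list, not the indexing method used by spacegraphcats; the function name `neighborhood`, the `radius` parameter, and the toy graph are all hypothetical and chosen only for illustration.

```python
from collections import deque


def neighborhood(adj, seeds, radius):
    """Return every node within `radius` edges of any seed node.

    `adj` maps each node to an iterable of its neighbors (hypothetical
    toy representation of an assembly graph).
    """
    dist = {s: 0 for s in seeds}   # shortest known distance from a seed
    queue = deque(dist)            # start the search from all seeds at once
    while queue:
        node = queue.popleft()
        if dist[node] == radius:   # do not expand past the requested radius
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


# Toy linear graph a - b - c - d - e (illustrative only).
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
```

Because each node is visited at most once and only its own edges are scanned, the cost of such a search scales with the size of the extracted neighborhood, which is one reason sparse graphs are convenient for this kind of query.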
biorxiv bioinformatics 100-200-users 2018
Fast, sensitive, and accurate integration of single cell data with Harmony, bioRxiv, 2018-11-05
Abstract: The rapidly emerging diversity of single cell RNAseq datasets allows us to characterize the transcriptional behavior of cell types across a wide variety of biological and clinical conditions. With this comprehensive breadth comes a major analytical challenge. The same cell type across tissues, from different donors, or in different disease states, may appear to express different genes. A joint analysis of multiple datasets requires the integration of cells across diverse conditions. This is particularly challenging when datasets are assayed with different technologies in which real biological differences are interspersed with technical differences. We present Harmony, an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Unlike available single-cell integration methods, Harmony can simultaneously account for multiple experimental and biological factors. We develop objective metrics to evaluate the quality of data integration. In four separate analyses, we demonstrate the superior performance of Harmony to four single-cell-specific integration algorithms. Moreover, we show that Harmony requires dramatically fewer computational resources. It is the only available algorithm that makes the integration of ∼10^6 cells feasible on a personal computer. We demonstrate that Harmony identifies both broad populations and fine-grained subpopulations of PBMCs from datasets with large experimental differences. In a meta-analysis of 14,746 cells from 5 studies of human pancreatic islet cells, Harmony accounts for variation among technologies and donors to successfully align several rare subpopulations. In the resulting integrated embedding, we identify a previously unidentified population of potentially dysfunctional alpha islet cells, enriched for genes active in the Endoplasmic Reticulum (ER) stress response. The abundance of these alpha cells correlates across donors with the proportion of dysfunctional beta cells also enriched in ER stress response genes. Harmony is a fast and flexible general purpose integration algorithm that enables the identification of shared fine-grained subpopulations across a variety of experimental and biological conditions.
biorxiv bioinformatics 100-200-users 2018
Neural Population Control via Deep Image Synthesis, bioRxiv, 2018-11-05
Particular deep artificial neural networks (ANNs) are today’s most accurate models of the primate brain’s ventral visual stream. Here we report that, using a targeted ANN-driven image synthesis method, new luminous power patterns (i.e. images) can be applied to the primate retinae to predictably push the spiking activity of targeted V4 neural sites beyond naturally occurring levels. More importantly, this method, while not yet perfect, already achieves unprecedented independent control of the activity state of entire populations of V4 neural sites, even those with overlapping receptive fields. These results show how the knowledge embedded in today’s ANN models might be used to non-invasively set desired internal brain states at neuron-level resolution, and suggest that more accurate ANN models would produce even more accurate control.
biorxiv neuroscience 100-200-users 2018
Valid post-clustering differential analysis for single-cell RNA-Seq, bioRxiv, 2018-11-05
Summary: Single-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering forces separation, reusing the same dataset generates artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.
biorxiv bioinformatics 100-200-users 2018
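The post-clustering problem described in the last summary above (clustering forces separation, so testing on the same data yields inflated significance) can be seen in a small simulation. The sketch below is illustrative only and does not implement the authors' correction: it assumes a single gene drawn from one homogeneous Gaussian population, uses a median split as a stand-in for clustering, and compares Welch's t-statistic against a baseline with labels assigned independently of the data. The names `welch_t` and `simulate` are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


def simulate(n_cells=200, seed=0):
    rng = random.Random(seed)
    # Null data: one gene, one homogeneous population, no true clusters.
    expr = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]

    # Stand-in for clustering on the same data: split cells at the median.
    cut = statistics.median(expr)
    high = [x for x in expr if x > cut]
    low = [x for x in expr if x <= cut]
    t_clustered = welch_t(high, low)

    # Baseline: group labels assigned independently of expression.
    shuffled = expr[:]
    rng.shuffle(shuffled)
    half = n_cells // 2
    t_random = welch_t(shuffled[:half], shuffled[half:])
    return t_clustered, t_random


t_clustered, t_random = simulate()
print(f"t after clustering on the same data: {t_clustered:.1f}")
print(f"t with data-independent labels:      {t_random:.1f}")
```

Even though no real subpopulations exist, the statistic computed after the median split is very large, because the groups were defined by the very values being tested; the data-independent labels give a statistic of ordinary size.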