The art of using t-SNE for single-cell transcriptomics, bioRxiv, 2018-10-26
Single-cell transcriptomics yields ever-growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). It excels at revealing local structure in high-dimensional data, but naive applications often suffer from severe shortcomings, e.g. the global structure of the data is not represented accurately. Here we describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.
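Two ingredients of the protocol described above, PCA initialisation and a high learning rate, can be sketched with scikit-learn. This is only an illustrative sketch, not the authors' implementation: the function name `tsne_pca_init`, the choice of 50 principal components, the 1e-4 scaling of the initial layout, and the `n / 12` learning-rate heuristic are all assumptions that do not come from the abstract. Multi-scale kernels, exaggeration, and downsampling-based initialisation are not covered here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def tsne_pca_init(X, perplexity=30, seed=0):
    """Embed X (cells x genes) in 2D using t-SNE with PCA initialisation.

    Hypothetical helper for illustration; all numeric defaults are assumptions.
    """
    n = X.shape[0]
    # Reduce to at most 50 principal components before running t-SNE.
    n_pcs = min(50, X.shape[1], n - 1)
    X_pca = PCA(n_components=n_pcs, random_state=seed).fit_transform(X)

    # Use the first two PCs, rescaled to a small spread, as the initial layout.
    init = X_pca[:, :2] / np.std(X_pca[:, 0]) * 1e-4

    # Learning rate grows with sample size (assumed heuristic: n / 12),
    # but never drops below 200.
    learning_rate = max(200.0, n / 12)

    tsne = TSNE(
        n_components=2,
        init=init,
        learning_rate=learning_rate,
        perplexity=perplexity,
        random_state=seed,
    )
    return tsne.fit_transform(X_pca)
```

Calling `tsne_pca_init(X)` on a cells-by-genes matrix returns an array of shape `(n_cells, 2)`. Starting from the leading principal components rather than a random layout is what ties the embedding to the large-scale arrangement of the data.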
biorxiv bioinformatics 100-200-users 2018
Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and allow analysis of large datasets, bioRxiv, 2018-10-25
Accurate and comprehensive extraction of information from high-dimensional single-cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires multiple heuristics and fails to produce clear representations of datasets when millions of cells are projected. We developed opt-SNE, an automated toolkit for t-SNE parameter selection that utilizes Kullback-Leibler divergence evaluation in real time to tailor the early exaggeration and overall number of gradient descent iterations in a dataset-specific manner. The precise calibration of early exaggeration together with opt-SNE adjustment of gradient descent learning rate dramatically improves computation time and enables high-quality visualization of large cytometry and transcriptomics datasets, overcoming limitations of analysis tools with hard-coded parameters that often produce poorly resolved or misleading maps of fluorescent and mass cytometry data. In summary, opt-SNE enables superior data resolution in t-SNE space and thereby more accurate data interpretation.
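The idea of monitoring Kullback-Leibler divergence during optimisation can be illustrated with a small sketch. It is not the opt-SNE algorithm: the function names `kl_divergence` and `should_stop` and the relative-improvement rule with its `rel_tol` threshold are hypothetical choices made for illustration, since the abstract does not specify the stopping criterion.

```python
import numpy as np


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) for two discrete distributions given as arrays of equal shape."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    mask = P > 0  # terms with P == 0 contribute nothing to the sum
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


def should_stop(kl_history, rel_tol=1e-3):
    """Assumed rule: stop once the last step improved KL by less than rel_tol (relative)."""
    if len(kl_history) < 2:
        return False
    prev, curr = kl_history[-2], kl_history[-1]
    return (prev - curr) < rel_tol * abs(prev)
```

In a t-SNE loop, `P` would hold the high-dimensional affinities and `Q` the low-dimensional ones, with `kl_divergence(P, Q)` appended to a history list at each checkpoint and `should_stop` consulted to end a phase of the optimisation.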
biorxiv bioinformatics 100-200-users 2018
Microbiota profiling with long amplicons using Nanopore sequencing: full-length 16S rRNA gene and whole rrn operon, bioRxiv, 2018-10-24
Background: Profiling the microbiome of low-biomass samples is challenging for metagenomics, since these samples are prone to contain DNA from other sources, such as the host or the environment. The usual approach is to sequence specific hypervariable regions of the 16S rRNA gene, which fails to assign taxonomy at the genus and species level. Here, we aim to assess long-amplicon PCR-based approaches for assigning taxonomy at the genus and species level. We use Nanopore sequencing with two different markers: the full-length 16S rRNA gene (~1,500 bp) and the whole rrn operon (16S rRNA gene - ITS - 23S rRNA gene; 4,500 bp).
Methods: We sequenced a clinical isolate of Staphylococcus pseudintermedius, two mock communities (HM-783D, Bei Resources; D6306, ZymoBIOMICS) and two pools of low-biomass samples (dog skin). Nanopore sequencing was performed on the MinION (Oxford Nanopore Technologies) using the 1D PCR barcoding kit. Sequences were pre-processed, and data were analyzed using the WIMP workflow on EPI2ME (ONT) or Minimap2 software with the rrn database.
Results: Full-length 16S rRNA and the rrn operon retrieved the microbiota composition from the bacterial isolate, the mock communities and the complex skin samples, even at the genus and species level. For the Staphylococcus pseudintermedius isolate, when using EPI2ME, the amplicons were assigned to the correct bacterial species in ~98% of cases with the rrn operon as the marker and in ~68% of cases with the 16S rRNA gene. In both skin microbiota samples, we detected many species with an environmental origin. In the chin samples, we found different Pseudomonas species in high abundance, whereas in the dorsal skin there were more taxa with lower abundances.
Conclusions: Both full-length 16S rRNA and the rrn operon retrieved the microbiota composition of simple and complex microbial communities, even from low-biomass samples such as dog skin. For increased resolution at the species level, the rrn operon would be the best choice.
biorxiv microbiology 100-200-users 2018
Origins and Evolution of the Global RNA Virome, bioRxiv, 2018-10-24
Viruses with RNA genomes dominate the eukaryotic virome, reaching enormous diversity in animals and plants. The recent advances of metaviromics prompted us to perform a detailed phylogenomic reconstruction of the evolution of the dramatically expanded global RNA virome. The only universal gene among RNA viruses is the RNA-dependent RNA polymerase (RdRp). We developed an iterative computational procedure that alternates the RdRp phylogenetic tree construction with refinement of the underlying multiple sequence alignments. The resulting tree encompasses 4,617 RNA virus RdRps and consists of 5 major branches, 2 of which include positive-sense RNA viruses, 1 is a mix of positive-sense (+) RNA and double-stranded (ds) RNA viruses, and 2 consist of dsRNA and negative-sense (−) RNA viruses, respectively. This tree topology implies that dsRNA viruses evolved from +RNA viruses on at least two independent occasions, whereas −RNA viruses evolved from dsRNA viruses. Reconstruction of RNA virus evolution using the RdRp tree as the scaffold suggests that the last common ancestors of the major branches of +RNA viruses encoded only the RdRp and a single jelly-roll capsid protein. Subsequent evolution involved independent capture of additional genes, particularly those encoding distinct RNA helicases, enabling replication of larger RNA genomes and facilitating virus genome expression and virus-host interactions. Phylogenomic analysis reveals extensive gene module exchange among diverse viruses and horizontal virus transfer between distantly related hosts. Although the network of evolutionary relationships within the RNA virome is bound to further expand, the present results call for a thorough reevaluation of the RNA virus taxonomy.
Importance: The majority of the diverse viruses infecting eukaryotes have RNA genomes, including numerous human, animal, and plant pathogens. Recent advances of metagenomics have led to the discovery of many new groups of RNA viruses in a wide range of hosts. These findings enable a far more complete reconstruction of the evolution of RNA viruses than what was attainable previously. This reconstruction reveals the relationships between different Baltimore Classes of viruses and indicates extensive transfer of viruses between distantly related hosts, such as plants and animals. These results call for a major revision of the existing taxonomy of RNA viruses.
biorxiv microbiology 100-200-users 2018
Fix your membrane receptor imaging: Actin cytoskeleton and CD4 membrane organization disruption by chemical fixation, bioRxiv, 2018-10-23
Single-molecule localization microscopy (SMLM) techniques allow near-molecular-scale resolution (~20 nm) as well as precise and robust analysis of protein organization at different scales. SMLM hardware, analytics and probes have been the focus of a variety of studies and are now commonly used in laboratories across the world. Protocol reliability and artefact identification are increasingly seen as important aspects of super-resolution microscopy. The reliability of these approaches thus requires in-depth evaluation so that biological findings are based on solid foundations. Here we explore how different fixation approaches that disrupt or preserve the actin cytoskeleton affect membrane protein organization. Using CD4 as a model, we show that fixation-mediated disruption of the actin cytoskeleton correlates with changes in CD4 membrane organization. We highlight how these artefacts are easy to overlook and how careful sample preparation is essential for extracting meaningful results from super-resolution microscopy.
biorxiv cell-biology 100-200-users 2018
The hippocampus is necessary for the sleep-dependent consolidation of a task that does not require the hippocampus for initial learning, bioRxiv, 2018-10-23
During sleep, the hippocampus plays an active role in consolidating memories that depend on it for initial encoding. There are hints in the literature that the hippocampus may have a broader influence, contributing to the consolidation of memories that may not initially require the area. We tested this possibility by evaluating learning and consolidation of the motor sequence task (MST) in hippocampal amnesics and demographically matched control participants. While the groups showed similar initial learning, only controls exhibited evidence of sleep-dependent consolidation. These results demonstrate that the hippocampus can be required for normal consolidation of a task without being required for its acquisition, suggesting that the area plays a broader role in coordinating sleep-dependent memory consolidation than has previously been assumed.
biorxiv neuroscience 100-200-users 2018