A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines: Library preparation and normalisation methods have the biggest impact on the performance of scRNA-seq studies, bioRxiv, 2019-03-20
Abstract: The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches, resulting in ~3,000 pipelines and allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
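To give a feel for how quickly such a design grows, here is a minimal sketch that enumerates combinations from the option counts quoted in the abstract. The step labels are placeholders rather than the tools evaluated in the study, and the abstract does not spell out how the ~3,000 figure is counted; crossing the step combinations with the nine DE setups is only one accounting that lands close to it.

```python
from itertools import product

# Option counts per step are taken from the abstract;
# the step labels are placeholders, not the authors' tool names.
steps = {
    "mapping": 3,
    "imputation": 4,
    "normalisation": 7,
    "de_testing": 4,
}

def enumerate_pipelines(step_counts):
    """Return every combination of option indices, one index per step."""
    ranges = [range(n) for n in step_counts.values()]
    return list(product(*ranges))

pipelines = enumerate_pipelines(steps)
n_pipelines = len(pipelines)        # 3 * 4 * 7 * 4 step combinations
n_with_setups = n_pipelines * 9     # assumption: crossed with the nine DE setups
print(n_pipelines, n_with_setups)
```

Under this assumed accounting, the four analysis steps give 336 combinations, and 3,024 once crossed with the nine DE setups.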
biorxiv bioinformatics 100-200-users 2019
CRISPR/Cas9-based mutagenesis frequently provokes on-target mRNA misregulation, bioRxiv, 2019-03-20
The introduction of insertion-deletions (INDELs) by activation of the error-prone non-homologous end-joining (NHEJ) pathway underlies the mechanistic basis of CRISPR/Cas9-directed genome editing. The ability of CRISPR/Cas9 to achieve gene elimination (knockouts) is largely attributed to the emergence of a premature termination codon (PTC) from a frameshift-inducing INDEL that elicits nonsense-mediated decay (NMD) of the mutant mRNA. Yet, the impact on gene expression of CRISPR/Cas9-introduced INDELs in RNA regulatory sequences has been left largely uninvestigated. By tracking DNA-mRNA-protein relationships in a collection of CRISPR/Cas9-edited cell lines that harbor frameshift-inducing INDELs in various targeted genes, we detected the production of foreign mRNAs or proteins in ~50% of the cell lines. We demonstrate that these aberrant protein products are derived from the introduction of INDELs that promote internal ribosomal entry, convert pseudo-mRNAs into protein-encoding molecules, or induce exon skipping by disruption of exon splicing enhancers (ESEs). Our results using CRISPR/Cas9-introduced INDELs reveal facets of an epigenetic genome buffering apparatus that likely evolved to mitigate the impact of such mutations introduced by pathogens and aberrant DNA damage repair, and that more recently pose challenges to manipulating gene expression outcomes using INDEL-based mutagenesis.
biorxiv molecular-biology 100-200-users 2019
Nuclear pores as versatile reference standards for quantitative superresolution microscopy, bioRxiv, 2019-03-20
Abstract: Quantitative fluorescence and superresolution microscopy are often limited by insufficient data quality or artifacts. In this context, it is essential to have biologically relevant control samples to benchmark and optimize the quality of microscopes, labels and imaging conditions. Here we exploit the stereotypic arrangement of proteins in the nuclear pore complex as in situ reference structures to characterize the performance of a variety of microscopy modalities. We created four genome-edited cell lines in which we endogenously labeled the nucleoporin Nup96 with mEGFP, SNAP-tag or HaloTag or the photoconvertible fluorescent protein mMaple. We demonstrate their use a) as 3D resolution standards for calibration and quality control, b) to quantify absolute labeling efficiencies and c) as precise reference standards for molecular counting. These cell lines will enable the broad community to assess the quality of their microscopes and labels, and to perform quantitative, absolute measurements.
biorxiv biophysics 100-200-users 2019
Determinants of transcription factor regulatory range, bioRxiv, 2019-03-19
Abstract: To characterize the genomic distances over which transcription factors (TFs) influence gene expression, we examined thousands of TF and histone modification ChIP-seq datasets and thousands of gene expression profiles. A model integrating these data revealed two classes of TF: one with short-range regulatory influence, the other with long-range regulatory influence. The two TF classes also had distinct chromatin-binding preferences and auto-regulatory properties. The regulatory range of a single TF bound within different topologically associating domains (TADs) depended on intrinsic TAD properties such as local gene density and GC content, but also on the TAD chromatin state in specific cell types. Our results provide evidence that most TFs belong to one of these two functional classes, and that the regulatory range of long-range TFs is chromatin-state dependent. Thus, consideration of TF type, distance-to-target, and chromatin context is likely important in identifying TF regulatory targets and interpreting GWAS and eQTL SNPs.
biorxiv genomics 100-200-users 2019
Ribosome profiling at isoform level reveals an evolutionary conserved impact of differential splicing on the proteome, bioRxiv, 2019-03-19
Abstract: The differential production of transcript isoforms from gene loci is a key cellular mechanism. Yet, its impact on protein production remains an open question. Here, we describe ORQAS (ORF quantification pipeline for alternative splicing), a new pipeline for the translation quantification of individual transcript isoforms using ribosome-protected mRNA fragments (ribosome profiling). We found evidence of translation for 40-50% of the expressed transcript isoforms in human and mouse, with 53% of the expressed genes having more than one translated isoform in human and 33% in mouse. Differential analysis revealed that about 40% of the splicing changes at RNA level were concordant with changes in translation, with 21.7% of changes at RNA level and 17.8% at translational level conserved between human and mouse. Furthermore, orthologous cassette exons preserving the directionality of the change were found enriched in microexons in a comparison between glia and glioma, and were conserved between human and mouse. ORQAS leverages ribosome profiling to uncover a widespread and evolutionarily conserved impact of differential splicing on the translation of isoforms and, in particular, of microexon-containing ones. ORQAS is available at https://github.com/comprna/orqas
biorxiv genomics 100-200-users 2019
A deep learning framework for nucleus segmentation using image style transfer, bioRxiv, 2019-03-18
Abstract: Single cell segmentation is typically one of the first and most crucial tasks of image-based cellular analysis. We present a deep learning approach aiming towards a truly general method for localizing nuclei across a diverse range of assays and light microscopy modalities. We outperform the 739 methods submitted to the 2018 Data Science Bowl on images representing a variety of realistic conditions, some of which were not represented in the training data. The key to our approach is to adapt our model to unseen and unlabeled data using image style transfer to generate augmented training samples. This allows the model to recognize nuclei in new and different experiments without requiring expert annotations.
biorxiv bioinformatics 100-200-users 2019