Single-cell isoform RNA sequencing (ScISOr-Seq) across thousands of cells reveals isoforms of cerebellar cell types, bioRxiv, 2018-07-09
Abstract. Full-length isoform sequencing has advanced our knowledge of isoform biology [1–11]. However, apart from applying full-length isoform sequencing to very few single cells [12,13], isoform sequencing has been limited to bulk tissue, cell lines, or sorted cells. Single splicing events have been described for ≤200 single cells with great statistical success [14,15], but these methods do not describe full-length mRNAs. Single-cell short-read 3’ sequencing has allowed identification of many cell sub-types [16–23], but full-length isoforms for these cell types have not been profiled. Using our new method of single-cell isoform RNA sequencing (ScISOr-Seq), we determine isoform expression in thousands of individual cells from a heterogeneous bulk tissue (cerebellum), without specific antibody-fluorescence activated cell sorting. We elucidate isoform usage in high-level cell types such as neurons, astrocytes and microglia and in finer sub-types, such as Purkinje cells and granule cells, including the combination patterns of distant splice sites [6–9,24,25], which for individual molecules requires long reads. We produce an enhanced genome annotation revealing cell-type-specific expression of known and 16,872 novel (with respect to mouse Gencode version 10) isoforms (see isoformatlas.com).
ScISOr-Seq describes isoforms from >1,000 single cells from bulk tissue without cell sorting by leveraging two technologies in three steps. In step one, we employ microfluidics to produce amplified full-length cDNAs barcoded for their cell of origin. This cDNA is split into two pools: one pool for 3’ sequencing to measure gene expression (step two) and another pool for long-read sequencing and isoform expression (step three). In step two, short-read 3’ sequencing provides molecular counts for each gene and cell, which allows clustering cells and assigning a cell type using cell-type-specific markers.
In step three, an aliquot of the same cDNAs (each barcoded for the individual cell of origin) is sequenced using Pacific Biosciences (“PacBio”) [1,2,4,5,26] or Oxford Nanopore [3]. Since these long reads carry the single-cell barcodes identified in step two, one can determine the individual cell from which each long read originates. Since most single cells are assigned to a named cluster, we can also assign the cell’s cluster name (e.g. “Purkinje cell” or “astrocyte”) to the long read in question (Fig 1A) – without losing the cell of origin of each long read.
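The barcode lookup described in step three can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `assign_long_reads`, the tuple layout of the reads, the barcodes, and the isoform labels are all hypothetical, and the sketch assumes that the cell barcode has already been extracted from each long read and that step two has produced a barcode-to-cluster table.

```python
from collections import Counter, defaultdict


def assign_long_reads(long_reads, barcode_to_cluster):
    """Attach a cluster name to each long read via its cell barcode.

    long_reads: iterable of (read_id, cell_barcode, isoform_id) tuples
        (hypothetical layout; barcode extraction is assumed done upstream).
    barcode_to_cluster: dict mapping cell barcode -> cluster name,
        as produced by short-read clustering (step two).
    Returns the annotated reads and per-cluster isoform counts.
    """
    assigned = []
    counts = defaultdict(Counter)
    for read_id, barcode, isoform in long_reads:
        cluster = barcode_to_cluster.get(barcode)
        if cluster is None:
            # Barcode was not assigned to a named cluster; skip the read.
            continue
        # The cell barcode is kept, so the cell of origin is not lost.
        assigned.append((read_id, barcode, cluster, isoform))
        counts[cluster][isoform] += 1
    return assigned, counts


# Toy input with made-up barcodes and isoform labels.
barcode_to_cluster = {"AAAC": "Purkinje cell", "GGTT": "astrocyte"}
long_reads = [
    ("r1", "AAAC", "isoA"),
    ("r2", "AAAC", "isoB"),
    ("r3", "GGTT", "isoA"),
    ("r4", "TTTT", "isoA"),  # barcode absent from the short-read clusters
]
assigned, counts = assign_long_reads(long_reads, barcode_to_cluster)
```

Keeping the barcode in each output tuple mirrors the point made above: a read gains a cluster label while remaining traceable to its individual cell.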
biorxiv molecular-biology 100-200-users 2018
Purification of Cross-linked RNA-Protein Complexes by Phenol-Toluol Extraction, bioRxiv, 2018-05-30
Recent methodological advances allowed the identification of an increasing number of RNA-binding proteins (RBPs) and their RNA-binding sites. Most of those methods rely, however, on capturing proteins associated with polyadenylated RNAs, which neglects RBPs bound to non-adenylated RNA classes (tRNA, rRNA, pre-mRNA) as well as the vast majority of species that lack poly-A tails in their mRNAs (including all archaea and bacteria). To overcome these limitations, we have developed a novel protocol, Phenol Toluol extraction (PTex), that does not rely on a specific RNA sequence or motif for isolation of cross-linked ribonucleoproteins (RNPs), but rather purifies them based entirely on their physicochemical properties. PTex captures RBPs that bind to RNA as short as 30 nt, recovers RNPs directly from animal tissue, and can be used to simplify complex workflows such as PAR-CLIP. Finally, we provide a first global RNA-bound proteome of human HEK293 cells and of Salmonella Typhimurium as a bacterial species.
biorxiv molecular-biology 0-100-users 2018
In vivo CRISPR-Cas gene editing with no detectable genome-wide off-target mutations, bioRxiv, 2018-02-28
CRISPR-Cas genome-editing nucleases hold substantial promise for human therapeutics [1–5], but identifying unwanted off-target mutations remains an important requirement for clinical translation [6,7]. For ex vivo therapeutic applications, previously published cell-based genome-wide methods provide potentially useful strategies to identify and quantify these off-target mutation sites [8–12]. However, a well-validated method that can reliably identify off-targets in vivo has not been described to date, leaving open the question of whether and how frequently these types of mutations occur. Here we describe Verification of In Vivo Off-targets (VIVO), a highly sensitive, unbiased, and generalizable strategy that we show can robustly identify genome-wide CRISPR-Cas nuclease off-target effects in vivo. To our knowledge, these studies provide the first demonstration that CRISPR-Cas nucleases can induce substantial off-target mutations in vivo, a result we obtained using a deliberately promiscuous guide RNA (gRNA). More importantly, we used VIVO to show that appropriately designed gRNAs can direct efficient in vivo editing without inducing detectable off-target mutations. Our findings provide strong support for and should encourage further development of in vivo genome editing therapeutic strategies.
biorxiv molecular-biology 100-200-users 2018
Precise temporal regulation of alternative splicing during neural development, bioRxiv, 2018-01-15
Abstract. Alternative splicing (AS) is a crucial step of gene expression that must be tightly controlled, but the precise timing of dynamic splicing switches during neural development and the underlying regulatory mechanisms are poorly understood. Here we systematically analyzed the temporal regulation of AS in a large number of transcriptome profiles of developing mouse cortices, in vivo purified neuronal subtypes, and neurons differentiated in vitro. Our analysis revealed early- and late-switch exons in genes with distinct functions, and these switches accurately define neuronal maturation stages. Integrative modeling suggests that these switches are under direct and combinatorial regulation by distinct sets of neuronal RNA-binding proteins including Nova, Rbfox, Mbnl and Ptbp. Surprisingly, various neuronal subtypes in the sensory systems lack Nova and/or Rbfox expression. These neurons retain the “immature” splicing program in early-switch exons, affecting numerous synaptic genes. These results provide new insights into the organization and regulation of the neurodevelopmental transcriptome.
biorxiv molecular-biology 0-100-users 2018
Long-read sequencing of nascent RNA reveals coupling among RNA processing events, bioRxiv, 2017-12-19
Abstract. Pre-mRNA splicing is accomplished by the spliceosome, a megadalton complex that assembles de novo on each intron. Because spliceosome assembly and catalysis occur co-transcriptionally, we hypothesized that introns are removed in the order of their transcription in genomes dominated by constitutive splicing. Remarkably little is known about splicing order and the regulatory potential of nascent transcript remodeling by splicing, due to the limitations of existing methods that focus on analysis of mature splicing products (mRNAs) rather than substrates and intermediates. Here, we overcome this obstacle through long-read RNA sequencing of nascent, multi-intron transcripts in the fission yeast Schizosaccharomyces pombe. Most multi-intron transcripts were fully spliced, consistent with rapid co-transcriptional splicing. However, an unexpectedly high proportion of transcripts were either fully spliced or fully unspliced, suggesting that splicing of any given intron is dependent on the splicing status of other introns in the transcript. Supporting this, mild inhibition of splicing by a temperature-sensitive mutation in Prp2, the homolog of vertebrate U2AF65, increased the frequency of fully unspliced transcripts. Importantly, fully unspliced transcripts displayed transcriptional read-through at the polyA site and were degraded co-transcriptionally by the nuclear exosome. Finally, we show that cellular mRNA levels were reduced in genes with a high number of unspliced nascent transcripts during caffeine treatment, showing regulatory significance of co-transcriptional splicing. Therefore, overall splicing of individual nascent transcripts, 3’ end formation, and mRNA half-life depend on the splicing status of neighboring introns, suggesting crosstalk among spliceosomes and the polyA cleavage machinery during transcription elongation.
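The "all-or-none" observation above implies a comparison against what independent splicing of each intron would predict. The following Python sketch shows one way such a comparison could be set up; it is an illustrative assumption, not the analysis used in the preprint. The function name `all_or_none_summary`, the boolean encoding of reads, and the toy data are hypothetical. Under independence, with per-intron spliced fractions p_i, the expected share of fully spliced or fully unspliced reads is the product of the p_i plus the product of the (1 − p_i).

```python
from math import prod


def all_or_none_summary(reads):
    """Compare observed all-or-none reads with an independence model.

    reads: list of equal-length tuples of booleans for one gene, one entry
        per intron, True meaning that intron is spliced in that long read
        (hypothetical encoding).
    Returns (observed_fraction, expected_fraction_under_independence).
    """
    n_reads = len(reads)
    n_introns = len(reads[0])
    # Per-intron spliced fraction, estimated across all reads.
    p = [sum(r[i] for r in reads) / n_reads for i in range(n_introns)]
    # Observed share of reads that are fully spliced or fully unspliced.
    observed = sum(all(r) or not any(r) for r in reads) / n_reads
    # Expected share if each intron were spliced independently.
    expected = prod(p) + prod(1 - x for x in p)
    return observed, expected


# Toy two-intron gene: 6 fully spliced, 3 fully unspliced, 1 partial read.
toy_reads = [(True, True)] * 6 + [(False, False)] * 3 + [(True, False)]
observed, expected = all_or_none_summary(toy_reads)
```

An observed fraction well above the expected one, as in this made-up example, is the kind of pattern that would point toward coupling between introns; a real analysis would also need a significance test and enough reads per gene.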
biorxiv molecular-biology 0-100-users 2017
Isolation of nucleic acids from low biomass samples: detection and removal of sRNA contaminants, bioRxiv, 2017-12-15
Abstract. Background: Sequencing-based analyses of low-biomass samples are known to be prone to misinterpretation due to the potential presence of contaminating molecules derived from laboratory reagents and environments. Due to its inherent instability, contamination with RNA is usually considered to be unlikely. Results: Here we report the presence of small RNA (sRNA) contaminants in widely used microRNA extraction kits and means for their depletion. Sequencing of sRNAs extracted from human plasma samples was performed and significant levels of non-human (exogenous) sequences were detected. The source of the most abundant of these sequences could be traced to the microRNA extraction columns by qPCR-based analysis of laboratory reagents. The presence of artefactual sequences originating from the confirmed contaminants was furthermore replicated in a range of published datasets. To avoid artefacts in future experiments, several protocols for the removal of the contaminants were elaborated, minimal amounts of starting material for artefact-free analyses were defined, and the reduction of contaminant levels for identification of bona fide sequences using ‘ultraclean’ extraction kits was confirmed. Conclusion: This is the first report of the presence of RNA molecules as contaminants in laboratory reagents. The described protocols should be applied in the future to avoid confounding sRNA studies.
biorxiv molecular-biology 100-200-users 2017