Spliceosome profiling visualizes the operations of a dynamic RNP in vivo at nucleotide resolution, bioRxiv, 2017-11-23
Summary: Tools to understand how the spliceosome functions in vivo have lagged behind advances in its structural biology. We describe methods to globally profile spliceosome-bound precursor, intermediates and products at nucleotide resolution. We apply these tools to three divergent yeast species that span 600 million years of evolution. The sensitivity of the approach enables detection of novel cases of non-canonical catalysis including interrupted, recursive and nested splicing. Employing statistical modeling to understand the quantitative relationships between RNA features and the data, we uncover independent roles for intron size, position and number in substrate progression through the two catalytic stages. These include species-specific inputs suggestive of spliceosome-transcriptome coevolution. Further investigations reveal ATP-dependent discard of numerous endogenous substrates at both the precursor and lariat-intermediate stages and connect discard to intron retention, a form of splicing regulation. Spliceosome profiling is a quantitative, generalizable global technology to investigate an RNP central to eukaryotic gene expression. Highlights: Measurement of spliceosome-bound precursor and intermediate in three species; Non-canonical splicing events revealed; Statistical modeling uncovers substrate features that predict catalytic efficiency; Discard of suboptimal substrates occurs in vivo and predicts intron-retained mRNAs.
biorxiv molecular-biology 0-100-users 2017
Design and specificity of long ssDNA donors for CRISPR-based knock-in, bioRxiv, 2017-08-22
Update November 12th, 2019. The conclusions of this pre-print are outdated. See Authors' note on page 2. CRISPR/Cas technologies have transformed our ability to manipulate genomes for research and gene-based therapy. In particular, homology-directed repair after genomic cleavage allows for precise modification of genes using exogenous donor sequences as templates. While both single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) forms of donors have been used as repair templates, a systematic comparison of the performance and specificity of repair using ssDNA versus dsDNA donors is still lacking. Here, we describe an optimized method for the synthesis of long ssDNA templates and demonstrate that ssDNA donors can drive efficient integration of gene-sized reporters in human cell lines. We next define a set of rules to maximize the efficiency of ssDNA-mediated knock-in by optimizing donor design. Finally, by comparing ssDNA donors with equivalent dsDNA sequences (PCR products or plasmids), we demonstrate that ssDNA templates have a unique advantage in terms of repair specificity while dsDNA donors can lead to a high rate of off-target integration. Our results provide a framework for designing high-fidelity CRISPR-based knock-in experiments, in both research and therapeutic settings.
biorxiv molecular-biology 0-100-users 2017
“Unexpected mutations after CRISPR-Cas9 editing in vivo” are most likely pre-existing sequence variants and not nuclease-induced mutations, bioRxiv, 2017-07-06
Schaefer et al. recently advanced the provocative conclusion that CRISPR-Cas9 nuclease can induce off-target alterations at genomic loci that do not resemble the intended on-target site [1]. Using high-coverage whole genome sequencing (WGS), these authors reported finding SNPs and indels in two CRISPR-Cas9-treated mice that were not present in a single untreated control mouse. On the basis of this association, Schaefer et al. concluded that these sequence variants were caused by CRISPR-Cas9. This newly proposed CRISPR-Cas9 off-target activity runs contrary to previously published work [2–8] and, if the authors are correct, could have profound implications for research and therapeutic applications. Here, we demonstrate that the simplest interpretation of Schaefer et al.’s data is that the two CRISPR-Cas9-treated mice are actually more closely related genetically to each other than to the control mouse. This strongly suggests that the so-called “unexpected mutations” simply represent SNPs and indels shared by these mice prior to nuclease treatment. In addition, given the genomic and sequence distribution profiles of these variants, we show that it is challenging to explain how CRISPR-Cas9 might be expected to induce such changes. Finally, we argue that the lack of appropriate controls in Schaefer et al.’s experimental design precludes assignment of causality to CRISPR-Cas9. Given these substantial issues, we urge Schaefer et al. to revise or re-state the original conclusions of their published work so that misleading and unsupported statements do not persist in the literature.
biorxiv molecular-biology 100-200-users 2017
Epigenetic maintenance of DNA methylation after evolutionary loss of the de novo methyltransferase, bioRxiv, 2017-06-14
ABSTRACT: After the initial establishment of symmetric cytosine methylation patterns by de novo DNA methyltransferases (DNMTs), maintenance DNMTs mediate epigenetic memory by propagating the initial signal. We find that CG methylation in the yeast Cryptococcus neoformans is dependent on a purely epigenetic mechanism mediated by the single DNMT encoded by the genome, Dnmt5. Purified Dnmt5 is a maintenance methyltransferase that strictly requires a hemimethylated substrate, and methylation lost by removal of Dnmt5 in vivo is not restored by its mitotic or meiotic reintroduction. Phylogenetic analysis reveals that the ancestral species had a second methyltransferase, DnmtX, whose gene was lost between 50 and 150 Mya. Expression of extant DnmtXs in C. neoformans triggers de novo methylation. These data indicate that DNA methylation has been maintained epigenetically by the Dnmt5 system since the ancient loss of the DnmtX de novo enzyme, implying remarkably long-lived epigenetic memory. Single sentence summary: Epigenetic information can be inherited over geological timescales.
biorxiv molecular-biology 100-200-users 2017
Evolutionary persistence of DNA methylation for millions of years after ancient loss of a de novo methyltransferase, bioRxiv, 2017-06-14
SUMMARY: Cytosine methylation is a widespread modification of DNA that plays numerous critical roles, yet has been lost many times in diverse eukaryotic lineages. In the yeast Cryptococcus neoformans, CG methylation occurs in transposon-rich repeats and requires the DNA methyltransferase Dnmt5. We show that Dnmt5 displays exquisite maintenance-type specificity in vitro and in vivo and uses in vivo cofactors similar to those of the metazoan maintenance methylase Dnmt1. Remarkably, phylogenetic and functional analysis revealed that the ancestral species lost the gene for a de novo methylase, DnmtX, between 50 and 150 MYA. We examined how methylation has persisted since the ancient loss of DnmtX. Experimental and comparative studies reveal efficient replication of methylation patterns in C. neoformans, rare stochastic methylation loss and gain events, and the action of natural selection. We propose that an epigenome has been propagated for >50 MY through a process analogous to Darwinian evolution of the genome.
biorxiv molecular-biology 200-500-users 2017
Index switching causes “spreading-of-signal” among multiplexed samples in Illumina HiSeq 4000 DNA sequencing, bioRxiv, 2017-04-10
Abstract: Illumina-based next generation sequencing (NGS) has accelerated biomedical discovery through its ability to generate thousands of gigabases of sequencing output per run at a fraction of the time and cost of conventional technologies. The process typically involves four basic steps: library preparation, cluster generation, sequencing, and data analysis. In 2015, a new chemistry of cluster generation was introduced in the newer Illumina machines (HiSeq 3000/4000/X Ten) called exclusion amplification (ExAmp), which was a fundamental shift from the earlier method of random cluster generation by bridge amplification on a non-patterned flow cell. The ExAmp chemistry, in conjunction with patterned flow cells containing nanowells at fixed locations, increases cluster density on the flow cell, thereby reducing the cost per run. It also increases sequence read quality, especially for longer read lengths (up to 150 base pairs). This advance has been widely adopted for genome sequencing because greater sequencing depth can be achieved for lower cost without compromising the quality of longer reads. We show that this promising chemistry is problematic, however, when multiplexing samples. We discovered that up to 5-10% of sequencing reads (or signals) are incorrectly assigned from a given sample to other samples in a multiplexed pool. We provide evidence that this “spreading-of-signals” arises from low levels of free index primers present in the pool. These index primers can prime pooled library fragments at random via complementary 3’ ends, and get extended by DNA polymerase, creating a new library molecule with a new index before binding to the patterned flow cell to generate a cluster for sequencing. This causes the resulting read from that cluster to be assigned to a different sample, causing the spread of signals within multiplexed samples.
We show that low levels of free index primers persist after the most common library purification procedure recommended by Illumina, and that the amount of signal spreading among samples is proportional to the level of free index primer present in the library pool. This artifact causes homogenization and misclassification of cells in single cell RNA-seq experiments. Therefore, all data generated in this way must now be carefully re-examined to ensure that “spreading-of-signals” has not compromised data analysis and conclusions. Re-sequencing samples using an older technology that uses conventional bridge amplification for cluster generation, or improved library cleanup strategies to remove free index primers, can minimize or eliminate this signal spreading artifact.
biorxiv molecular-biology 500+-users 2017