Ultra-deep, long-read nanopore sequencing of mock microbial community standards, bioRxiv, 2018-12-04
Background Long sequencing reads are information-rich, aiding de novo assembly and reference mapping, and consequently have great potential for the study of microbial communities. However, the best approaches for analysis of long-read metagenomic data are unknown. Additionally, rigorous evaluation of bioinformatics tools is hindered by a lack of long-read data from validated samples with known composition. Methods We sequenced two commercially available mock communities containing ten microbial species (ZymoBIOMICS Microbial Community Standards) with Oxford Nanopore GridION and PromethION. Isolates from the same mock community were sequenced individually with Illumina HiSeq. Data We generated 14 and 16 Gbp from GridION flowcells and 146 and 148 Gbp from PromethION flowcells for the even and log communities, respectively. Read length N50 was 5.3 Kbp and 5.2 Kbp for the even and log communities, respectively. Basecalls and corresponding signal data are made available (4.2 TB in total). Results Alignment to Illumina-sequenced isolates demonstrated the expected microbial species at anticipated abundances, with the limit of detection for the lowest abundance species below 50 cells (GridION). De novo assembly of metagenomes recovered long contiguous sequences without the need for pre-processing techniques such as binning. Conclusions We present ultra-deep, long-read nanopore datasets from a well-defined mock community. These datasets will be useful for those developing bioinformatics methods for long-read metagenomics and for the validation and comparison of current laboratory and software pipelines.
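The read length N50 quoted above is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of that calculation follows; the function name and the toy read lengths are illustrative assumptions and are not taken from the preprint or its pipeline.

```python
def read_length_n50(lengths):
    """Return the N50 of a collection of read lengths (in bases).

    Illustrative helper, not code from the preprint.
    """
    if not lengths:
        raise ValueError("no read lengths supplied")
    ordered = sorted(lengths, reverse=True)   # longest reads first
    half_total = sum(ordered) / 2             # half of all sequenced bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed the halfway point
            return length


# Toy data (made up): 32 bases in total, so the halfway point is 16 bases.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(read_length_n50(toy_lengths))
```

In practice the lengths would be read from the basecalled FASTQ files rather than typed in by hand.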
biorxiv bioinformatics 100-200-users 2018
Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis, bioRxiv, 2018-12-01
ABSTRACT Sequence analyses of RNA virus genomes remain challenging due to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses that are generally referred to as ‘quasispecies’. Although standard (short-read) sequencing technologies readily yield consensus sequences for these ‘quasispecies’, it is far more difficult to reconstruct large numbers of full-length haplotypes of (i) RNA virus genomes and (ii) subgenome-length (sg) RNAs comprised of noncontiguous genome regions that may be present in these virus populations. Here, we used a full-length, direct RNA sequencing (DRS) approach without any amplification step to characterize viral RNAs produced in cells infected with a human coronavirus representing one of the largest RNA virus genomes known to date. Using DRS, we were able to map the longest (~26 kb) contiguous read to the viral reference genome. By combining Illumina and nanopore sequencing, a highly accurate consensus sequence of the human coronavirus (HCoV) 229E genome (27.3 kb) was reconstructed. Furthermore, using long reads that did not require an assembly step, we were able to identify, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which does not require reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs. Our work paves the way for haplotype-based analyses of viral quasispecies by demonstrating the feasibility of intra-sample haplotype separation.
We also show how supplementary short-read sequencing (Illumina) can be used to reduce the error rate of nanopore sequencing. Even though a number of technical challenges remain to be addressed to fully exploit the potential of the nanopore technology, our work illustrates that direct RNA sequencing may significantly advance genomic studies of complex virus populations, including predictions on long-range interactions in individual full-length viral RNA haplotypes.
biorxiv genomics 100-200-users 2018
Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis, bioRxiv, 2018-12-01
Sequence analyses of RNA virus genomes remain challenging due to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses, so-called 'quasispecies'. Standard (short-read) sequencing technologies are ill-suited to reconstruct large numbers of full-length haplotypes of (i) RNA virus genomes and (ii) subgenome-length (sg) RNAs comprised of noncontiguous genome regions. Here, we used a full-length, direct RNA sequencing (DRS) approach based on nanopores to characterize viral RNAs produced in cells infected with a human coronavirus. Using DRS, we were able to map the longest (~26 kb) contiguous read to the viral reference genome. By combining Illumina and nanopore sequencing, we reconstructed a highly accurate consensus sequence of the human coronavirus (HCoV) 229E genome (27.3 kb). Furthermore, using long reads that did not require an assembly step, we were able to identify, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which circumvents reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs. Our work paves the way for haplotype-based analyses of viral quasispecies by demonstrating the feasibility of intra-sample haplotype separation. Even though several technical challenges remain to be addressed to exploit the potential of the nanopore technology fully, our work illustrates that direct RNA sequencing may significantly advance genomic studies of complex virus populations, including predictions on long-range interactions in individual full-length viral RNA haplotypes.
biorxiv genomics 100-200-users 2018
Linked-read sequencing of gametes allows efficient genome-wide analysis of meiotic recombination, bioRxiv, 2018-12-01
ABSTRACT Meiotic crossovers (COs) ensure proper chromosome segregation and redistribute the genetic variation that is transmitted to the next generation. Existing methods for CO identification are challenged by large populations and the demand for genome-wide and fine-scale resolution. Taking advantage of linked-read sequencing, we developed a highly efficient method for genome-wide identification of COs at kilobase resolution in pooled recombinants. We first tested this method using a pool of Arabidopsis F2 recombinants, and obtained results that recapitulated those identified from the same plants using individual whole-genome sequencing. By applying this method to a pool of pollen DNA from a single F1 plant, we established a highly accurate CO landscape without generating or sequencing a single recombinant plant. The simplicity of this approach now enables the simultaneous generation and analysis of multiple CO landscapes and thereby allows for efficient comparison of genotypic and environmental effects on recombination, accelerating the pace at which the mechanisms for the regulation of recombination can be elucidated.
biorxiv bioinformatics 100-200-users 2018
E-ChRPs: Engineered Chromatin Remodeling Proteins for Precise Nucleosome Positioning, bioRxiv, 2018-11-29
Summary Regulation of chromatin structure is essential for controlling the access of DNA to factors that require association with specific DNA sequences. The ability to alter chromatin organization in a targeted manner would provide a mechanism for directly manipulating DNA-dependent processes and should provide a means to study direct consequences of chromatin structural changes. Here we describe the development and validation of engineered chromatin remodeling proteins (E-ChRPs) for inducing programmable changes in nucleosome positioning by design. We demonstrate that E-ChRPs function both in vivo and in vitro to specifically reposition target nucleosomes and entire nucleosomal arrays, and possess the ability to evict native DNA-binding proteins through their action. E-ChRPs can be designed with a range of targeting modalities, including the SpyCatcher and dCas9 moieties, resulting in high versatility and enabling diverse future applications. Thus, engineered chromatin remodeling proteins represent a simple and robust means to probe regulation of DNA-dependent processes in different chromatin contexts.
biorxiv genetics 100-200-users 2018
Base editing generates substantial off-target single nucleotide variants, bioRxiv, 2018-11-27
Abstract Genome editing tools including CRISPR-Cas9 and base editors hold great promise for correcting pathogenic mutations. Unbiased, genome-wide assessment of the off-target effects of editing in mammalian cells is required before clinical applications, but determining the extent of off-target effects has been difficult due to the existence of single nucleotide polymorphisms (SNPs) in individuals. Here, we developed a method named GOTI (Genome-wide Off-target analysis by Two-cell embryo Injection) to detect off-target mutations without interference from SNPs. We applied GOTI to both the CRISPR-Cas9 and base editing (BE3) systems by editing one blastomere of the two-cell mouse embryo and then comparing whole genome sequences of progeny-cell populations at the E14.5 stage. Sequence analysis of edited and non-edited cell progenies showed that undesired off-target single nucleotide variants (SNVs) are rare (average 10.5) in CRISPR-edited mouse embryos, with a frequency close to the spontaneous mutation rate. By contrast, BE3 editing induced over 20-fold more SNVs (average 283), raising concerns about the use of base-editing approaches for biomedical applications.
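The "over 20-fold" figure can be checked directly from the two averages quoted in the abstract. The short sketch below does only that arithmetic; the variable names are illustrative, and the per-sample counts behind the averages are not reproduced here.

```python
# Average off-target SNV counts as quoted in the abstract above.
crispr_cas9_mean_snvs = 10.5
be3_mean_snvs = 283

# Ratio of the two averages (roughly 27, consistent with "over 20-fold").
fold_increase = be3_mean_snvs / crispr_cas9_mean_snvs
print(f"BE3 vs CRISPR-Cas9: {fold_increase:.1f}-fold")
```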
biorxiv genomics 100-200-users 2018