SoupX removes ambient RNA contamination from droplet based single cell RNA sequencing data, bioRxiv, 2018-04-21
Droplet-based single-cell RNA sequencing analyses assume that all acquired RNAs are endogenous to cells. However, any cell-free RNAs contained within the input solution are also captured by these assays. This sequencing of cell-free RNA constitutes a background contamination that has the potential to confound the correct biological interpretation of single-cell transcriptomic data. Here, we demonstrate that contamination from this soup of cell-free RNAs is ubiquitous, experiment-specific in its composition and magnitude, and can lead to erroneous biological conclusions. We present a method, SoupX, for quantifying the extent of the contamination and estimating background-corrected cell expression profiles that can be integrated with existing downstream analysis tools. We apply this method to two datasets and show that it reduces batch effects, strengthens cell-specific quality control and improves biological interpretation.
biorxiv bioinformatics 100-200-users 2018
Sub-2 Å Ewald Curvature Corrected Single-Particle Cryo-EM, bioRxiv, 2018-04-21
Single-particle cryogenic electron microscopy (cryo-EM) provides a powerful methodology for structural biologists, but the resolutions typically attained with experimentally determined structures have lagged behind microscope capabilities. Here, we have exploited several technical solutions to improve resolution, including sub-Angstrom pixelation, per-particle CTF refinement, and most notably a correction for Ewald sphere curvature. The application of these methods on micrographs recorded on a base model Titan Krios enabled structure determination at ∼1.86-Å resolution of an adeno-associated virus serotype 2 variant (AAV2), an important gene-delivery vehicle.
biorxiv biophysics 100-200-users 2018
STREAM Single-cell Trajectories Reconstruction, Exploration And Mapping of omics data, bioRxiv, 2018-04-18
Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data.
biorxiv genomics 0-100-users 2018
Single cell RNA-seq denoising using a deep count autoencoder, bioRxiv, 2018-04-14
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a zero-inflated negative binomial noise model, and nonlinear gene-gene or gene-dispersion interactions are captured. Our method scales linearly with the number of cells and can therefore be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
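As an illustration of the zero-inflated negative binomial (ZINB) noise model named in this abstract, the following is a minimal sketch of the ZINB log-probability for a single count. It is not taken from the DCA code: the function names and the parameterization (mean `mu`, dispersion `theta`, dropout probability `pi`) are assumptions chosen for this example, using the common mean-dispersion form of the negative binomial.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial.

    Assumed parameterization: mean mu > 0, dispersion theta > 0,
    so the variance is mu + mu**2 / theta.
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    pi in [0, 1) is the probability of an extra ("dropout") zero;
    with probability 1 - pi the count comes from the negative binomial.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can arise from dropout or from the negative binomial itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # A non-zero count can only come from the negative binomial component.
    return math.log(1.0 - pi) + nb


# Example: log-probability of observing 0 and 3 reads for one gene in one cell.
log_p_zero = zinb_log_pmf(0, mu=2.0, theta=1.5, pi=0.3)
log_p_three = zinb_log_pmf(3, mu=2.0, theta=1.5, pi=0.3)
```

In a denoising autoencoder of the kind described, a loss could plausibly be built by summing the negative of this quantity over all genes and cells, with `mu`, `theta` and `pi` predicted by the network; that wiring is an assumption here, not a description of the authors' implementation.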
biorxiv bioinformatics 200-500-users 2018
Direct RNA Sequencing of the Complete Influenza A Virus Genome, bioRxiv, 2018-04-12
For the first time, a complete genome of an RNA virus has been sequenced in its original form. Previously, RNA was sequenced by the chemical degradation of radiolabelled RNA, a difficult method that produced only short sequences. Instead, RNA has usually been sequenced indirectly by copying it into cDNA, which is often amplified to dsDNA by PCR and subsequently analyzed using a variety of DNA sequencing methods. We designed an adapter to short highly conserved termini of the influenza virus genome to target the (-) sense RNA into a protein nanopore on the Oxford Nanopore MinION sequencing platform. Utilizing this method and total RNA extracted from the allantoic fluid of infected chicken eggs, we demonstrate successful sequencing of the complete influenza virus genome with 100% nucleotide coverage, 99% consensus identity, and 99% of reads mapped to influenza. By utilizing the same methodology, we can redesign the adapter in order to expand the targets to include viral mRNA and (+) sense cRNA, which are essential to the viral life cycle. This has the potential to identify and quantify splice variants and base modifications, which are not practically measurable with current methods.
biorxiv genomics 100-200-users 2018
Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats, bioRxiv, 2018-04-12
Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length. Beyond long repeats, the P19E3 assembly was further complicated by a shufflon region. Its complex genome could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from Oxford Nanopore Technologies. Another important factor for a full genomic resolution was the choice of assembly algorithm. Importantly, a repeat analysis indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial and a handful of 293 complete archaeal genomes represented this dark matter for de novo genome assembly of prokaryotes. Several of these dark matter genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed employing labor-intensive steps like cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assemblers capable of resolving long, near identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.
biorxiv genomics 0-100-users 2018