Bazam: A rapid method for read extraction and realignment of high throughput sequencing data, bioRxiv, 2018-10-04
Abstract
Background: As costs of high throughput sequencing have fallen, we are seeing vast quantities of short read genomic data being generated. Often, the data is exchanged and stored as aligned reads, which provides high compression and convenient access for many analyses. However, aligned data becomes outdated as new reference genomes and alignment methods become available. Moreover, some applications cannot utilise pre-aligned reads at all, necessitating conversion back to raw format (FASTQ) before they can be used. In both cases, the process of extraction and realignment is expensive and time consuming.
Findings: We describe Bazam, a tool that efficiently extracts the original paired FASTQ from reads stored in aligned form (BAM or CRAM format). Bazam extracts reads in a format that directly allows realignment with popular aligners with high concurrency. Through eliminating steps and increasing the accessible concurrency, Bazam facilitates up to a 90% reduction in the time required for realignment compared to standard methods. Bazam can support selective extraction of read pairs from focused genomic regions, further increasing efficiency for targeted analyses. Bazam is additionally suitable as a base for other applications that require efficient paired read information, such as quality control, structural variant calling and alignment comparison.
Conclusions: Bazam offers significant improvements for users needing to realign genomic data.
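To make the idea of recovering paired FASTQ from aligned reads concrete, here is a minimal sketch in Python. It is not Bazam's implementation: the `AlignedRead` record and the `interleaved_fastq` function are hypothetical names introduced for illustration, and the input is assumed to be a stream of already-parsed alignment records carrying standard SAM flag bits. The sketch buffers each read until its mate arrives, restores the original orientation of reverse-strand reads, and emits the pair as interleaved FASTQ, a layout that many aligners can read from a pipe.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    """Hypothetical, simplified view of one alignment record."""
    name: str
    flag: int   # SAM bitwise flag
    seq: str    # bases as stored in the alignment (reference strand)
    qual: str   # base qualities as stored in the alignment


# Standard SAM flag bits used below.
REVERSE = 0x10
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_orientation(read: AlignedRead) -> Tuple[str, str]:
    """Undo the reverse-complement applied to reverse-strand alignments."""
    if read.flag & REVERSE:
        return read.seq.translate(_COMPLEMENT)[::-1], read.qual[::-1]
    return read.seq, read.qual


def interleaved_fastq(reads: Iterable[AlignedRead]) -> Iterator[str]:
    """Yield FASTQ records, read 1 then read 2, as soon as both mates are seen."""
    pending: Dict[str, AlignedRead] = {}
    for read in reads:
        # Secondary and supplementary records do not carry the full original read.
        if read.flag & (SECONDARY | SUPPLEMENTARY):
            continue
        mate = pending.pop(read.name, None)
        if mate is None:
            pending[read.name] = read  # wait for the mate
            continue
        first, second = (read, mate) if read.flag & FIRST_IN_PAIR else (mate, read)
        for suffix, r in (("/1", first), ("/2", second)):
            seq, qual = original_orientation(r)
            yield f"@{r.name}{suffix}\n{seq}\n+\n{qual}\n"
```

In this sketch, reads whose mate never appears simply stay in the buffer and are not emitted. A real tool would also need to bound memory use and decide how to handle such orphans.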
biorxiv bioinformatics 200-500-users 2018

High-performance GFP-based calcium indicators for imaging activity in neuronal populations and microcompartments, bioRxiv, 2018-10-04
Abstract: Calcium imaging with genetically encoded calcium indicators (GECIs) is routinely used to measure neural activity in intact nervous systems. GECIs are frequently used in one of two different modes: to track activity in large populations of neuronal cell bodies, or to follow dynamics in subcellular compartments such as axons, dendrites and individual synaptic compartments. Despite major advances, calcium imaging is still limited by the biophysical properties of existing GECIs, including affinity, signal-to-noise ratio, rise and decay kinetics, and dynamic range. Using structure-guided mutagenesis and neuron-based screening, we optimized the green fluorescent protein-based GECI GCaMP6 for different modes of in vivo imaging. The jGCaMP7 sensors provide improved detection of individual spikes (jGCaMP7s,f), imaging in neurites and neuropil (jGCaMP7b), and tracking large populations of neurons using 2-photon (jGCaMP7s,f) or wide-field (jGCaMP7c) imaging.
biorxiv neuroscience 200-500-users 2018

Microscopy-based chromosome conformation capture enables simultaneous visualization of genome organization and transcription in intact organisms, bioRxiv, 2018-10-04
Eukaryotic chromosomes are organized at multiple scales, from nucleosomes to chromosome territories. Recently, genome-wide methods identified an intermediate level of chromosome organization, topologically associating domains (TADs), which play key roles in transcriptional regulation. However, these methods cannot directly examine the interplay between transcriptional activation and chromosome architecture while maintaining spatial information. Here, we present a multiplexed, sequential imaging approach (Hi-M) that permits the simultaneous detection of chromosome organization and transcription in single nuclei. This allowed us to unveil the changes in 3D chromatin organization occurring upon transcriptional activation and homologous chromosome un-pairing during the awakening of the zygotic genome in intact Drosophila embryos. Excitingly, the ability of Hi-M to explore the multi-scale chromosome architecture with spatial resolution at different stages of development or during the cell cycle will be key to understanding the mechanisms and consequences of the 4D organization of the genome.
biorxiv cell-biology 100-200-users 2018

scRNA-seq mixology: towards better benchmarking of single cell RNA-seq analysis methods, bioRxiv, 2018-10-04
Abstract: Single cell RNA sequencing (scRNA-seq) technology has undergone rapid development in recent years, bringing with it new challenges in data processing and analysis. This has led to an explosion of tailored analysis methods for scRNA-seq data to address various biological questions. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically evaluate the performance of the many methods available. Here, we designed and carried out a realistic benchmark experiment that included mixtures of single cells or ‘pseudo cells’ created by sampling admixtures of cells or RNA from up to 5 distinct cancer cell lines. Altogether we generated 14 datasets using droplet and plate-based scRNA-seq protocols, and compared multiple data analysis methods in combination for tasks ranging from normalization and imputation to clustering, trajectory analysis and data integration. Evaluation across 3,913 analyses (methods × benchmark dataset combinations) revealed pipelines suited to different types of data for different tasks. Our dataset and analysis present a comprehensive comparison framework for benchmarking most common scRNA-seq analysis tasks.
biorxiv bioinformatics 100-200-users 2018

Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome, bioRxiv, 2018-10-04
Abstract: We sequenced the Yoruban NA19240 genome on the long read sequencing platform Oxford Nanopore PromethION for benchmarking and evaluation of recently published aligners and structural variant calling tools. In this work, we determine the precision and recall, present high-confidence and high-sensitivity call sets of variants, and discuss optimal parameters. The aligner Minimap2 and structural variant caller Sniffles are both the most accurate and the most computationally efficient tools in our study. We describe our scalable workflow for identification, annotation, and characterization of tens of thousands of structural variants from long read genome sequencing of an individual or population. By discussing the results of this genome we provide an approximation of what can be expected in future long read sequencing studies aiming for structural variant identification.
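As a rough illustration of how precision and recall can be computed for a structural variant call set against a truth set, the following Python sketch may help. It is not the workflow from the paper: the `SV` record, the `precision_recall` function, the greedy one-to-one matching, and the 500 bp breakpoint tolerance are all assumptions made for this example.

```python
from typing import List, NamedTuple, Set, Tuple


class SV(NamedTuple):
    """Hypothetical minimal structural variant record."""
    chrom: str
    pos: int      # breakpoint start position
    svtype: str   # e.g. "DEL", "INS"


def precision_recall(
    calls: List[SV], truth: List[SV], tolerance: int = 500
) -> Tuple[float, float]:
    """Match each call to at most one unmatched truth variant of the same type
    on the same chromosome within `tolerance` bp, then compute
    precision = TP / len(calls) and recall = TP / len(truth)."""
    matched_truth: Set[int] = set()
    true_positives = 0
    for call in calls:
        for i, t in enumerate(truth):
            if i in matched_truth:
                continue
            same_locus = call.chrom == t.chrom and abs(call.pos - t.pos) <= tolerance
            if same_locus and call.svtype == t.svtype:
                matched_truth.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

A call set tuned for high confidence would aim for high precision, while one tuned for high sensitivity would aim for high recall. The tolerance and matching rule chosen strongly affect both numbers.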
biorxiv bioinformatics 0-100-users 2018

GeneWeld: a method for efficient targeted integration directed by short homology, bioRxiv, 2018-10-03
Abstract: Choices for genome engineering and integration involve either high efficiency with little or no target specificity, or high specificity with low activity. Here, we describe a targeted integration strategy, called GeneWeld, and a vector series for gene tagging, pGTag (plasmids for Gene Tagging), which promote highly efficient and precise targeted integration in zebrafish embryos, pig fibroblasts, and human cells utilizing the CRISPR/Cas9 system. Our work demonstrates that in vivo targeting of a genomic locus of interest with CRISPR/Cas9 and a donor vector containing as little as 24 to 48 base pairs of homology directs precise and efficient knock-in when the homology arms are exposed with a double strand break in vivo. Our results suggest that the length of homology is not important in the design of knock-in vectors but rather how the homology is presented to a double strand break in the genome. Given our results targeting multiple loci in different species, we expect the accompanying protocols, vectors, and web interface for homology arm design to help streamline gene targeting and applications in CRISPR and TALEN compatible systems.
biorxiv genetics 0-100-users 2018