Assessing graph-based read mappers against a novel baseline approach highlights strengths and weaknesses of the current generation of methods, bioRxiv, 2019-02-02
Graph-based reference genomes have become popular as they allow read mapping and follow-up analyses in settings where the exact haplotypes underlying a high-throughput sequencing experiment are not precisely known. Two recent papers show that mapping to graph-based reference genomes can improve accuracy as compared to methods using linear references. Both of these methods index the sequences for most paths up to a certain length in the graph in order to enable direct mapping of reads containing common variants. However, the combinatorial explosion of possible paths through nearby variants also leads to a huge search space and an increased chance of false positive alignments to highly variable regions. We here assess three prominent graph-based read mappers against a novel hybrid baseline approach that combines an initial path determination with a tuned linear read mapping method. We show, using a previously proposed benchmark, that this simple approach is able to improve accuracy of read-mapping to graph-based reference genomes. Our method is implemented in a tool, Two-step Graph Mapper, which is available at https://github.com/uio-bmi/two_step_graph_mapper along with data and scripts for reproducing the experiments.
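The combinatorial explosion mentioned in the abstract can be illustrated with a toy sketch. It is not taken from Two-step Graph Mapper; the function name `enumerate_paths`, the variant tuple layout, and the example sequence are assumptions made for illustration. With k independent biallelic variants in a window, there are 2^k possible paths, each of which a path-indexing mapper would have to consider.

```python
from itertools import product


def enumerate_paths(reference, variants):
    """Yield every haplotype sequence through a window of a variation graph.

    `variants` is a list of (position, ref_allele, alt_allele) tuples that are
    sorted by position and do not overlap (a simplifying assumption).
    """
    # Each variant contributes an independent choice: reference or alternate.
    for picks in product((False, True), repeat=len(variants)):
        pieces = []
        cursor = 0
        for use_alt, (pos, ref, alt) in zip(picks, variants):
            pieces.append(reference[cursor:pos])  # shared sequence before the variant
            pieces.append(alt if use_alt else ref)
            cursor = pos + len(ref)               # skip past the reference allele
        pieces.append(reference[cursor:])         # shared sequence after the last variant
        yield "".join(pieces)


# Hypothetical 10 bp window with two SNPs and one single-base insertion.
reference = "ACGTACGTAC"
variants = [(1, "C", "T"), (4, "A", "G"), (7, "T", "TT")]
paths = list(enumerate_paths(reference, variants))
print(len(paths), "paths for", len(variants), "variants")
```

Three variants already give eight candidate sequences for a ten-base window; the count doubles with every additional nearby variant, which is the search-space growth the abstract refers to.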
biorxiv bioinformatics 0-100-users 2019
Delivering genes across the blood-brain barrier: LY6A, a novel cellular receptor for AAV-PHP.B capsids, bioRxiv, 2019-02-02
The engineered AAV-PHP.B family of adeno-associated virus efficiently delivers genes throughout the mouse central nervous system. To guide their application across disease models, and to inspire the development of translational gene therapy vectors useful for targeting neurological diseases in humans, we sought to elucidate the host factors responsible for the CNS tropism of AAV-PHP.B vectors. Leveraging CNS tropism differences across mouse strains, we conducted a genome-wide association study, and rapidly identified and verified LY6A as an essential receptor for the AAV-PHP.B vectors in brain endothelial cells. Importantly, this newly discovered mode of AAV binding and transduction is independent of other known AAV receptors and can be imported into different cell types to confer enhanced transduction by the AAV-PHP.B vectors.
biorxiv neuroscience 0-100-users 2019
Islands of retroelements are the major components of Drosophila centromeres, bioRxiv, 2019-02-02
Centromeres are essential chromosomal regions that mediate kinetochore assembly and spindle attachments during cell division. Despite their functional conservation, centromeres are amongst the most rapidly evolving genomic regions and can shape karyotype evolution and speciation across taxa. Although significant progress has been made in identifying centromere-associated proteins, the highly repetitive centromeres of metazoans have been refractory to DNA sequencing and assembly, leaving large gaps in our understanding of their functional organization and evolution. Here, we identify the sequence composition and organization of the centromeres of Drosophila melanogaster by combining long-read sequencing, chromatin immunoprecipitation for the centromeric histone CENP-A, and high-resolution chromatin fiber imaging. Contrary to previous models that heralded satellite repeats as the major functional components, we demonstrate that functional centromeres form on islands of complex DNA sequences enriched in retroelements that are flanked by large arrays of satellite repeats. Each centromere displays distinct size and arrangement of its DNA elements but is similar in composition overall. We discover that a specific retroelement, G2/Jockey-3, is the most highly enriched sequence in CENP-A chromatin and is the only element shared among all centromeres. G2/Jockey-3 is also associated with CENP-A in the sister species Drosophila simulans, revealing an unexpected conservation despite the reported turnover of centromeric satellite DNA. Our work reveals the DNA sequence identity of the active centromeres of a premier model organism and implicates retroelements as conserved features of centromeric DNA.
biorxiv genomics 200-500-users 2019
Millefy: visualizing cell-to-cell heterogeneity in read coverage of single-cell RNA sequencing datasets, bioRxiv, 2019-02-02
Background: Read coverage of RNA sequencing data reflects gene expression and RNA processing events. Single-cell RNA sequencing (scRNA-seq) methods, particularly full-length ones, provide read coverage of many individual cells and have the potential to reveal cellular heterogeneity in RNA transcription and processing. However, visualization tools suited to highlighting cell-to-cell heterogeneity in read coverage are still lacking. Results: Here, we have developed Millefy, a tool for visualizing read coverage of scRNA-seq data in genomic contexts. Millefy is designed to show read coverage of all individual cells at once in genomic contexts and to highlight cell-to-cell heterogeneity in read coverage. By visualizing read coverage of all cells as a heat map and dynamically reordering cells based on diffusion maps, Millefy facilitates discovery of local region-specific, cell-to-cell heterogeneity in read coverage, including variability of transcribed regions. Conclusions: Millefy simplifies the examination of cellular heterogeneity in RNA transcription and processing events using scRNA-seq data. Millefy is available as an R package (https://github.com/yuifu/millefy) and a Docker image to help use Millefy on the Jupyter notebook (https://hub.docker.com/r/yuifu/datascience-notebook-millefy).
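The idea of reordering cells by a diffusion map before drawing the heat map can be sketched as follows. This is a minimal illustration in Python with NumPy, not Millefy's R implementation; the function name `diffusion_order`, the median-distance bandwidth heuristic, and the toy coverage matrix are all assumptions. Rows are cells, columns are genomic bins, and cells are sorted by the first non-trivial diffusion component so that cells with similar coverage profiles end up adjacent.

```python
import numpy as np


def diffusion_order(coverage, sigma=None):
    """Return a row ordering based on the first non-trivial diffusion component."""
    coverage = np.asarray(coverage, dtype=float)
    # Pairwise squared Euclidean distances between cells.
    diff = coverage[:, None, :] - coverage[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    if sigma is None:
        # Assumed heuristic: bandwidth from the median non-zero distance.
        sigma = np.sqrt(np.median(sq_dist[sq_dist > 0]))
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))
    degree = kernel.sum(axis=1)
    # Symmetric normalisation D^-1/2 K D^-1/2 shares eigenvalues with the
    # Markov matrix D^-1 K and lets us use the symmetric eigensolver.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    _, eigvecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    # The last column is the trivial component; the one before it is the
    # first informative one. Rescale to get the Markov-matrix eigenvector.
    component = eigvecs[:, -2] / np.sqrt(degree)
    return np.argsort(component)


# Toy data: six cells, five bins. Even-numbered cells cover the left part of
# the region, odd-numbered cells cover the right part.
coverage = np.array([
    [9, 8, 9, 0, 0],
    [0, 0, 8, 9, 9],
    [8, 9, 8, 0, 0],
    [0, 0, 9, 8, 9],
    [9, 9, 9, 1, 0],
    [0, 1, 9, 9, 8],
])
order = diffusion_order(coverage)
reordered = coverage[order]  # rows ready to be drawn as a heat map
print(order)
```

After reordering, the two groups of cells occupy contiguous blocks of rows, which is what makes region-specific differences in coverage visible in a heat map. Which group comes first depends on the arbitrary sign of the eigenvector.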
biorxiv bioinformatics 0-100-users 2019
Reconstruction of 1,000 projection neurons reveals new cell types and organization of long-range connectivity in the mouse brain, bioRxiv, 2019-02-02
Neuronal cell types are the nodes of neural circuits that determine the flow of information within the brain. Neuronal morphology, especially the shape of the axonal arbor, provides an essential descriptor of cell type and reveals how individual neurons route their output across the brain. Despite the importance of morphology, few projection neurons in the mouse brain have been reconstructed in their entirety. Here we present a robust and efficient platform for imaging and reconstructing complete neuronal morphologies, including axonal arbors that span substantial portions of the brain. We used this platform to reconstruct more than 1,000 projection neurons in the motor cortex, thalamus, subiculum, and hypothalamus. Together, the reconstructed neurons comprise more than 75 meters of axonal length and are available in a searchable online database. Axonal shapes revealed previously unknown subtypes of projection neurons and suggest organizational principles of long-range connectivity.
biorxiv neuroscience 200-500-users 2019
Self-reporting transposons enable simultaneous readout of gene expression and transcription factor binding in single cells, bioRxiv, 2019-02-02
In situ assays of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.
biorxiv genomics 200-500-users 2019