Assessing graph-based read mappers against a novel baseline approach highlights strengths and weaknesses of the current generation of methods, bioRxiv, 2019-02-02

AbstractGraph-based reference genomes have become popular as they allow read mapping and follow-up analyses in settings where the exact haplotypes underlying a high-throughput sequencing experiment are not precisely known. Two recent papers show that mapping to graph-based reference genomes can improve accuracy as compared to methods using linear references. Both of these methods index the sequences for most paths up to a certain length in the graph in order to enable direct mapping of reads containing common variants. However, the combinatorial explosion of possible paths through nearby variants also leads to a huge search space and an increased chance of false positive alignments to highly variable regions.We here assess three prominent graph-based read mappers against a novel hybrid baseline approach that combines an initial path determination with a tuned linear read mapping method. We show, using a previously proposed benchmark, that this simple approach is able to improve accuracy of read-mapping to graph-based reference genomes.Our method is implemented in a tool Two-step Graph Mapper, which is available at <jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsgithub.comuio-bmitwo_step_graph_mapper>httpsgithub.comuio-bmitwo_step_graph_mapper<jatsext-link> along with data and scripts for reproducing the experiments.

biorxiv bioinformatics 0-100-users 2019

Islands of retroelements are the major components of Drosophila centromeres, bioRxiv, 2019-02-02

Centromeres are essential chromosomal regions that mediate kinetochore assembly and spindle attachments during cell division. Despite their functional conservation, centromeres are amongst the most rapidly evolving genomic regions and can shape karyotype evolution and speciation across taxa. Although significant progress has been made in identifying centromere-associated proteins, the highly repetitive centromeres of metazoans have been refractory to DNA sequencing and assembly, leaving large gaps in our understanding of their functional organization and evolution. Here, we identify the sequence composition and organization of the centromeres of Drosophila melanogaster by combining long-read sequencing, chromatin immunoprecipitation for the centromeric histone CENP-A, and high-resolution chromatin fiber imaging. Contrary to previous models that heralded satellite repeats as the major functional components, we demonstrate that functional centromeres form on islands of complex DNA sequences enriched in retroelements that are flanked by large arrays of satellite repeats. Each centromere displays distinct size and arrangement of its DNA elements but is similar in composition overall. We discover that a specific retroelement, G2Jockey-3, is the most highly enriched sequence in CENP-A chromatin and is the only element shared among all centromeres. G2Jockey-3 is also associated with CENP-A in the sister species Drosophila simulans, revealing an unexpected conservation despite the reported turnover of centromeric satellite DNA. Our work reveals the DNA sequence identity of the active centromeres of a premier model organism and implicates retroelements as conserved features of centromeric DNA.

biorxiv genomics 200-500-users 2019

 

Created with the audiences framework by Jedidiah Carlson

Powered by Hugo