chromoMap: An R package for Interactive Visualization and Annotation of Chromosomes, bioRxiv, 2019-04-11

Abstract. Summary: chromoMap is an R package for constructing interactive visualizations of chromosomes or chromosomal regions of any living organism, and for mapping chromosomal elements (such as genes) onto them. The package takes separate tab-delimited (BED-like) files that specify the genomic coordinates of the chromosomes and of the elements to annotate. Each rendered chromosome is composed of continuous loci of specific ranges; on hover, each locus displays detailed information about the elements annotated within its range. By adjusting the parameters of a single function, users can generate a variety of plots that can be saved as static images or shared as HTML documents. Prominent features of chromoMap include, but are not limited to, visualizing polyploidy, creating chromosome heatmaps, mapping groups of elements, adding hyperlinks to elements, and multi-species chromosome visualization. Availability and implementation: The R package chromoMap is available under the GPL-3 open source license. It includes a vignette that documents its features in detail and is freely available from https://CRAN.R-project.org/package=chromoMap. Contact: lakshayanand15@gmail.com. Supplementary information: Supplementary data are available online.
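The idea of dividing a chromosome into loci and attaching annotated elements to each locus can be illustrated with a minimal sketch. This is written in Python for illustration only and is not the package's R implementation. The column order (element name, chromosome, start, end), the 1-based inclusive coordinates, the fixed locus width, and the function names `parse_annotations` and `bin_elements` are all assumptions made for this example.

```python
from collections import defaultdict

def parse_annotations(text):
    """Parse tab-delimited rows. Assumed columns: name, chromosome, start, end."""
    rows = []
    for line in text.strip().splitlines():
        name, chrom, start, end = line.split("\t")
        rows.append((name, chrom, int(start), int(end)))
    return rows

def bin_elements(elements, locus_size):
    """Group elements by (chromosome, locus index).

    Coordinates are assumed to be 1-based and inclusive, so position p
    falls into locus (p - 1) // locus_size. An element that spans several
    loci is listed in each of them.
    """
    loci = defaultdict(list)
    for name, chrom, start, end in elements:
        first = (start - 1) // locus_size
        last = (end - 1) // locus_size
        for index in range(first, last + 1):
            loci[(chrom, index)].append(name)
    return dict(loci)

# Hypothetical annotation data, invented for this example.
annotation_text = "gene_a\tchr1\t100\t900\ngene_b\tchr1\t950\t1200\ngene_c\tchr2\t5\t50"
loci = bin_elements(parse_annotations(annotation_text), locus_size=1000)
for (chrom, index), names in sorted(loci.items()):
    print(chrom, index, names)
```

A hover tooltip for a locus would then list the names stored under that locus key. How the package itself chooses locus ranges is not described in the abstract.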

biorxiv bioinformatics 100-200-users 2019

Pooled-parent exome sequencing to prioritise de novo variants in genetic disease, bioRxiv, 2019-04-07

Abstract. In the clinical setting, exome sequencing has become standard of care in diagnosing rare genetic disorders; however, many cases remain unsolved. Trio sequencing has been demonstrated to produce a higher diagnostic yield than singleton (proband-only) sequencing. Parental sequencing is especially useful when a disease is suspected to be caused by a de novo variant in the proband, because parental data provide a strong filter for the majority of variants that are shared by the proband and their parents. However, the additional cost of sequencing the parents makes the trio strategy uneconomical for many clinical situations. Two thirds of the sequencing budget is spent on parents, funds that could otherwise be used to sequence more probands. For this reason many clinics are reluctant to sequence parents. Here we propose a pooled-parent strategy for exome sequencing of individuals with likely de novo disease. In this strategy, DNA from all the parents of a cohort of unrelated probands is pooled together into a single exome capture and sequencing run. Variants called in the proband can then be filtered out if they are also found in the parent pool, resulting in a shorter list of prioritised variants. To evaluate the pooled-parent strategy we performed a series of simulations by combining reads from individual exomes to imitate sample pooling. We assessed the recall and false positive rate and investigated the trade-off between pool size and recall rate. We compared the performance of GATK HaplotypeCaller (individual and joint calling) and FreeBayes for genotyping pooled samples. Finally, we applied a pooled-parent strategy to a set of real unsolved cases and showed that the parent pool is a powerful filter that is complementary to other commonly used variant filters such as population variant frequencies.
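The filtering step described above can be sketched as a set difference between the proband's variant calls and the calls made on the parent pool. This is a minimal illustration, not the authors' pipeline. Representing a variant as a `(chromosome, position, ref, alt)` tuple, the function names `filter_by_parent_pool` and `recall`, and all variant data are assumptions invented for this example.

```python
def filter_by_parent_pool(proband_variants, pool_variants):
    """Keep proband variants that were not called in the parent pool."""
    pool = set(pool_variants)
    return [v for v in proband_variants if v not in pool]

def recall(called_in_pool, true_parental):
    """Fraction of known parental variants that were recovered in the pool."""
    truth = set(true_parental)
    if not truth:
        raise ValueError("true_parental must not be empty")
    return len(truth & set(called_in_pool)) / len(truth)

# Hypothetical variants as (chromosome, position, ref, alt) tuples.
proband = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2500, "C", "T"),
    ("chr7", 4200, "G", "A"),
]
parent_pool_calls = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2500, "C", "T"),
    ("chr9", 310, "T", "C"),
]
true_parental = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2500, "C", "T"),
    ("chr9", 310, "T", "C"),
    ("chr5", 77, "G", "C"),  # a parental variant the pool failed to detect
]

candidates = filter_by_parent_pool(proband, parent_pool_calls)
pool_recall = recall(parent_pool_calls, true_parental)
print(candidates)
print(pool_recall)
```

Any parental variant missed in the pool (lower recall) stays in the proband's candidate list, which is why the trade-off between pool size and recall matters for this strategy.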

biorxiv bioinformatics 0-100-users 2019

PIRATE: A fast and scalable pangenomics toolbox for clustering diverged orthologues in bacteria, bioRxiv, 2019-04-05

Abstract. Cataloguing the distribution of genes within natural bacterial populations is essential for understanding evolutionary processes and the genetic basis of adaptation. Here we present a pangenomics toolbox, PIRATE (Pangenome Iterative Refinement And Threshold Evaluation), which identifies and classifies orthologous gene families in bacterial pangenomes over a wide range of sequence similarity thresholds. PIRATE builds upon recent scalable software developments to allow for the rapid interrogation of thousands of isolates. PIRATE clusters genes (or other annotated features) over a wide range of amino-acid or nucleotide identity thresholds and uses the clustering information to rapidly classify paralogous gene families into either putative fission/fusion events or gene duplications. Furthermore, PIRATE orders the pangenome using a directed graph, provides a measure of allelic variation and estimates sequence divergence for each gene family. We demonstrate that PIRATE scales linearly with both number of samples and computation resources, allowing for analysis of large genomic datasets, and compares favorably to other popular tools. PIRATE provides a robust framework for analysing bacterial pangenomes, from largely clonal to panmictic species. Availability: PIRATE is implemented in Perl and is freely available under a GNU GPL 3 open source license from https://github.com/SionBayliss/PIRATE. Supplementary information: Supplementary data is available online.

biorxiv bioinformatics 100-200-users 2019

The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences, bioRxiv, 2019-04-05

Abstract. Motivation: Life science research in academia, industry, agriculture, and the health sector is critically dependent on free and open data resources. ELIXIR, the European Research Infrastructure for life sciences data, has undertaken the task of identifying the set of Core Data Resources within Europe that are of most fundamental importance to the life science community for the long-term preservation of biological data. Having defined the Core Data Resources, we explored characteristics of the usage, impact and sustainability of the set as a whole to assess the value and importance of these resources as an infrastructure, to understand the sustainability of the infrastructure, and to demonstrate a model for assessing Core Data Resources worldwide. Results: The nineteen resources designated as Core Data Resources by ELIXIR together form a data infrastructure in Europe that is a subset of the wider worldwide open life sciences data infrastructure. These resources are of crucial importance to research throughout the world. We show that, from 2013 to 2017, data managed by the Core Data Resources tripled and usage doubled while staff numbers increased by only a sixth. Additionally, support for the Core Data Resources is precarious, with all resources together having assured funding for less than a third of current staff after only three years. Our findings demonstrate the importance of the ELIXIR Core Data Resources as repositories for research data and the knowledge generated from those data, while also demonstrating the precarious nature of the funding environment for this infrastructure. The ELIXIR Core Data Resources are part of a larger worldwide life sciences data resources ecosystem. Both within Europe and as part of the Global Biodata Coalition, ELIXIR will work for longer-term support for the worldwide life sciences data resource infrastructure and for the subset of that infrastructure that is the ELIXIR Core Data Resources.

biorxiv bioinformatics 0-100-users 2019

 
