Challenges and recommendations to improve installability and archival stability of omics computational tools, bioRxiv, 2018-10-26

Abstract: Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through URLs published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed “easy to install,” and 28% of the tools failed to be installed at all due to problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.

biorxiv bioinformatics 500+-users 2018

RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference, bioRxiv, 2018-10-19

Abstract: Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture, and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. Results: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared to RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. Availability: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. The RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch. Contact: alexey.kozlov@h-its.org

biorxiv bioinformatics 200-500-users 2018

A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data, bioRxiv, 2018-10-17

Abstract: Genome mining has become a key technology to explore and exploit natural product diversity through the identification and analysis of biosynthetic gene clusters (BGCs). Initially, this was performed on a single-genome basis; currently, the process is being scaled up to large-scale mining of pan-genomes of entire genera, complete strain collections and metagenomic datasets from which thousands of bacterial genomes can be extracted at once. However, no bioinformatic framework is currently available for the effective analysis of datasets of this size and complexity. Here, we provide a streamlined computational workflow, tightly integrated with antiSMASH and MIBiG, that consists of two new software tools, BiG-SCAPE and CORASON. BiG-SCAPE facilitates rapid calculation and interactive visual exploration of BGC sequence similarity networks, grouping gene clusters at multiple hierarchical levels, and includes a ‘glocal’ alignment mode that accurately groups both complete and fragmented BGCs. CORASON employs a phylogenomic approach to elucidate the detailed evolutionary relationships between gene clusters by computing high-resolution multi-locus phylogenies of all BGCs within and across gene cluster families (GCFs), and allows researchers to comprehensively identify all genomic contexts in which particular biosynthetic gene cassettes are found. We validate BiG-SCAPE by correlating its GCF output to metabolomic data across 403 actinobacterial strains. Furthermore, we demonstrate the discovery potential of the platform by using CORASON to comprehensively map the phylogenetic diversity of the large detoxin/rimosamide gene cluster clan, prioritizing three new detoxin families for subsequent characterization of six new analogs using isotopic labeling and analysis of tandem mass spectrometric data.

biorxiv bioinformatics 100-200-users 2018

 
