Assembly methods for nanopore-based metagenomic sequencing: a comparative study, bioRxiv, 2019-08-02

Background: Metagenomic sequencing has led to the recovery of previously unexplored microbial genomes. However, short-read sequencing platforms often yield highly fragmented metagenomes, which complicates downstream analyses. Third-generation sequencing technologies, such as MinION, could lead to more contiguous assemblies because they generate long reads. Nevertheless, there is a lack of studies evaluating the suitability of the available assembly tools for this new type of data.

Findings: We benchmarked the ability of different short-read and long-read tools to assemble two different commercially available mock communities, and observed remarkable differences in the resulting assemblies depending on the software of choice. Short-read metagenomic assemblers proved unsuitable for MinION data. Among the long-read assemblers tested, Flye and Canu were the only ones that performed well on all the datasets. These tools were able to retrieve complete individual genomes directly from the metagenome, and assembled a bacterial genome in only two contigs in the best scenario. Despite the intrinsically high error rate of long-read technologies, Canu and Flye produced highly accurate assemblies (~99.4-99.8% accuracy). However, errors still had an impact on the prediction of biosynthetic gene clusters.

Conclusions: MinION metagenomic sequencing data proved sufficient for assembling low-complexity microbial communities, leading to the recovery of highly complete and contiguous individual genomes. This work is the first systematic evaluation of the performance of different assembly tools on MinION data, and may help other researchers willing to use this technology to choose the most appropriate software depending on their goals. Future work is still needed to assess the performance of Oxford Nanopore MinION data on more complex microbiomes.
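To put the reported accuracy range in perspective, the short Python sketch below converts an accuracy percentage into an expected number of erroneous bases. It assumes that "accuracy" means per-base identity to the reference, which the abstract does not state explicitly; the function names and the 50 kb cluster length are hypothetical and chosen only for illustration.

```python
def errors_per_kb(accuracy_percent: float) -> float:
    """Expected erroneous bases per 1,000 assembled bases.

    Assumes accuracy is a per-base identity percentage (an assumption,
    not something the abstract specifies).
    """
    return (100.0 - accuracy_percent) * 10.0


def expected_errors(accuracy_percent: float, length_bp: int) -> float:
    """Expected erroneous bases in a region of the given length."""
    return (100.0 - accuracy_percent) / 100.0 * length_bp


if __name__ == "__main__":
    # The accuracy range quoted in the abstract.
    for accuracy in (99.4, 99.8):
        per_kb = errors_per_kb(accuracy)
        # Hypothetical 50 kb region, used only as an illustrative length.
        in_region = expected_errors(accuracy, 50_000)
        print(f"{accuracy}% accuracy: {per_kb:.1f} errors/kb, "
              f"{in_region:.0f} errors in 50 kb")
```

Under that assumption, an assembly that is more than 99% accurate can still carry a few errors per kilobase, which is consistent with the abstract's remark that residual errors affected gene cluster prediction.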

biorxiv bioinformatics 100-200-users 2019

Genetic tool development in marine protists: Emerging model organisms for experimental cell biology, bioRxiv, 2019-08-02

Marine microbial eukaryotes underpin the largest food web on the planet and influence global biogeochemical cycles that maintain habitability. They are also remarkably diverse and provide insights into evolution, including the origins of complex life forms, as revealed through genome analyses. However, their genetic tractability has been limited to a few species that do not represent the broader diversity of eukaryotic life or some of the most environmentally relevant taxa. Here, we report on genetic systems developed as a community resource for experimental cell biology of aquatic protists from across the eukaryotic tree and primarily from marine habitats. We present evidence for foreign DNA delivery and expression in 14 species never before transformed, report on the advancement of genetic systems in 7 species, review an already published transformation protocol in 1 species, and discuss why the transformation of 17 additional species has not yet been achieved. For all protists studied in this community effort, we outline our methods, constructs, and genome-editing approaches in the context of published systems. The reported breakthroughs in genetic manipulation position the community to dissect cellular mechanisms across a breadth of protists, which will collectively provide insights into ancestral eukaryotic lifeforms, protein diversification and the evolution of cellular pathways.

biorxiv ecology 100-200-users 2019

Negative selection on complex traits limits genetic risk prediction accuracy between populations, bioRxiv, 2019-08-02

Accurate genetic risk prediction is a key goal for medical genetics and great progress has been made toward identifying individuals with extreme risk across several traits and diseases (Collins and Varmus, 2015). However, many of these studies are done in predominantly European populations (Bustamante et al., 2011; Popejoy and Fullerton, 2016). Although GWAS effect sizes correlate across ancestries (Wojcik et al., 2019), risk scores show substantial reductions in accuracy when applied to non-European populations (Kim et al., 2018; Martin et al., 2019; Scutari et al., 2016). We use simulations to show that human demographic history and negative selection on complex traits result in population-specific genetic architectures. For traits under moderate negative selection, ~50% of the heritability can be accounted for by variants in Europe that are absent from Africa. We show that this directly leads to poor performance in risk prediction when using variants discovered in Europe to predict risk in African populations, especially in the tails of the risk distribution. To evaluate the impact of this effect in genomic data, we built a Bayesian model to stratify heritability between European-specific and shared variants and applied it to 43 traits and diseases in the UK Biobank. Across these phenotypes, we find ~50% of the heritability comes from European-specific variants, setting an upper bound on the accuracy of genetic risk prediction in non-European populations using effect sizes discovered in European populations. We conclude that genetic association studies need to include more diverse populations to enable the utility of genetic risk prediction in all populations.

biorxiv genetics 100-200-users 2019

Phylogenies of extant species are consistent with an infinite array of diversification histories, bioRxiv, 2019-08-01

Time-calibrated molecular phylogenies of extant species (extant timetrees) are widely used for estimating the dynamics of diversification rates (1–6) and testing for associations between these rates and environmental factors (5, 7) or species traits (8). However, there has been considerable debate surrounding the reliability of these inferences in the absence of fossil data (9–13), and to date this critical question remains unresolved. Here we mathematically clarify the precise information that can be extracted from extant timetrees under the generalized birth-death model, which underlies the majority of existing estimation methods. We prove that for a given extant timetree and a candidate diversification scenario, there exists an infinite number of alternative diversification scenarios that are equally likely to have generated a given tree. These “congruent” scenarios cannot possibly be distinguished using extant timetrees alone, even in the presence of infinite data. Importantly, congruent diversification scenarios can exhibit markedly different and yet plausible diversification dynamics, suggesting that many previous studies may have over-interpreted phylogenetic evidence. We show that sets of congruent models can be uniquely described using composite variables, which contain all available information about past dynamics of diversification (14); this suggests an alternative paradigm for learning about the past from extant timetrees.

biorxiv evolutionary-biology 100-200-users 2019

 
