Benchmarking of alignment-free sequence comparison methods, bioRxiv, 2019-04-16
Abstract: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference and reconstruction of species trees under horizontal gene transfer and recombination events. The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.
biorxiv bioinformatics 100-200-users 2019
Developmentally regulated Shh expression is robust to TAD perturbations, bioRxiv, 2019-04-16
Abstract: Topologically Associating Domains (TADs) have been proposed to both guide and constrain enhancer activity. Shh is located within a TAD known to contain all its enhancers. To investigate the importance of chromatin conformation and TAD integrity on developmental gene regulation, we have manipulated the Shh TAD by creating internal deletions, deleting CTCF sites including those at TAD boundaries, and making larger deletions and inversions of TAD boundaries. Chromosome conformation capture and fluorescence in situ hybridisation assays were used to investigate changes in chromatin conformation that result from these manipulations. Our data suggest that the substantial alteration of TAD structure has no readily detectable effect on Shh expression patterns during development, except where enhancers are deleted, and results in no detectable phenotypes. Only in the case of a larger deletion of one TAD boundary could some ectopic influence of the Shh limb enhancer be detected on a gene, Mnx1, in the neighbouring TAD. Our data suggest that, contrary to expectations, the developmental regulation of Shh expression is remarkably robust to TAD perturbations.
biorxiv developmental-biology 200-500-users 2019
Extensive impact of low-frequency variants on the phenotypic landscape at population-scale, bioRxiv, 2019-04-16
Abstract: Genome-wide association studies (GWAS) allow the genetic basis of complex traits to be dissected at the population level [1]. However, despite the extensive number of trait-associated loci found, they often fail to explain a large part of the observed phenotypic variance [2–4]. One potential source of this discrepancy could be the preponderance of undetected low-frequency genetic variants in natural populations [5,6]. To increase the allele frequency of those variants and assess their phenotypic effects at the population level, we generated a diallel panel consisting of 3,025 hybrids, derived from pairwise crosses between a subset of natural isolates from a completely sequenced population of 1,011 Saccharomyces cerevisiae isolates. We examined each hybrid across a large number of growth traits, resulting in a total of 148,225 cross/trait combinations. Parental versus hybrid regression analysis showed that while most phenotypic variance is explained by additivity, a significant proportion (29%) is governed by non-additive effects. This is confirmed by the observation that complete dominance predominates in 25% of the traits. By performing GWAS on the diallel panel, we detected 1,723 significantly associated genetic variants, with 16.3% of them being low-frequency variants in the initial population. These variants, which would not be detected using classical GWAS, explain 21% of the phenotypic variance on average. Altogether, our results demonstrate that low-frequency variants should be accounted for as they contribute to a large part of the phenotypic variation observed in a population.
biorxiv genomics 100-200-users 2019
High accuracy DNA sequencing on a small, scalable platform via electrical detection of single base incorporations, bioRxiv, 2019-04-16
Abstract: High-throughput DNA sequencing technologies have undergone tremendous development over the past decade. Although optical detection-based sequencing has constituted the majority of data output, it requires a large capital investment and aggregation of samples to achieve optimal cost per sample. We have developed a novel electronic detection-based platform capable of accurately detecting single base incorporations. The GenapSys technology with its electronic detection modality allows the system to be compact, accessible, and affordable. We demonstrate the performance of the system by sequencing several different microbial genomes with varying GC content. The platform is capable of generating 1.5 Gb of high-quality nucleic acid sequence in a single run. We routinely generate sequence data that exceeds 99% raw accuracy with read lengths of up to 175 bp. The utility of the platform is highlighted by targeted sequencing of the human genome. We show high concordance of SNP detection on the human NA12878 HapMap cell line with data generated on the Illumina sequencing platform. In addition, we sequenced a targeted panel of cancer-associated genes in a well-characterized reference standard. With multiple library preparation approaches on this sample, we were able to identify low-frequency mutations at expected allele frequencies.
biorxiv genomics 100-200-users 2019
nf-core: Community curated bioinformatics pipelines, bioRxiv, 2019-04-16
Abstract: The standardization, portability, and reproducibility of analysis pipelines are a well-known problem within the bioinformatics community. Most pipelines are designed for execution on-premise, and the associated software dependencies are tightly coupled with the local compute environment. This leads to poor pipeline portability and reproducibility of the ensuing results, both of which are fundamental requirements for the validation of scientific findings. Here, we introduce nf-core, a framework that provides a community-driven, peer-reviewed platform for the development of best practice analysis pipelines written in Nextflow. Key obstacles in pipeline development such as portability, reproducibility, scalability and unified parallelism are inherently addressed by all nf-core pipelines. We are also continually developing a suite of tools that assist in the creation and development of both new and existing pipelines. Our primary goal is to provide a platform for high-quality, reproducible bioinformatics pipelines that can be utilized across various institutions and research facilities.
biorxiv bioinformatics 100-200-users 2019
Protein engineering expands the effector recognition profile of a rice NLR immune receptor, bioRxiv, 2019-04-16
Abstract: Plant NLR receptors detect pathogen effectors and initiate an immune response. Since their discovery, NLRs have been the focus of protein engineering to improve disease resistance. However, this has proven challenging, in part due to their narrow response specificity. Here, we used structure-guided engineering to expand the response profile of the rice NLR Pikp to variants of the rice blast pathogen effector AVR-Pik. A mutation located within an effector binding interface of the integrated Pikp-HMA domain increased the binding affinity for AVR-Pik variants in vitro and in vivo. This translates to an expanded cell death response to AVR-Pik variants previously unrecognized by Pikp in planta. Structures of the engineered Pikp-HMA in complex with AVR-Pik variants revealed the mechanism of expanded recognition. These results provide a proof-of-concept that protein engineering can improve the utility of plant NLR receptors where direct interaction between effectors and NLRs is established, particularly via integrated domains.
biorxiv plant-biology 0-100-users 2019