High-Fidelity Nanopore Sequencing of Ultra-Short DNA Sequences, bioRxiv, 2019-02-17
Nanopore sequencing offers a portable and affordable alternative to sequencing-by-synthesis methods but suffers from lower accuracy and cannot sequence ultra-short DNA. This puts applications such as molecular diagnostics based on the analysis of cell-free DNA or single-nucleotide variants (SNVs) out of reach. To overcome these limitations, we report a nanopore-based sequencing strategy in which short target sequences are first circularized and then amplified via rolling-circle amplification to produce long stretches of concatemeric repeats. These can be sequenced on the MinION platform from Oxford Nanopore Technologies (ONT), and the resulting repeat sequences aligned to produce a highly accurate consensus that reduces the high error rate present in the individual repeats. Using this approach, we demonstrate for the first time the ability to obtain unbiased and accurate nanopore data for target DNA sequences of < 100 bp. Critically, this approach is sensitive enough to achieve SNV discrimination in mixtures of sequences and even enables quantitative detection of specific variants present at ratios of < 10%. Our method is simple, cost-effective, and only requires well-established processes. It therefore expands the utility of nanopore sequencing for molecular diagnostics and other applications, especially in resource-limited settings.
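The idea of building a consensus from concatemeric repeats can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the repeats have already been split out of the read and aligned to equal length, and that errors are substitutions only, whereas real nanopore reads also contain insertions and deletions and need a proper aligner. The function name `consensus` and the example sequences are hypothetical.

```python
from collections import Counter


def consensus(repeats):
    """Column-wise majority vote over pre-aligned, equal-length repeats.

    Simplifying assumption: substitutions only, no indels.
    """
    if not repeats:
        raise ValueError("need at least one repeat")
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("repeats must be pre-aligned to equal length")
    # zip(*repeats) walks the alignment column by column;
    # the most common base in each column becomes the consensus base.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*repeats))


# Hypothetical 12-nt target read five times from one concatemer,
# with a single substitution error in three of the copies.
repeats = [
    "ACGTTGCAAGCT",
    "ACGTTGCATGCT",  # error at position 8
    "ACCTTGCAAGCT",  # error at position 2
    "ACGTTGCAAGCT",
    "ACGTAGCAAGCT",  # error at position 4
]

print(consensus(repeats))  # ACGTTGCAAGCT
```

Because the errors in the individual copies fall at different positions, each one is outvoted by the other four copies, which is why more repeats per read give a more accurate consensus.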
biorxiv bioengineering 100-200-users 2019
SquiggleKit: A toolkit for manipulating nanopore signal data, bioRxiv, 2019-02-17
The management of raw nanopore sequencing data poses a challenge that must be overcome to accelerate the development of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualisation, and signal processing. Its modular tools can be used to reduce file numbers and memory footprint, identify poly-A tails, target barcodes, adapters, and find nucleotide sequence motifs in raw nanopore signal, amongst other applications. SquiggleKit serves as a bioinformatics portal into signal space, for novice and experienced users alike. It is comprehensively documented, simple to use, cross-platform compatible and freely available from https://github.com/Psy-Fer/SquiggleKit.
biorxiv bioinformatics 100-200-users 2019
A high-resolution, chromosome-assigned Komodo dragon genome reveals adaptations in the cardiovascular, muscular, and chemosensory systems of monitor lizards, bioRxiv, 2019-02-16
Monitor lizards are unique among ectothermic reptiles in that they have a high aerobic capacity and distinctive cardiovascular physiology which resembles that of endothermic mammals. We have sequenced the genome of the Komodo dragon (Varanus komodoensis), the largest extant monitor lizard, and present a high resolution de novo chromosome-assigned genome assembly for V. komodoensis, generated with a hybrid approach of long-range sequencing and single molecule physical mapping. Comparing the genome of V. komodoensis with those of related species showed evidence of positive selection in pathways related to muscle energy metabolism, cardiovascular homeostasis, and thrombosis. We also found species-specific expansions of a chemoreceptor gene family related to pheromone and kairomone sensing in V. komodoensis and several other lizard lineages. Together, these evolutionary signatures of adaptation reveal genetic underpinnings of the unique Komodo sensory, cardiovascular, and muscular systems, and suggest that selective pressure altered thrombosis genes to help Komodo dragons evade the anticoagulant effects of their own saliva. As the only sequenced monitor lizard genome, the Komodo dragon genome is an important resource for understanding the biology of this lineage and of reptiles worldwide.
biorxiv genomics 100-200-users 2019
Cryptic inoviruses are pervasive in bacteria and archaea across Earth's biomes, bioRxiv, 2019-02-16
Bacteriophages from the Inoviridae family (inoviruses) are characterized by their unique morphology, genome content, and infection cycle. To date, a relatively small number of inovirus isolates have been extensively studied, either for biotechnological applications such as phage display, or because of their impact on the toxicity of known bacterial pathogens including Vibrio cholerae and Neisseria meningitidis. Here we show that the current 56 members of the Inoviridae family represent a minute fraction of a highly diverse group of inoviruses. Using a new machine learning approach leveraging a combination of marker gene and genome features, we identified 10,295 inovirus-like genomes from microbial genomes and metagenomes. Collectively, these represent six distinct proposed inovirus families infecting nearly all bacterial phyla across virtually every ecosystem. Putative inoviruses were also detected in several archaeal genomes, suggesting that these viruses may have occasionally transferred from bacterial to archaeal hosts. Finally, we identified an expansive diversity of inovirus-encoded toxin-antitoxin and gene expression modulation systems, alongside evidence of both synergistic (CRISPR evasion) and antagonistic (superinfection exclusion) interactions with co-infecting viruses which we experimentally validated in a Pseudomonas model. Capturing this previously obscured component of the global virosphere sparks new avenues for microbial manipulation approaches and innovative biotechnological applications.
biorxiv microbiology 100-200-users 2019
A method for genome-wide genealogy estimation for thousands of samples, bioRxiv, 2019-02-15
Knowledge of genome-wide genealogies for thousands of individuals would simplify most evolutionary analyses for humans and other species, but has remained computationally infeasible. We developed a method, Relate, scaling to > 10,000 sequences while simultaneously estimating branch lengths, mutational ages, and variable historical population sizes, as well as allowing for data errors. Application to 1000 Genomes Project haplotypes produces joint genealogical histories for 26 human populations. Highly diverged lineages are present in all groups, but most frequent in Africa. Outside Africa, these mainly reflect ancient introgression from groups related to Neanderthals and Denisovans, while African signals instead reflect unknown events, unique to that continent. Our approach allows more powerful inferences of natural selection than previously possible. We identify multiple novel regions under strong positive selection, and multi-allelic traits including hair colour, BMI, and blood pressure, showing strong evidence of directional selection, varying among human groups.
biorxiv genetics 100-200-users 2019