Reconstructing the Gigabase Plant Genome of Solanum pennellii using Nanopore Sequencing, bioRxiv, 2017-04-22

Recent updates in sequencing technology have made it possible to obtain gigabases of sequence data from a single flow cell. Prior to this update, nanopore sequencing technology was mainly used to analyze and assemble microbial samples [1-3]. Here, we describe the generation of a comprehensive nanopore sequencing dataset with a median fragment size of 11,979 bp for the wild tomato species Solanum pennellii, which has an estimated genome size of ca. 1.0 to 1.1 Gb. We describe its genome assembly to a contig N50 of 2.5 Mb using a pipeline comprising Canu [4] pre-processing followed by assembly with SMARTdenovo. We show that the resulting nanopore-based de novo genome reconstruction is structurally highly similar to the reference S. pennellii LA716 genome [5] but has a high error rate, caused mostly by deletions in homopolymers. After polishing the assembly with Illumina short-read data, we obtained an error rate of <0.02% when assessed against the same Illumina data. More importantly, however, we obtained a gene completeness of 96.53%, which even slightly surpasses that of the reference S. pennellii genome [5]. Taken together, our data indicate that such long-read sequencing data can be used to affordably sequence and assemble gigabase-sized diploid plant genomes. Raw data are available at http://www.plabipd.de/portal/solanum-pennellii and have been deposited as PRJEB19787.
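The contig N50 quoted above is the length L such that contigs of length L or longer together contain at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` and the toy contig lengths are hypothetical illustrations, not the authors' tooling or data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Lengths are summed from longest to shortest; N50 is the length of the
    contig at which the running sum first reaches at least half of the total.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    # Not reachable for non-empty input: the running sum always hits the total.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Hypothetical toy assembly (lengths in bp), not data from the paper.
    toy_contigs = [2_500_000, 1_800_000, 900_000, 400_000, 150_000, 50_000]
    print(n50(toy_contigs))  # 1800000
```

The same function applies unchanged to scaffold lengths, which is how scaffold N50 values are usually reported.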

biorxiv genomics 0-100-users 2017

Improved de novo Genome Assembly: Linked-Read Sequencing Combined with Optical Mapping Produce a High Quality Mammalian Genome at Relatively Low Cost, bioRxiv, 2017-04-19

Current short-read methods have come to dominate genome sequencing because they are cost-effective, rapid, and accurate. However, short reads are most applicable when data can be aligned to a known reference. Two newer methods for de novo assembly are linked reads and restriction-site-labeled optical maps. We combined commercial implementations of these technologies to assemble the genome of an endangered mammal, the Hawaiian monk seal. We show that linked reads produced with 10X Genomics Chromium chemistry and assembled with Supernova v1.1 software yielded scaffolds with an N50 of 22.23 Mbp, the longest individual scaffold being 84.06 Mbp. When these were combined with Bionano Genomics optical maps using Bionano RefAligner, the scaffold N50 increased to 29.65 Mbp across a total of 170 hybrid scaffolds, the longest of which was 84.78 Mbp. These results represent 161-fold and 215-fold improvements, respectively, over DISCOVAR de novo assemblies. Scaffold quality was assessed using conserved synteny analysis of both the DNA sequence and the predicted seal proteins relative to the genomes of humans and other species. We found large blocks of conserved synteny, suggesting that the hybrid scaffolds were of high quality. An inversion in one scaffold corresponding to human chromosome 6 was found and confirmed by optical maps. The complementarity of linked reads and optical maps is likely to make the production of high-quality genomes more routine and economical and, by doing so, significantly improve our understanding of comparative genome biology.
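In a conserved-synteny comparison, an inversion such as the one described above shows up as a run of anchors whose positions increase along the seal scaffold but decrease along the human chromosome. The Python sketch below flags such runs. It assumes anchors have already been produced by some aligner and are given as (scaffold position, human position) pairs for a single scaffold/chromosome pair; the function name `find_inverted_runs`, the `min_anchors` parameter, and the toy coordinates are hypothetical and do not describe the authors' actual analysis.

```python
from typing import List, Tuple

# Hypothetical anchor: (position on seal scaffold, position on human chromosome).
Anchor = Tuple[int, int]


def find_inverted_runs(anchors: List[Anchor], min_anchors: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) scaffold coordinates of runs where human positions decrease.

    Anchors are sorted by scaffold position. A run is a maximal stretch of
    consecutive anchors whose human coordinate strictly decreases from one
    anchor to the next; runs shorter than `min_anchors` are ignored as noise.
    """
    ordered = sorted(anchors)
    runs: List[Tuple[int, int]] = []
    run_start = 0  # index of the first anchor in the current decreasing run
    for i in range(1, len(ordered) + 1):
        # A run ends at the last anchor or where the human coordinate stops decreasing.
        still_decreasing = i < len(ordered) and ordered[i][1] < ordered[i - 1][1]
        if not still_decreasing:
            if i - run_start >= min_anchors:
                runs.append((ordered[run_start][0], ordered[i - 1][0]))
            run_start = i
    return runs


if __name__ == "__main__":
    # Toy anchors: collinear, then a reversed block (400-700), then collinear again.
    toy = [(100, 1000), (200, 2000), (300, 3000), (400, 9000), (500, 8000),
           (600, 7000), (700, 6000), (800, 10000), (900, 11000)]
    print(find_inverted_runs(toy))  # [(400, 700)]
```

This simplification ignores strand, gaps between anchors, and translocations; real synteny tools model these explicitly, and the abstract notes that the candidate inversion was confirmed independently with optical maps.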

biorxiv genomics 0-100-users 2017

An Integrative Framework for Detecting Structural Variations in Cancer Genomes, bioRxiv, 2017-03-29

Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms, yet, despite their importance, identifying SVs in cancer genomes remains challenging. Here, we present an integrative framework for comprehensively identifying structural variation in cancer genomes. For the first time, we apply next-generation optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of cancer cells. Using this approach, we identify and characterize SVs in up to 29 commonly used normal and cancer cell lines. We find that each method has unique strengths in identifying different classes of SVs and at different scales, suggesting that integrative approaches are likely the only way to comprehensively identify SVs in the genome. Studying the impact of SVs in cancer cell lines, we identify widespread structural variation events affecting the functions of non-coding sequences in the genome, including the deletion of distal regulatory sequences, alteration of DNA replication timing, and the creation of novel 3D chromatin structural domains. These results underscore the importance of comprehensive SV identification and indicate that non-coding structural variation may be an underappreciated mutational process in cancer genomes.
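An integrative framework like the one above has to decide when calls from different methods describe the same event. One common convention, assumed here purely for illustration and not stated as the framework's actual rule, treats two calls of the same type on the same chromosome as matching when they overlap by at least 50% of each call's length (reciprocal overlap). The `SVCall` class, function names, method labels, and example coordinates below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class SVCall:
    method: str   # hypothetical labels, e.g. "optical_map", "hi_c", "wgs"
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    sv_type: str  # e.g. "DEL", "INV"


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Return the smaller of the two overlap fractions (0.0 if no overlap)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def supporting_methods(calls: List[SVCall], min_overlap: float = 0.5) -> Dict[SVCall, Set[str]]:
    """Map each call to the set of methods with a matching call of the same type."""
    support: Dict[SVCall, Set[str]] = {}
    for call in calls:
        # A call always matches itself, so its own method is included.
        support[call] = {
            other.method
            for other in calls
            if other.sv_type == call.sv_type
            and reciprocal_overlap(call, other) >= min_overlap
        }
    return support


if __name__ == "__main__":
    calls = [
        SVCall("wgs", "chr8", 1_000_000, 1_200_000, "DEL"),
        SVCall("optical_map", "chr8", 1_020_000, 1_210_000, "DEL"),
        SVCall("hi_c", "chr8", 5_000_000, 9_000_000, "INV"),
    ]
    for call, methods in supporting_methods(calls).items():
        print(call.method, call.sv_type, sorted(methods))
```

Here the two deletion calls overlap by 180 kb, i.e. 90% of the shorter-fraction side, so each is supported by both methods, while the inversion is supported only by its own method. A practical pipeline would likely need method-specific tolerances, since Hi-C breakpoints are typically resolved far more coarsely than read-level calls.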

biorxiv genomics 0-100-users 2017

 

Created with the audiences framework by Jedidiah Carlson
