Reports | audiences

Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements, bioRxiv, 2014-01-22

High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or presence in complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly parallel library preparation and local assembly of short read data and achieve lengths of 1.5-18.5 Kbp with an extremely low error rate (∼0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp), achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long reads, offer a powerful approach to improve de novo assemblies of whole genomes.

biorxiv genomics 100-200-users 2014
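
The TruSeq entry above reports an N50 contig size of 69.7 Kbp. As a reminder of what that metric measures, here is a minimal sketch of the usual N50 definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions and are not taken from the preprint or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper).

    Contigs are sorted from longest to shortest; the N50 is the length of the
    contig at which the running sum first reaches half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

A single very long contig can dominate the value, which is why N50 is usually reported alongside coverage of the reference, as the abstract does.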

Ancient human genomes suggest three ancestral populations for present-day Europeans, bioRxiv, 2013-12-24

We sequenced genomes from a ~7,000-year-old early farmer from Stuttgart in Germany, an ~8,000-year-old hunter-gatherer from Luxembourg, and seven ~8,000-year-old hunter-gatherers from southern Sweden. We analyzed these data together with other ancient genomes and 2,345 contemporary humans to show that the great majority of present-day Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE), who were most closely related to Upper Paleolithic Siberians and contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations' deep relationships and show that EEF had ~44% ancestry from a Basal Eurasian lineage that split prior to the diversification of all other non-African lineages.

biorxiv genetics 100-200-users 2013

Analysis of the study of the cerebellar pinceau by Korn and Axelrad, bioRxiv, 2013-12-04

The axon initial segment of each cerebellar Purkinje cell is ensheathed by basket cell axons in a structure called the pinceau, which is largely devoid of chemical synapses and gap junctions. These facts and ultrastructural similarities with the axon cap of the teleost Mauthner cell led to the conjecture that the pinceau mediates ephaptic (via the extracellular field) inhibition. Korn and Axelrad published a study in 1980 in which they reported confirmation of this conjecture. We have analysed their results and show that most are likely to be explained by an artefactual signal arising from the massive stimulation of parallel fibres they employed. We reproduce their experiments and confirm that all of their results are consistent with this artefact. Their data therefore provide no evidence regarding the operation of the pinceau.

biorxiv neuroscience 0-100-users 2013

Human genetics and clinical aspects of neurodevelopmental disorders, bioRxiv, 2013-11-30

There are ~12 billion nucleotides in every cell of the human body, and there are ~25-100 trillion cells in each human body. Given somatic mosaicism, epigenetic changes and environmental differences, no two human beings are the same, particularly as there are only ~7 billion people on the planet. One of the next great challenges for studying human genetics will be to acknowledge and embrace complexity. Every human is unique, and the study of human disease phenotypes (and phenotypes in general) will be greatly enriched by moving from a deterministic to a more stochastic/probabilistic model. The dichotomous distinction between simple and complex diseases is completely artificial, and we argue instead for a model that considers a spectrum of diseases that are variably manifesting in each person. The rapid adoption of whole genome sequencing (WGS) and the Internet-mediated networking of people promise to yield more insight into this century-old debate. Comprehensive ancestry tracking and detailed family history data, when combined with WGS or at least cascade-carrier screening, might eventually facilitate a degree of genetic prediction for some diseases in the context of their familial and ancestral etiologies. However, it is important to remain humble, as our current state of knowledge is not yet sufficient, and in principle, any number of nucleotides in the genome, if mutated or modified in a certain way and at a certain time and place, might influence some phenotype during embryogenesis or postnatal life.

biorxiv genetics 0-100-users 2013