A reference panel of 64,976 haplotypes for genotype imputation, bioRxiv, 2015-12-24
We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
biorxiv genetics 100-200-users 2015
No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini, bioRxiv, 2015-12-02
Tardigrades are meiofaunal ecdysozoans that are key to understanding the origins of Arthropoda. Many species of Tardigrada can survive extreme conditions through cryptobiosis. In a recent paper (Boothby TC et al (2015) Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proc Natl Acad Sci USA 112:15976-15981) the authors concluded that the tardigrade Hypsibius dujardini had an unprecedented proportion (17%) of genes originating through functional horizontal gene transfer (fHGT), and speculated that fHGT was likely formative in the evolution of cryptobiosis. We independently sequenced the genome of H. dujardini. As expected from whole-organism DNA sampling, our raw data contained reads from non-target genomes. Filtering using metagenomics approaches generated a draft H. dujardini genome assembly of 135 Mb with superior assembly metrics to the previously published assembly. Additional microbial contamination likely remains. We found no support for extensive fHGT. Among 23,021 gene predictions we identified 0.2% strong candidates for fHGT from bacteria, and 0.2% strong candidates for fHGT from non-metazoan eukaryotes. Cross-comparison of assemblies showed that the overwhelming majority of HGT candidates in the Boothby et al. genome derived from contaminants. We conclude that fHGT into H. dujardini accounts for at most 1-2% of genes and that the proposal that one sixth of tardigrade genes originate from functional HGT events is an artefact of undetected contamination.
biorxiv genomics 200-500-users 2015
Early farmers from across Europe directly descended from Neolithic Aegeans, bioRxiv, 2015-11-26
Farming and sedentism first appear in southwest Asia during the early Holocene and later spread to neighboring regions, including Europe, along multiple dispersal routes. Conspicuous uncertainties remain about the relative roles of migration, cultural diffusion and admixture with local foragers in the early Neolithisation of Europe. Here we present paleogenomic data for five Neolithic individuals from northwestern Turkey and northern Greece, spanning the time and region of the earliest spread of farming into Europe. We observe striking genetic similarity both among Aegean early farmers and with those from across Europe. Our study demonstrates a direct genetic link between Mediterranean and Central European early farmers and those of Greece and Anatolia, extending the European Neolithic migratory chain all the way back to southwestern Asia.
biorxiv genomics 0-100-users 2015
FecalSeq: methylation-based enrichment for noninvasive population genomics from feces, bioRxiv, 2015-11-26
Obtaining high-quality samples from wild animals is a major obstacle for genomic studies of many taxa, particularly at the population level, as collection methods for such samples are typically invasive. DNA from feces is easy to obtain noninvasively, but is dominated by bacterial and other non-host DNA. Because next-generation sequencing technology sequences DNA largely indiscriminately, the high proportion of exogenous DNA drastically reduces the efficiency of high-throughput sequencing for host animal genomics. To address this issue, we developed an inexpensive methylation-based capture method for enriching host DNA from noninvasively obtained fecal DNA samples. Our method exploits natural differences in CpG-methylation density between vertebrate and bacterial genomes to preferentially bind and isolate host DNA from majority-bacterial fecal DNA samples. We demonstrate that the enrichment is robust, efficient, and compatible with downstream library preparation methods useful for population studies (e.g., RADseq). Compared to other enrichment strategies, our method is quick and inexpensive, adding only a negligible cost to sample preparation for research that is often severely constrained by budgetary limitations. In combination with downstream methods such as RADseq, our approach allows for cost-effective and customizable genomic-scale genotyping that was previously feasible in practice only with invasive samples. Because feces are widely available and convenient to collect, our method empowers researchers to explore genomic-scale population-level questions in organisms for which invasive sampling is challenging or undesirable.
biorxiv genomics 0-100-users 2015
Health and population effects of rare gene knockouts in adult humans with related parents, bioRxiv, 2015-11-15
Complete gene knockouts are highly informative about gene function. We exome sequenced 3,222 British Pakistani-heritage adults with high parental relatedness, discovering 1,111 rare-variant homozygous likely loss of function (rhLOF) genotypes predicted to disrupt (knockout) 781 genes. Based on depletion of rhLOF genotypes, we estimate that 13.6% of knockouts are incompatible with adult life, finding on average 1.6 heterozygous recessive lethal LOF variants per adult. Linking to lifelong health records, we observed no association of rhLOF genotypes with prescription- or doctor-consultation rate, and no disease-related phenotypes in 33 of 42 individuals with rhLOF genotypes in recessive Mendelian disease genes. Phased genome sequencing of a healthy PRDM9 knockout mother, her child and controls, showed meiotic recombination sites localised away from PRDM9-dependent hotspots, demonstrating PRDM9 redundancy in humans.
biorxiv genomics 0-100-users 2015
What's in my pot? Real-time species identification on the MinION, bioRxiv, 2015-11-07
Whole genome sequencing on next-generation instruments provides an unbiased way to identify the organisms present in complex metagenomic samples. However, the time-to-result can be protracted because of fixed-time sequencing runs and cumbersome bioinformatics workflows. This limits the utility of the approach in settings where rapid species identification is crucial, such as in the quality control of food-chain components, or during an outbreak of an infectious disease. Here we present What's in my Pot? (WIMP), a laboratory and analysis workflow in which, starting with an unprocessed sample, sequence data is generated and bacteria, viruses and fungi present in the sample are classified to subspecies and strain level in a quantitative manner, without prior knowledge of the sample composition, in approximately 3.5 hours. This workflow relies on the combination of Oxford Nanopore Technologies' MinION™ sensing device with a real-time species identification bioinformatics application.
biorxiv genomics 100-200-users 2015