High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell, bioRxiv, 2017-06-15
While many evolutionary questions can be answered by short-read re-sequencing, presence/absence polymorphisms of genes and/or transposons have been largely ignored in large-scale intraspecific evolutionary studies. To enable the rigorous analysis of such variants, multiple high-quality and contiguous genome assemblies are essential. Similarly, while genome assemblies based on short reads have made genomics accessible for non-reference species, these assemblies have limitations due to low contiguity. Long-read sequencers and long-read technologies have ushered in a new era of genome sequencing where the lengths of reads exceed those of most repeats. However, because these technologies are not only costly, but also time and compute intensive, it has been unclear how scalable they are. Here we demonstrate a fast and cost-effective reference assembly for an Arabidopsis thaliana accession using the USB-sized Oxford Nanopore MinION sequencer and typical consumer computing hardware (4 cores, 16 GB RAM). We assemble the accession KBS-Mac-74 into 62 contigs with an N50 length of 12.3 Mb covering 100% (119 Mb) of the non-repetitive genome. We demonstrate that the polished KBS-Mac-74 assembly is highly contiguous with BioNano optical genome maps, and of high per-base quality against a likewise polished Pacific Biosciences long-read assembly. The approach we implemented took a total of four days at a cost of less than 1,000 USD for sequencing consumables including instrument depreciation.
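The N50 quoted above is a standard contiguity statistic: the contig length at which contigs of that length or longer together cover at least half of the assembly. As a minimal sketch of that definition (the function name `n50` and the toy contig lengths are illustrative assumptions, not taken from the preprint or its pipeline):

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L of the contig at which the running total,
    taken from the longest contig downwards, first reaches at least
    half of the summed assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, not the KBS-Mac-74 contigs):
# total = 30, half = 15; 10 + 8 = 18 reaches half, so N50 is 8.
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))
```

A higher N50 for a similar total length means that fewer, longer contigs make up the bulk of the assembly, which is why it is used here as the headline contiguity figure.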
biorxiv genomics 200-500-users 2017
Evolutionary persistence of DNA methylation for millions of years after ancient loss of a de novo methyltransferase, bioRxiv, 2017-06-14
Cytosine methylation is a widespread modification of DNA that plays numerous critical roles, yet has been lost many times in diverse eukaryotic lineages. In the yeast Cryptococcus neoformans, CG methylation occurs in transposon-rich repeats and requires the DNA methyltransferase Dnmt5. We show that Dnmt5 displays exquisite maintenance-type specificity in vitro and in vivo and uses in vivo cofactors similar to those of the metazoan maintenance methylase Dnmt1. Remarkably, phylogenetic and functional analysis revealed that the ancestral species lost the gene for a de novo methylase, DnmtX, between 50 and 150 MYA. We examined how methylation has persisted since the ancient loss of DnmtX. Experimental and comparative studies reveal efficient replication of methylation patterns in C. neoformans, rare stochastic methylation loss and gain events, and the action of natural selection. We propose that an epigenome has been propagated for >50 MY through a process analogous to Darwinian evolution of the genome.
biorxiv molecular-biology 200-500-users 2017
Low rate of somatic mutations in a long-lived oak tree, bioRxiv, 2017-06-14
Because plants do not possess a proper germline, deleterious somatic mutations can be passed to gametes, and the large number of cell divisions separating zygote from gamete formation in long-lived plants may lead to many mutations. We sequenced the genome of two terminal branches of a 234-year-old oak tree and found few fixed somatic single-nucleotide variants (SNVs), whose sequential appearance in the tree could be traced along nested sectors of younger branches. Our data suggest that stem cells of shoot meristems are robustly protected from accumulation of mutations in trees.
biorxiv plant-biology 200-500-users 2017
Ancient genomes from southern Africa pushes modern human divergence beyond 260,000 years ago, bioRxiv, 2017-06-06
Southern Africa is consistently placed as one of the potential regions for the evolution of Homo sapiens. To examine the region's human prehistory prior to the arrival of migrants from East and West Africa or Eurasia in the last 1,700 years, we generated and analyzed genome sequence data from seven ancient individuals from KwaZulu-Natal, South Africa. Three Stone Age hunter-gatherers date to ~2,000 years ago, and we show that they were related to current-day southern San groups such as the Karretjie People. Four Iron Age farmers (300-500 years old) have genetic signatures similar to present-day Bantu-speakers. The genome sequence (13x coverage) of a juvenile boy from Ballito Bay, who lived ~2,000 years ago, demonstrates that southern African Stone Age hunter-gatherers were not impacted by recent admixture; however, we estimate that all modern-day Khoekhoe and San groups have been influenced by 9-22% genetic admixture from East African/Eurasian pastoralist groups arriving >1,000 years ago, including the Ju|'hoansi San, previously thought to have very low levels of admixture. Using traditional and new approaches, we estimate the population divergence time between the Ballito Bay boy and other groups to be beyond 260,000 years ago. These estimates dramatically increase the deepest divergence amongst modern humans, coincide with the onset of the Middle Stone Age in sub-Saharan Africa, and coincide with anatomical developments of archaic humans into modern humans as represented in the local fossil record. Cumulatively, cross-disciplinary records increasingly point to southern Africa as a potential (not necessarily exclusive) 'hot spot' for the evolution of our species.
biorxiv evolutionary-biology 200-500-users 2017
Discovery of the first genome-wide significant risk loci for ADHD, bioRxiv, 2017-06-04
Attention-Deficit/Hyperactivity Disorder (ADHD) is a highly heritable childhood behavioral disorder affecting 5% of school-age children and 2.5% of adults. Common genetic variants contribute substantially to ADHD susceptibility, but no individual variants have been robustly associated with ADHD. We report a genome-wide association meta-analysis of 20,183 ADHD cases and 35,191 controls that identifies variants surpassing genome-wide significance in 12 independent loci, revealing new and important information on the underlying biology of ADHD. Associations are enriched in evolutionarily constrained genomic regions and loss-of-function intolerant genes, as well as around brain-expressed regulatory marks. These findings, based on clinical interviews and/or medical records, are supported by additional analyses of a self-reported ADHD sample and a study of quantitative measures of ADHD symptoms in the population. Meta-analyzing these data with our primary scan yielded a total of 16 genome-wide significant loci. The results support the hypothesis that clinical diagnosis of ADHD is an extreme expression of one or more continuous heritable traits.
biorxiv genetics 200-500-users 2017
The reproducibility of research and the misinterpretation of P values, bioRxiv, 2017-06-01
We wish to answer this question: if you observe a “significant” P value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by P values between 0.01 and 0.05 is explored by exact calculations of false positive risks. When you observe P = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3:1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the P value. And if you want to limit the false positive risk to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done. If you observe P = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100:1 odds on there being a real effect. That would usually be regarded as conclusive. But the false positive risk would still be 8% if the prior probability of a real effect were only 0.1. And, in this case, if you wanted to achieve a false positive risk of 5% you would need to observe P = 0.00045. It is recommended that the terms “significant” and “non-significant” should never be used. Rather, P values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive risk. It may also be helpful to specify the minimum false positive risk associated with the observed P value. Despite decades of warnings, many areas of science still insist on labelling a result of P < 0.05 as “statistically significant”. This practice must contribute to the lack of reproducibility in some areas of science. This is before you get to the many other well-known problems, like multiple comparisons, lack of randomisation and P-hacking. Precise inductive inference is impossible and replication is the only way to be sure. Science is endangered by statistical misunderstanding, and by senior people who impose perverse incentives on scientists.
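The relationship between likelihood ratio, prior probability and false positive risk described above follows from Bayes' rule in odds form: posterior odds of a real effect equal the likelihood ratio times the prior odds, and the false positive risk is the posterior probability of no real effect. The following is a minimal sketch of that arithmetic only, not the exact calculations used in the preprint; the function names are hypothetical, and the likelihood ratios are the rounded values quoted in the abstract.

```python
def false_positive_risk(likelihood_ratio, prior):
    """Posterior probability of no real effect, via Bayes' rule in odds form.

    likelihood_ratio: odds in favour of a real effect given the data.
    prior: prior probability (strictly between 0 and 1) of a real effect.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return 1.0 / (1.0 + posterior_odds)


def prior_for_target_risk(likelihood_ratio, target_risk):
    """Prior probability of a real effect needed to reach a target risk."""
    required_posterior_odds = (1.0 - target_risk) / target_risk
    prior_odds = required_posterior_odds / likelihood_ratio
    return prior_odds / (1.0 + prior_odds)


# Likelihood ratio of about 100 with a prior of 0.1 gives a risk of
# roughly 0.083, in line with the 8% quoted in the abstract.
print(round(false_positive_risk(100, 0.1), 3))

# With the rounded likelihood ratio of 3, a 5% risk needs a prior of
# roughly 0.86; the abstract's 87% presumably uses the unrounded ratio.
print(round(prior_for_target_risk(3, 0.05), 2))
```

Because the abstract gives the likelihood ratios only approximately ("about 3:1", "almost 100:1"), the outputs match its figures only to within rounding.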
biorxiv scientific-communication-and-education 200-500-users 2017