High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell, bioRxiv, 2017-06-15

While many evolutionary questions can be answered by short-read re-sequencing, presence/absence polymorphisms of genes and/or transposons have been largely ignored in large-scale intraspecific evolutionary studies. To enable the rigorous analysis of such variants, multiple high-quality and contiguous genome assemblies are essential. Similarly, while genome assemblies based on short reads have made genomics accessible for non-reference species, these assemblies have limitations due to low contiguity. Long-read sequencers and long-read technologies have ushered in a new era of genome sequencing where the lengths of reads exceed those of most repeats. However, because these technologies are not only costly, but also time and compute intensive, it has been unclear how scalable they are. Here we demonstrate a fast and cost-effective reference assembly for an Arabidopsis thaliana accession using the USB-sized Oxford Nanopore MinION sequencer and typical consumer computing hardware (4 cores, 16 GB RAM). We assemble the accession KBS-Mac-74 into 62 contigs with an N50 length of 12.3 Mb covering 100% (119 Mb) of the non-repetitive genome. We demonstrate that the polished KBS-Mac-74 assembly is highly contiguous with BioNano optical genome maps, and of high per-base quality against a likewise polished Pacific Biosciences long-read assembly. The approach we implemented took a total of four days at a cost of less than 1,000 USD for sequencing consumables including instrument depreciation.
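The N50 figure quoted in the abstract above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together span at least half of the total assembly length. The sketch below shows one common way to compute it. The function name `n50` and the toy contig lengths are illustrative assumptions, not values or code from the preprint.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from the preprint.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

For the toy input the total is 150, and the two longest contigs (50 + 40 = 90) are the first to cover half of it, so the N50 is 40.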

biorxiv genomics 200-500-users 2017

Ancient genomes from southern Africa pushes modern human divergence beyond 260,000 years ago, bioRxiv, 2017-06-06

Southern Africa is consistently placed as one of the potential regions for the evolution of Homo sapiens. To examine the region's human prehistory prior to the arrival of migrants from East and West Africa or Eurasia in the last 1,700 years, we generated and analyzed genome sequence data from seven ancient individuals from KwaZulu-Natal, South Africa. Three Stone Age hunter-gatherers date to ~2,000 years ago, and we show that they were related to current-day southern San groups such as the Karretjie People. Four Iron Age farmers (300-500 years old) have genetic signatures similar to present-day Bantu-speakers. The genome sequence (13x coverage) of a juvenile boy from Ballito Bay, who lived ~2,000 years ago, demonstrates that southern African Stone Age hunter-gatherers were not impacted by recent admixture; however, we estimate that all modern-day Khoekhoe and San groups have been influenced by 9-22% genetic admixture from East African/Eurasian pastoralist groups arriving >1,000 years ago, including the Ju|'hoansi San, previously thought to have very low levels of admixture. Using traditional and new approaches, we estimate the population divergence time between the Ballito Bay boy and other groups to beyond 260,000 years ago. These estimates dramatically increase the deepest divergence amongst modern humans, coincide with the onset of the Middle Stone Age in sub-Saharan Africa, and coincide with anatomical developments of archaic humans into modern humans as represented in the local fossil record. Cumulatively, cross-disciplinary records increasingly point to southern Africa as a potential (not necessarily exclusive) 'hot spot' for the evolution of our species.

biorxiv evolutionary-biology 200-500-users 2017

The reproducibility of research and the misinterpretation of P values, bioRxiv, 2017-06-01

We wish to answer this question: if you observe a “significant” P value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by P values between 0.01 and 0.05 is explored by exact calculations of false positive risks. When you observe P = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3:1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the P value. And if you want to limit the false positive risk to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done. If you observe P = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100:1 odds on there being a real effect. That would usually be regarded as conclusive, but the false positive risk would still be 8% if the prior probability of a real effect were only 0.1. And, in this case, if you wanted to achieve a false positive risk of 5% you would need to observe P = 0.00045. It is recommended that the terms “significant” and “non-significant” should never be used. Rather, P values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive risk. It may also be helpful to specify the minimum false positive risk associated with the observed P value. Despite decades of warnings, many areas of science still insist on labelling a result of P < 0.05 as “statistically significant”. This practice must contribute to the lack of reproducibility in some areas of science. This is before you get to the many other well-known problems, like multiple comparisons, lack of randomisation and P-hacking. Precise inductive inference is impossible and replication is the only way to be sure. Science is endangered by statistical misunderstanding, and by senior people who impose perverse incentives on scientists.
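The figures in the abstract above can be roughly reproduced with Bayes' rule in odds form: posterior odds of a real effect equal the likelihood ratio times the prior odds, and the false positive risk is the posterior probability that there is no real effect. The sketch below is only an approximation under that assumption. It uses the rounded likelihood ratios quoted in the abstract (about 3 and almost 100), not the preprint's exact calculations, and the function name `false_positive_risk` is a hypothetical helper.

```python
def false_positive_risk(likelihood_ratio, prior):
    """Posterior probability of no real effect, via Bayes' rule in odds form.

    likelihood_ratio: odds in favour of a real effect given by the data.
    prior: probability, before the experiment, that a real effect exists.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    # Odds of o:1 for a real effect leave probability 1 / (1 + o) against it.
    return 1 / (1 + posterior_odds)


# Rounded inputs taken from the abstract; results are approximate.
print(round(false_positive_risk(100, 0.1), 3))   # P = 0.001 case, prior 0.1
print(round(false_positive_risk(3, 0.87), 3))    # P = 0.05 case, prior 0.87
```

With a likelihood ratio of 100 and a prior of 0.1 this gives a risk of about 0.08, in line with the 8% quoted; with a likelihood ratio of 3 and a prior of 0.87 it gives a little under 0.05, consistent with the statement that a prior of 87% is needed to hold the risk to 5%.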

biorxiv scientific-communication-and-education 200-500-users 2017

 
