Reagent contamination can critically impact sequence-based microbiome analyses, bioRxiv, 2014-07-17
AbstractThe study of microbial communities has been revolutionised in recent years by the widespread adoption of culture independent analytical techniques such as 16S rRNA gene sequencing and metagenomics. One potential confounder of these sequence-based approaches is the presence of contamination in DNA extraction kits and other laboratory reagents. In this study we demonstrate that contaminating DNA is ubiquitous in commonly used DNA extraction kits, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass. Contamination impacts both PCR based 16S rRNA gene surveys and shotgun metagenomics. These results suggest that caution should be advised when applying sequence-based techniques to the study of microbiota present in low biomass environments. We provide an extensive list of potential contaminating genera, and guidelines on how to mitigate the effects of contamination. Concurrent sequencing of negative control samples is strongly advised.
biorxiv molecular-biology 100-200-users 2014Redefining Genomic Privacy Trust and Empowerment, bioRxiv, 2014-06-26
Fulfilling the promise of the genetic revolution requires the analysis of large datasets containing information from thousands to millions of participants. However, sharing human genomic data requires protecting subjects from potential harm. Current models rely on de-identification techniques that treat privacy versus data utility as a zero-sum game. Instead we propose using trust-enabling techniques to create a solution where researchers and participants both win. To do so we introduce three principles that facilitate trust in genetic research and outline one possible framework built upon those principles. Our hope is that such trust-centric frameworks provide a sustainable solution that reconciles genetic privacy with data sharing and facilitates genetic research.
biorxiv genomics 0-100-users 2014Regulatory variants explain much more heritability than coding variants across 11 common diseases, bioRxiv, 2014-04-22
Common variants implicated by genome-wide association studies (GWAS) of complex diseases are known to be enriched for coding and regulatory variants. We applied methods to partition the heritability explained by genotyped SNPs (h2g) across functional categories (while accounting for shared variance due to linkage disequilibrium) to genotype and imputed data for 11 common diseases. DNaseI Hypersensitivity Sites (DHS) from 218 cell-types, spanning 16% of the genome, explained an average of 79% of h2g (5.1× enrichment; P < 10−20); further enrichment was observed at enhancer and cell-type specific DHS elements. The enrichments were much smaller in analyses that did not use imputed data or were restricted to GWAS- associated SNPs. In contrast, coding variants, spanning 1% of the genome, explained only 8% of h2g (13.8× enrichment; P = 5 × 10−4). We replicated these findings but found no significant contribution from rare coding variants in an independent schizophrenia cohort genotyped on GWAS and exome chips.
biorxiv genetics 0-100-users 2014Flexible analysis of transcriptome assemblies with Ballgown, bioRxiv, 2014-03-31
We have built a statistical package called Ballgown for estimating differential expression of genes, transcripts, or exons from RNA sequencing experiments. Ballgown is designed to work with the popular Cufflinks transcript assembly software and uses well-motivated statistical methods to provide estimates of changes in expression. It permits statistical analysis at the transcript level for a wide variety of experimental designs, allows adjustment for confounders, and handles studies with continuous covariates. Ballgown provides improved statistical significance estimates as compared to the Cuffdiff differential expression tool included with Cufflinks. We demonstrate the flexibility of the Ballgown package by re-analyzing 667 samples from the GEUVADIS study to identify transcript-level eQTLs and identify non-linear artifacts in transcript data. Our package is freely available from httpsgithub.comalyssafrazeeballgown
biorxiv bioinformatics 0-100-users 2014Towards a new history and geography of human genes informed by ancient DNA, bioRxiv, 2014-03-22
Genetic information contains a record of the history of our species, and technological advances have transformed our ability to access this record. Many studies have used genome-wide data from populations today to learn about the peopling of the globe and subsequent adaptation to local conditions. Implicit in this research is the assumption that the geographic locations of people today are informative about the geographic locations of their ancestors in the distant past. However, it is now clear that long-range migration, admixture and population replacement have been the rule rather than the exception in human history. In light of this, we argue that it is time to critically re-evaluate current views of the peopling of the globe and the importance of natural selection in determining the geographic distribution of phenotypes. We specifically highlight the transformative potential of ancient DNA. By accessing the genetic make-up of populations living at archaeologically-known times and places, ancient DNA makes it possible to directly track migrations and responses to natural selection.
biorxiv evolutionary-biology 0-100-users 2014Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, bioRxiv, 2014-02-20
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an RBioconductor package.
biorxiv bioinformatics 0-100-users 2014