Ten Simple Rules for Taking Advantage of git and GitHub, bioRxiv, 2016-04-16
Abstract: A ‘Ten Simple Rules’ guide to git and GitHub. We describe, with examples, how to use these tools to track projects as users, teams and organizations. We document collaborative development using branching and forking, interaction between collaborators using issues, and continuous integration and automation using, for example, Travis CI and Codecov. We also describe dissemination and social aspects of GitHub, such as GitHub Pages and following and watching repositories, and give advice on how to make code citable.
biorxiv bioinformatics 100-200-users 2016
Third-generation sequencing and the future of genomics, bioRxiv, 2016-04-14
Abstract: Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base-pairs long, third-generation single-molecule technologies generate over 10,000 bp reads or map over 100,000 bp molecules. We analyze how increased read lengths can be used to address longstanding problems in de novo genome assembly, structural variation analysis and haplotype phasing.
biorxiv bioinformatics 100-200-users 2016
LeafCutter: annotation-free quantification of RNA splicing, bioRxiv, 2016-03-17
Abstract: The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable intron splicing events from short-read RNA-seq data and finds alternative splicing events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both for detecting differential splicing between sample groups, and for mapping splicing quantitative trait loci (sQTLs). Compared to contemporary methods, we find 1.4–2.1 times more sQTLs, many of which help us ascribe molecular effects to disease-associated variants. Strikingly, transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at 5% FDR by an average of 2.1-fold as compared to using gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available at https://github.com/davidaknowles/leafcutter.
biorxiv genomics 100-200-users 2016
Relic DNA is abundant in soil and obscures estimates of soil microbial diversity, bioRxiv, 2016-03-17
Abstract: It is implicitly assumed that the microbial DNA recovered from soil originates from living cells. However, because relic DNA (DNA from dead cells) can persist in soil for weeks to years, it could impact DNA-based analyses of microbial diversity. We examined a wide range of soils and found that, on average, 40% of prokaryotic and fungal DNA was derived from the relic DNA pool. Relic DNA inflated the observed prokaryotic and fungal diversity by as much as 55%, and caused misestimation of taxon abundances, including taxa integral to key ecosystem processes. These findings imply that relic DNA can obscure treatment effects, spatiotemporal patterns, and relationships between taxa and environmental conditions. Moreover, relic DNA may represent a historical record of microbes formerly living in soil.
One Sentence Summary: Soils can harbor substantial amounts of DNA from dead microbial cells; this ‘relic’ DNA inflates estimates of microbial diversity and obscures assessments of community structure.
biorxiv ecology 100-200-users 2016
A reference panel of 64,976 haplotypes for genotype imputation, bioRxiv, 2015-12-24
We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
biorxiv genetics 100-200-users 2015
What's in my pot? Real-time species identification on the MinION, bioRxiv, 2015-11-07
Whole genome sequencing on next-generation instruments provides an unbiased way to identify the organisms present in complex metagenomic samples. However, the time-to-result can be protracted because of fixed-time sequencing runs and cumbersome bioinformatics workflows. This limits the utility of the approach in settings where rapid species identification is crucial, such as in the quality control of food-chain components, or during an outbreak of an infectious disease. Here we present What's in my Pot? (WIMP), a laboratory and analysis workflow in which, starting with an unprocessed sample, sequence data is generated and bacteria, viruses and fungi present in the sample are classified to subspecies and strain level in a quantitative manner, without prior knowledge of the sample composition, in approximately 3.5 hours. This workflow relies on the combination of Oxford Nanopore Technologies' MinION™ sensing device with a real-time species identification bioinformatics application.
biorxiv genomics 100-200-users 2015