Ten Simple Rules for Taking Advantage of git and GitHub, bioRxiv, 2016-04-16
Abstract: A ‘Ten Simple Rules’ guide to git and GitHub. We describe, with examples, how to use these tools to track projects as users, teams and organizations. We document collaborative development using branching and forking, interaction between collaborators using issues, and continuous integration and automation using, for example, Travis CI and Codecov. We also describe dissemination and social aspects of GitHub, such as GitHub Pages and following and watching repositories, and give advice on how to make code citable.
biorxiv bioinformatics 100-200-users 2016
Third-generation sequencing and the future of genomics, bioRxiv, 2016-04-14
Abstract: Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base pairs long, third-generation single-molecule technologies generate reads longer than 10,000 bp or map molecules longer than 100,000 bp. We analyze how increased read lengths can be used to address longstanding problems in de novo genome assembly, structural variation analysis and haplotype phasing.
biorxiv bioinformatics 100-200-users 2016
Detecting DNA Methylation using the Oxford Nanopore Technologies MinION sequencer, bioRxiv, 2016-04-05
Abstract: Nanopore sequencing instruments measure the change in electric current caused by DNA transiting through the pore. In experimental and prototype nanopore sequencing devices, it has been shown that the electrolytic current signals are sensitive to base modifications such as 5-methylcytosine. Here we quantify the strength of this effect for the Oxford Nanopore Technologies MinION sequencer. Using synthetically methylated DNA, we are able to train a hidden Markov model to distinguish 5-methylcytosine from unmethylated cytosine in DNA. By sequencing natural human DNA without any special library preparation, we demonstrate that global patterns of methylation can be detected from low-coverage sequencing and that the methylation status of CpG islands can be reliably predicted from single MinION reads. Our trained model and prediction software are open source and freely available to the community under the MIT license.
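The core comparison behind a methylation call of this kind can be sketched as a log-likelihood ratio between two current models: one for k-mers with unmethylated cytosine and one for k-mers with 5-methylcytosine. The following is a minimal Python sketch under stated assumptions and is not the authors' released software. It assumes the current events are already aligned to reference k-mers, which is the step an HMM handles in the full method, and it models each k-mer's current as a single Gaussian. The tables `unmeth_model` and `meth_model`, the k-mer label and the pA values are all hypothetical toy values.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def methylation_llr(events, unmeth_model, meth_model):
    """Return log P(events | 5mC) - log P(events | C), summed over events.

    events: list of (kmer, current_pA) pairs, ASSUMED already aligned to the
            reference (the full method infers this alignment with an HMM).
    unmeth_model / meth_model: hypothetical tables mapping
            k-mer -> (mean_pA, sd_pA) for the two hypotheses.
    """
    llr = 0.0
    for kmer, current in events:
        m_mean, m_sd = meth_model[kmer]
        u_mean, u_sd = unmeth_model[kmer]
        # Each event contributes the log-ratio of its two Gaussian likelihoods.
        llr += gaussian_logpdf(current, m_mean, m_sd) - gaussian_logpdf(current, u_mean, u_sd)
    return llr


if __name__ == "__main__":
    # Toy, made-up pore-model values for a single CpG-containing k-mer.
    unmeth = {"ACGTA": (80.0, 2.0)}
    meth = {"ACGTA": (84.0, 2.0)}
    events = [("ACGTA", 83.5), ("ACGTA", 84.2)]
    llr = methylation_llr(events, unmeth, meth)
    print("methylated" if llr > 0 else "unmethylated", round(llr, 2))
```

A positive total favours 5-methylcytosine and a negative total favours unmethylated cytosine. Summing over every event that covers a region is how a single read can accumulate evidence for a call.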
biorxiv genomics 200-500-users 2016
Div-Seq: A single nucleus RNA-Seq method reveals dynamics of rare adult newborn neurons in the CNS, bioRxiv, 2016-03-28
Abstract: Transcriptomes of individual neurons provide rich information about cell types and dynamic states. However, it is difficult to capture rare dynamic processes, such as adult neurogenesis, because isolation from dense adult tissue is challenging, and markers for each phase are limited. Here, we developed Div-Seq, which combines Nuc-Seq, a scalable single nucleus RNA-Seq method, with EdU-mediated labeling of proliferating cells. We first show that Nuc-Seq can sensitively identify closely related cell types within the adult hippocampus. We apply Div-Seq to track transcriptional dynamics of newborn neurons in an adult neurogenic region in the hippocampus. Finally, we find rare adult newborn GABAergic neurons in the spinal cord, a non-canonical neurogenic region. Taken together, Nuc-Seq and Div-Seq open the way for unbiased analysis of any complex tissue.
biorxiv neuroscience 0-100-users 2016
Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics, bioRxiv, 2016-03-24
Abstract: Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations were tissue-specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.
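The summary-statistic expression can be illustrated with the approximation usually quoted for S-PrediXcan. In that approximation, the gene-level z-score is a weighted sum of SNP z-scores, Z_g ≈ sum_l w_l * (sigma_l / sigma_g) * Z_l. Here sigma_g^2 = w^T Gamma w is the variance of predicted expression, computed from a reference genotype covariance Gamma. The Python below is a minimal sketch of that formula using NumPy. The function name and argument layout are hypothetical, and the reference-panel quantities (`snp_sd`, `ref_cov`) are assumed to be supplied by the caller with matching SNP order and allele coding.

```python
import numpy as np


def s_predixcan_zscore(weights, snp_z, snp_sd, ref_cov):
    """Approximate a gene-level association z-score from GWAS summary statistics.

    weights : prediction-model weights w_l for the SNPs in the gene's model
    snp_z   : GWAS z-scores (beta_hat_l / se_l) for the same SNPs and alleles
    snp_sd  : genotype standard deviations sigma_l from a reference panel
    ref_cov : genotype covariance matrix Gamma from the same reference panel
    (Names and layout are illustrative assumptions, not a published API.)
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(snp_z, dtype=float)
    sd = np.asarray(snp_sd, dtype=float)
    cov = np.asarray(ref_cov, dtype=float)
    # Variance of predicted expression: sigma_g^2 = w^T Gamma w
    sigma_g = np.sqrt(w @ cov @ w)
    # Z_g ~= sum_l w_l * (sigma_l / sigma_g) * Z_l
    return float(np.sum(w * (sd / sigma_g) * z))


if __name__ == "__main__":
    # Two uncorrelated SNPs with unit-variance genotypes (toy values).
    print(s_predixcan_zscore([1.0, 1.0], [1.0, 1.0], [1.0, 1.0], np.eye(2)))
```

Because sigma_l and Gamma come from a reference panel rather than from the GWAS cohort, the result is an approximation. A misspecified reference set would enter the calculation through exactly these two inputs.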
biorxiv bioinformatics 0-100-users 2016
LeafCutter: annotation-free quantification of RNA splicing, bioRxiv, 2016-03-17
Abstract: The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable intron splicing events from short-read RNA-seq data and finds alternative splicing events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both for detecting differential splicing between sample groups and for mapping splicing quantitative trait loci (sQTLs). Compared to contemporary methods, we find 1.4–2.1 times more sQTLs, many of which help us ascribe molecular effects to disease-associated variants. Strikingly, transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at 5% FDR by an average of 2.1-fold as compared to using gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available at https://github.com/davidaknowles/leafcutter.
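To make annotation-free quantification concrete, here is a simplified Python sketch. It is not LeafCutter's implementation. It groups junction-derived introns that share a 5' or 3' splice site into clusters using union-find, then expresses each intron's read count as a fraction of its cluster total. The function names, the `(chrom, start, end)` tuple layout and the toy counts are hypothetical. Strand is ignored for brevity, and the real tool adds read-count filtering and statistical testing on top of clustering.

```python
from collections import defaultdict


def cluster_introns(introns):
    """Group introns that share a start or end coordinate (a splice site).

    introns: list of (chrom, start, end) tuples (hypothetical layout).
    Returns a list of clusters, each a list of introns in input order.
    """
    parent = list(range(len(introns)))

    def find(i):
        # Path-halving find for the union-find structure.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # splice site -> index of the first intron using it
    for idx, (chrom, start, end) in enumerate(introns):
        for site in (("start", chrom, start), ("end", chrom, end)):
            if site in first_seen:
                union(first_seen[site], idx)
            else:
                first_seen[site] = idx

    clusters = defaultdict(list)
    for idx, intron in enumerate(introns):
        clusters[find(idx)].append(intron)
    return list(clusters.values())


def intron_usage(cluster, counts):
    """Fraction of a cluster's junction reads supporting each intron."""
    total = sum(counts.get(i, 0) for i in cluster)
    return {i: (counts.get(i, 0) / total if total else 0.0) for i in cluster}


if __name__ == "__main__":
    introns = [("chr1", 100, 200), ("chr1", 100, 300), ("chr1", 250, 300), ("chr1", 500, 600)]
    counts = {("chr1", 100, 200): 10, ("chr1", 100, 300): 30, ("chr1", 250, 300): 60}
    for cluster in cluster_introns(introns):
        print(intron_usage(cluster, counts))
```

Working with intron usage ratios within a cluster, rather than with isoform abundances, is what removes the dependence on transcript annotations in this sketch.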
biorxiv genomics 100-200-users 2016