A Data Citation Roadmap for Scholarly Data Repositories, bioRxiv, 2016-12-29
This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE (https://biocaddie.org) program. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories.
biorxiv scientific-communication-and-education 200-500-users 2016
The hippocampus as a predictive map, bioRxiv, 2016-12-29
A cognitive map has long been the dominant metaphor for hippocampal function, embracing the idea that place cells encode a geometric representation of space. However, evidence for predictive coding, reward sensitivity, and policy dependence in place cells suggests that the representation is not purely spatial. We approach this puzzle from a reinforcement learning perspective: what kind of spatial representation is most useful for maximizing future reward? We show that the answer takes the form of a predictive representation. This representation captures many aspects of place cell responses that fall outside the traditional view of a cognitive map. Furthermore, we argue that entorhinal grid cells encode a low-dimensional basis set for the predictive representation, useful for suppressing noise in predictions and extracting multiscale structure for hierarchical planning.
biorxiv neuroscience 100-200-users 2016
Targeted degradation of CTCF decouples local insulation of chromosome domains from higher-order genomic compartmentalization, bioRxiv, 2016-12-22
The molecular mechanisms underlying folding of mammalian chromosomes remain poorly understood. The transcription factor CTCF is a candidate regulator of chromosomal structure. Using the auxin-inducible degron system in mouse embryonic stem cells, we show that CTCF is absolutely and dose-dependently required for looping between CTCF target sites and segmental organization into topologically associating domains (TADs). Restoring CTCF reinstates proper architecture on altered chromosomes, indicating a powerful instructive function for CTCF in chromatin folding, and CTCF remains essential for TAD organization in non-dividing cells. Surprisingly, active and inactive genome compartments remain properly segregated upon CTCF depletion, revealing that compartmentalization of mammalian chromosomes emerges independently of proper insulation of TADs. Further, our data support the conclusion that CTCF mediates transcriptional insulator function through enhancer-blocking but not direct chromatin barrier activity. These results define the functions of CTCF in chromosome folding and provide new fundamental insights into the rules governing mammalian genome organization.
biorxiv genomics 200-500-users 2016
When null hypothesis significance testing is unsuitable for research: a reassessment, bioRxiv, 2016-12-21
Null hypothesis significance testing (NHST) has several shortcomings that are likely contributing factors behind the widely debated replication crisis of psychology, cognitive neuroscience and biomedical science in general. We review these shortcomings and suggest that, after about 60 years of negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research. Different inferential methods (NHST, likelihood estimation, Bayesian methods, false-discovery rate control) may be most suitable for different types of research questions. Whenever researchers use NHST they should justify its use, and publish pre-study power calculations and effect sizes, including negative findings. Studies should optimally be pre-registered and raw data published. The current "statistics lite" educational approach for students that has sustained the widespread, spurious use of NHST should be phased out. Instead, we should encourage either more in-depth statistical training of more researchers and/or more widespread involvement of professional statisticians in all research.
biorxiv neuroscience 100-200-users 2016
Improved maize reference genome with single molecule technologies, bioRxiv, 2016-12-20
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate elucidation of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here, we report the assembly and annotation of maize, a genetic and agricultural model species, using Single Molecule Real-Time (SMRT) sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and significant improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed over 130,000 intact transposable elements (TEs), allowing us to identify TE lineage expansions unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by SMRT sequencing. In addition, comparative optical mapping of two other inbreds revealed a prevalence of deletions in the low gene density region and maize lineage-specific genes.
biorxiv genomics 100-200-users 2016
MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations, bioRxiv, 2016-12-17
Published genetic associations can be used to infer causal relationships between phenotypes, bypassing the need for individual-level genotype or phenotype data. We have curated complete summary data from 1094 genome-wide association studies (GWAS) on diseases and other complex traits into a centralised database, and developed an analytical platform that uses these data to perform Mendelian randomization (MR) tests and sensitivity analyses (MR-Base, http://www.mrbase.org). Combined with curated data of published GWAS hits for phenomic measures, the MR-Base platform enables millions of potential causal relationships to be evaluated. We use the platform to predict the impact of lipid lowering on human health. While our analysis provides evidence that reducing LDL-cholesterol, lipoprotein(a) or triglyceride levels reduces coronary disease risk, it also suggests causal effects on a number of other non-vascular outcomes, indicating potential for adverse effects or drug repositioning of lipid-lowering therapies.
biorxiv epidemiology 0-100-users 2016