Modern machine learning outperforms GLMs at predicting spikes, bioRxiv, 2017-02-25
Abstract: Neuroscience has long focused on finding encoding models that effectively ask “what predicts neural spiking?” and generalized linear models (GLMs) are a typical approach. It is often unknown how much of the explainable neural activity is captured, or missed, when fitting a GLM. Here we compared the predictive performance of GLMs to three leading machine learning methods: feedforward neural networks, gradient boosted trees (using XGBoost), and stacked ensembles that combine the predictions of several methods. We predicted spike counts in macaque motor (M1) and somatosensory (S1) cortices from standard representations of reaching kinematics, and in rat hippocampal cells from open field location and orientation. In general, the modern methods (particularly XGBoost and the ensemble) produced more accurate spike predictions and were less sensitive to the preprocessing of features. This discrepancy in performance suggests that standard feature sets may often relate to neural activity in a nonlinear manner not captured by GLMs. Encoding models built with machine learning techniques, which can be largely automated, more accurately predict spikes and can offer meaningful benchmarks for simpler models.
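Comparing the predictive performance of spike-count models requires a common score. The abstract does not say which score was used, so the following is only a minimal sketch of one common choice, a deviance-based pseudo-R² under a Poisson likelihood. The function names `poisson_loglik` and `pseudo_r2` are hypothetical and are not taken from the paper or any library.

```python
import math


def poisson_loglik(counts, rates):
    """Poisson log-likelihood of observed spike counts given predicted rates.

    Assumes every rate is positive wherever the matching count is positive.
    """
    total = 0.0
    for y, mu in zip(counts, rates):
        # By convention y * log(mu) is 0 when y == 0, even if mu == 0.
        term = y * math.log(mu) if y > 0 else 0.0
        total += term - mu - math.lgamma(y + 1)
    return total


def pseudo_r2(counts, predicted):
    """Deviance-based pseudo-R^2 (an assumed metric, not the paper's stated one).

    0 means no better than predicting the mean count for every bin;
    1 means the predictions match the observed counts exactly.
    Undefined (division by zero) if all counts are identical.
    """
    mean_rate = sum(counts) / len(counts)
    ll_model = poisson_loglik(counts, predicted)
    ll_null = poisson_loglik(counts, [mean_rate] * len(counts))
    ll_saturated = poisson_loglik(counts, counts)
    return 1.0 - (ll_saturated - ll_model) / (ll_saturated - ll_null)


# Illustrative, made-up spike counts and predictions for five time bins.
observed = [0, 1, 3, 2, 4]
model_prediction = [0.5, 1.0, 2.5, 2.0, 3.5]
score = pseudo_r2(observed, model_prediction)
```

Under this kind of score, a GLM and a nonlinear model such as gradient boosted trees can be compared on the same held-out bins, and a gap between them hints at nonlinear structure the GLM misses.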
biorxiv neuroscience 100-200-users 2017
A practical guide for inferring reliable dominance hierarchies and estimating their uncertainty, bioRxiv, 2017-02-24
Abstract: Many animal social structures are organized hierarchically, with dominant individuals monopolizing resources. Dominance hierarchies have received great attention from behavioural and evolutionary ecologists. As a result, there are many methods for inferring hierarchies from social interactions. Yet, there are no clear guidelines about how many observed dominance interactions (i.e. sampling effort) are necessary for inferring reliable dominance hierarchies, nor are there any established tools for quantifying their uncertainty. In this study, we simulated interactions (winners and losers) in scenarios of varying steepness (the probability that a dominant defeats a subordinate based on their difference in rank). Using these data, we (1) quantify how the number of interactions recorded and hierarchy steepness affect the performance of three methods, (2) propose an amendment that improves the performance of a popular method, and (3) suggest two easy procedures to measure uncertainty in the inferred hierarchy. First, we found that the ratio of interactions to individuals required to infer reliable hierarchies is surprisingly low, but depends on the hierarchy steepness and method used. We then show that David’s score and our novel randomized Elo-rating are the two best methods, whereas the original Elo-rating and the recently described ADAGIO perform less well. Finally, we propose two simple methods to estimate uncertainty at the individual and group level. These uncertainty measures further allow one to differentiate non-existent, very flat and highly uncertain hierarchies from intermediate, steep and certain hierarchies. Overall, we find that the methods for inferring dominance hierarchies are relatively robust, even when the ratio of observed interactions to individuals is as low as 10 to 20. However, we suggest that implementing simple procedures for estimating uncertainty will benefit researchers, and quantifying the shape of the dominance hierarchies will provide new insights into the study organisms.
Highlights:
- David’s score and the randomized Elo-rating perform best.
- Method performance depends on hierarchy steepness and sampling effort.
- Generally, inferring dominance hierarchies requires relatively few observations.
- The R package “aniDom” allows easy estimation of hierarchy uncertainty.
- Hierarchy uncertainty provides insights into the shape of the dominance hierarchy.
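The difference between the original and the randomized Elo-rating can be illustrated with a short sketch. It assumes that "randomized" means shuffling the order of the observed interactions many times and averaging the resulting ratings, which the abstract does not spell out. The constants `k`, `start` and `scale` are arbitrary chess-style defaults, and the functions are hypothetical illustrations, not the aniDom API.

```python
import random
from collections import defaultdict


def elo_ratings(interactions, k=32.0, start=0.0, scale=400.0):
    """Sequential Elo update over a list of (winner, loser) pairs."""
    ratings = defaultdict(lambda: start)
    for winner, loser in interactions:
        # Expected probability that the eventual winner wins,
        # given the current rating difference.
        p_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / scale))
        delta = k * (1.0 - p_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def randomized_elo(interactions, n_shuffles=200, seed=0, **kwargs):
    """Average Elo ratings over random orderings of the same interactions.

    Assumption: this shuffle-and-average scheme stands in for the paper's
    randomized Elo-rating; the abstract gives no implementation details.
    """
    rng = random.Random(seed)
    order = list(interactions)
    totals = defaultdict(float)
    for _ in range(n_shuffles):
        rng.shuffle(order)
        for individual, rating in elo_ratings(order, **kwargs).items():
            totals[individual] += rating
    return {ind: total / n_shuffles for ind, total in totals.items()}


# Made-up interactions: A always beats B and C, B always beats C.
observed = [("A", "B"), ("B", "C"), ("A", "C")] * 3
mean_ratings = randomized_elo(observed)
hierarchy = sorted(mean_ratings, key=mean_ratings.get, reverse=True)
```

Because a single Elo pass depends on the order in which interactions were recorded, averaging over orderings removes that dependence. The spread of each individual's rating across shuffles could also serve as a rough indicator of uncertainty, though that is an extrapolation from the abstract rather than something it states.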
biorxiv animal-behavior-and-cognition 0-100-users 2017
Preprinting Microbiology, bioRxiv, 2017-02-24
Abstract: The field of microbiology has experienced significant growth due to transformative advances in technology and the influx of scientists driven by a curiosity to understand how microbes sustain myriad biochemical processes that maintain the Earth. With this explosion in scientific output, a significant bottleneck has been the ability to rapidly disseminate new knowledge to peers and the public. Preprints have emerged as a tool that a growing number of microbiologists are using to overcome this bottleneck. Posting preprints can help to transparently recruit a more diverse pool of reviewers prior to submitting to a journal for formal peer-review. Although use of preprints is still limited in the biological sciences, early indications are that preprints are a robust tool that can complement and enhance peer-reviewed publications. As publishing moves to embrace advances in internet technology, there are many opportunities for preprints and peer-reviewed journals to coexist in the same ecosystem.
biorxiv microbiology 0-100-users 2017
Evaluating the clinical validity of gene-disease associations: an evidence-based framework developed by the Clinical Genome Resource, bioRxiv, 2017-02-23
Abstract: With advances in genomic sequencing technology, the number of reported gene-disease relationships has rapidly expanded. However, the evidence supporting these claims varies widely, confounding accurate evaluation of genomic variation in a clinical setting. Despite the critical need to differentiate clinically valid relationships from less well-substantiated relationships, standard guidelines for such evaluation do not currently exist. The NIH-funded Clinical Genome Resource (ClinGen) has developed a framework to define and evaluate the clinical validity of gene-disease pairs across a variety of Mendelian disorders. In this manuscript we describe a proposed framework to evaluate relevant genetic and experimental evidence supporting or contradicting a gene-disease relationship, and the subsequent validation of this framework using a set of representative gene-disease pairs. The framework provides a semi-quantitative measurement of the strength of evidence for a gene-disease relationship, which corresponds to a qualitative classification: “Definitive”, “Strong”, “Moderate”, “Limited”, “No Reported Evidence” or “Conflicting Evidence.” Within the ClinGen structure, classifications derived using this framework are reviewed and confirmed or adjusted based on the clinical expertise of appropriate disease experts. Detailed guidance for utilizing this framework and access to the curation interface is available on our website. This evidence-based, systematic method to assess the strength of gene-disease relationships will facilitate more knowledgeable utilization of genomic variants in clinical and research settings.
biorxiv genetics 0-100-users 2017
Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists, bioRxiv, 2017-02-23
Abstract:
Background: Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data have limited accessibility to bench scientists, as they require a significant amount of bioinformatics skill.
Results: We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene filtering, gene-expression normalization, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein-network interaction visualization, and pseudo-time cell series construction.
Conclusions: Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app
biorxiv bioinformatics 0-100-users 2017
W2RAP: a pipeline for high quality, robust assemblies of large complex genomes from short read data, bioRxiv, 2017-02-23
Abstract: Producing high-quality whole-genome shotgun de novo assemblies from plant and animal species with large and complex genomes using low-cost short read sequencing technologies remains a challenge. But when the right sequencing data, with appropriate quality control, is assembled using approaches focused on robustness of the process rather than maximization of a single metric such as the usual contiguity estimators, good quality assemblies with informative value for comparative analyses can be produced. Here we present a complete method, described from data generation and QC all the way up to scaffolding of complex genomes using Illumina short reads, and its application to plant and human datasets. We show how to use the w2rap pipeline following a metric-guided approach to produce cost-effective assemblies. The assemblies are highly accurate, provide good coverage of the genome and show good short range contiguity. Our pipeline has already enabled the rapid, cost-effective generation of de novo genome assemblies from large, polyploid crop species with a focus on comparative genomics.
Availability: w2rap is available under the MIT license, with some subcomponents under GPL licenses. A ready-to-run Docker image with all software prerequisites and example data is also available.
http://github.com/bioinfologics/w2rap
http://github.com/bioinfologics/w2rap-contigger
biorxiv bioinformatics 0-100-users 2017