Population-genomic inference of the strength and timing of selection against gene flow, bioRxiv, 2016-09-02
AbstractThe interplay of divergent selection and gene flow is key to understanding how populations adapt to local environments and how new species form. Here, we use DNA polymorphism data and genome-wide variation in recombination rate to jointly infer the strength and timing of selection, as well as the baseline level of gene flow under various demographic scenarios. We model how divergent selection leads to a genome-wide negative correlation between recombination rate and genetic differentiation among populations. Our theory shows that the selection density, i.e. the selection coefficient per base pair, is a key parameter underlying this relationship. We then develop a procedure for parameter estimation that accounts for the confounding effect of background selection. Applying this method to two datasets from Mimulus guttatus, we infer a strong signal of adaptive divergence in the face of gene flow between populations growing on and off phytotoxic serpentine soils. However, the genome-wide intensity of this selection is not exceptional compared to what M. guttatus populations may typically experience when adapting to local conditions. We also find that selection against genome-wide introgression from the selfing sister species M. nasutus has acted to maintain a barrier between these two species over at least the last 250 ky. Our study provides a theoretical framework for linking genome-wide patterns of divergence and recombination with the underlying evolutionary mechanisms that drive this differentiation.
biorxiv evolutionary-biology 100-200-users 2016Paperfuge An ultra-low cost, hand-powered centrifuge inspired by the mechanics of a whirligig toy, bioRxiv, 2016-09-01
AbstractSample preparation, including separation of plasma from whole blood or isolation of parasites, is an unmet challenge in many point of care (POC) diagnostics and requires centrifugation as the first key step. From the context of global health applications, commercial centrifuges are expensive, bulky and electricity-powered, leading to a critical bottle-neck in the development of decentralized, electricity-free POC diagnostic devices. By uncovering the fundamental mechanics of an ancient whirligig toy (3300 B.C.E), we design an ultra-low cost (20 cents), light-weight (2 g), human-powered centrifuge that is made out of paper (“paperfuge”). To push the operating limits of this unconventional centrifuge, we present an experimentally-validated theoretical model that describes the paperfuge as a non-linear, non-conservative oscillator system. We use this model to inform our design process, achieving speeds of 125,000 rpm and equivalent centrifugal forces of 30,000 g, with theoretical limits predicting one million rpm. We harness these speeds to separate pure plasma in less than 1.5 minutes and isolate malaria parasites in 15 minutes from whole human blood. By expanding the materials used, we implement centrifugal microfluidics using PDMS, plastic and 3D-printed devices, ultimately opening up new opportunities for electricity-free POC diagnostics, especially in resource-poor settings.
biorxiv bioengineering 200-500-users 2016Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature, bioRxiv, 2016-08-26
AbstractWe have empirically assessed the distribution of published effect sizes and estimated power by extracting more than 100,000 statistical records from about 10,000 cognitive neuroscience and psychology papers published during the past 5 years. The reported median effect size was d=0.93 (inter-quartile range 0.64-1.46) for nominally statistically significant results and d=0.24 (0.11-0.42) for non-significant results. Median power to detect small, medium and large effects was 0.12, 0.44 and 0.73, reflecting no improvement through the past half-century. Power was lowest for cognitive neuroscience journals. 14% of papers reported some statistically significant results, although the respective F statistic and degrees of freedom proved that these were non-significant; p value errors positively correlated with journal impact factors. False report probability is likely to exceed 50% for the whole literature. In light of our findings the recently reported low replication success in psychology is realistic and worse performance may be expected for cognitive neuroscience.
biorxiv neuroscience 500+-users 2016Canu scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation, bioRxiv, 2016-08-25
AbstractLong-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either PacBio or Oxford Nanopore technologies, and achieves a contig NG50 of greater than 21 Mbp on both human and Drosophila melanogaster PacBio datasets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
biorxiv bioinformatics 100-200-users 2016DeepAD Alzheimer’s Disease Classification via Deep Convolutional Neural Networks using MRI and fMRI, bioRxiv, 2016-08-22
1AbstractTo extract patterns from neuroimaging data, various statistical methods and machine learning algorithms have been explored for the diagnosis of Alzheimer’s disease among older adults in both clinical and research applications; however, distinguishing between Alzheimer’s and healthy brain data has been challenging in older adults (age > 75) due to highly similar patterns of brain atrophy and image intensities. Recently, cutting-edge deep learning technologies have rapidly expanded into numerous fields, including medical image analysis. This paper outlines state-of-the-art deep learning-based pipelines employed to distinguish Alzheimer’s magnetic resonance imaging (MRI) and functional MRI (fMRI) from normal healthy control data for a given age group. Using these pipelines, which were executed on a GPU-based high-performance computing platform, the data were strictly and carefully preprocessed. Next, scale- and shift-invariant low- to high-level features were obtained from a high volume of training images using convolutional neural network (CNN) architecture. In this study, fMRI data were used for the first time in deep learning applications for the purposes of medical image analysis and Alzheimer’s disease prediction. These proposed and implemented pipelines, which demonstrate a significant improvement in classification output over other studies, resulted in high and reproducible accuracy rates of 99.9% and 98.84% for the fMRI and MRI pipelines, respectively. Additionally, for clinical purposes, subject-level classification was performed, resulting in an average accuracy rate of 94.32% and 97.88% for the fMRI and MRI pipelines, respectively. Finally, a decision making algorithm designed for the subject-level classification improved the rate to 97.77% for fMRI and 100% for MRI pipelines.
biorxiv bioinformatics 0-100-users 2016Deep Learning and Association Rule Mining for Predicting Drug Response in Cancer. A Personalised Medicine Approach, bioRxiv, 2016-08-20
ABSTRACTA major challenge in cancer treatment is predicting the clinical response to anti-cancer drugs for each individual patient. For complex diseases such as cancer, characterized by high inter-patient variance, the implementation of precision medicine approaches is dependent upon understanding the pathological processes at the molecular level. While the “omics” era provides unique opportunities to dissect the molecular features of diseases, the ability to utilize it in targeted therapeutic efforts is hindered by both the massive size and diverse nature of the “omics” data. Recent advances with Deep Learning Neural Networks (DLNNs), suggests that DLNN could be trained on large data sets to efficiently predict therapeutic responses in cancer treatment. We present the application of Association Rule Mining combined with DLNNs for the analysis of high-throughput molecular profiles of 1001 cancer cell lines, in order to extract cancer-specific signatures in the form of easily interpretable rules and use these rules as input to predict pharmacological responses to a large number of anti-cancer drugs. The proposed algorithm outperformed Random Forests (RF) and Bayesian Multitask Multiple Kernel Learning (BMMKL) classification which currently represent the state-of-the-art in drug-response prediction. Moreover, the in silico pipeline presented, introduces a novel strategy for identifying potential therapeutic targets, as well as possible drug combinations with high therapeutic potential. For the first time, we demonstrate that DLNNs trained on a large pharmacogenomics data-set can effectively predict the therapeutic response of specific drugs in different cancer types. These findings serve as a proof of concept for the application of DLNNs to predict therapeutic responsiveness, a milestone in precision medicine.
biorxiv bioinformatics 100-200-users 2016