Content-Aware Image Restoration Pushing the Limits of Fluorescence Microscopy, bioRxiv, 2017-12-20
Fluorescence microscopy is a key driver of discoveries in the life-sciences, with observable phenomena being limited by the optics of the microscope, the chemistry of the fluorophores, and the maximum photon exposure tolerated by the sample. These limits necessitate trade-offs between imaging speed, spatial resolution, light exposure, and imaging depth. In this work we show how image restoration based on deep learning extends the range of biological phenomena observable by microscopy. On seven concrete examples we demonstrate how microscopy images can be restored even if 60-fold fewer photons are used during acquisition, how near isotropic resolution can be achieved with up to 10-fold under-sampling along the axial direction, and how tubular and granular structures smaller than the diffraction limit can be resolved at 20-times higher frame-rates compared to state-of-the-art methods. All developed image restoration methods are freely available as open source software in Python, FIJI, and KNIME.
biorxiv bioinformatics 500+-users 2017Real-time search of all bacterial and viral genomic data, bioRxiv, 2017-12-16
Genome sequencing of pathogens is now ubiquitous in microbiology, and the sequence archives are effectively no longer searchable for arbitrary sequences. Furthermore, the exponential increase of these archives is likely to be further spurred by automated diagnostics. To unlock their use for scientific research and real-time surveillance we have combined knowledge about bacterial genetic variation with ideas used in web-search, to build a DNA search engine for microbial data that can grow incrementally. We indexed the complete global corpus of bacterial and viral whole genome sequence data (447,833 genomes), using four orders of magnitude less storage than previous methods. The method allows future scaling to millions of genomes. This renders the global archive accessible to sequence search, which we demonstrate with three applications ultra-fast search for resistance genes MCR1-3, analysis of host-range for 2827 plasmids, and quantification of the rise of antibiotic resistance prevalence in the sequence archives.
biorxiv bioinformatics 500+-users 2017Female grant applicants are equally successful when peer reviewers assess the science, but not when they assess the scientist, bioRxiv, 2017-12-13
ABSTRACTBackgroundPrevious research shows that men often receive more research funding than women, but does not provide empirical evidence as to why this occurs. In 2014, the Canadian Institutes of Health Research (CIHR) created a natural experiment by dividing all investigator-initiated funding into two new grant programs one with and one without an explicit review focus on the caliber of the principal investigator.MethodsWe analyzed application success among 23,918 grant applications from 7,093 unique principal investigators in a 5-year natural experiment across all investigator-initiated CIHR grant programs in 2011-2016. We used Generalized Estimating Equations to account for multiple applications by the same applicant and an interaction term between each principal investigator’s self-reported sex and grant programs to compare success rates between male and female applicants under different review criteria.ResultsThe overall grant success rate across all competitions was 15.8%. After adjusting for age and research domain, the predicted probability of funding success in traditional programs was 0.9 percentage points higher for male than for female principal investigators (OR 0.934, 95% CI 0.854-1.022). In the new program focused on the proposed science, the gap was 0.9 percentage points in favour of male principal investigators (OR 0.998, 95% CI 0.794-1.229). In the new program with an explicit review focus on the caliber of the principal investigator, the gap was 4.0 percentage points in favour of male principal investigators (OR 0.705, 95% CI 0.519- 0.960).InterpretationThis study suggests gender gaps in grant funding are attributable to less favourable assessments of women as principal investigators, not differences in assessments of the quality of science led by women. We propose ways for funders to avoid allowing gender bias to influence research funding.FundingThis study was unfunded.
biorxiv scientific-communication-and-education 500+-users 2017Bioconda A sustainable and comprehensive software distribution for the life sciences, bioRxiv, 2017-10-23
AbstractWe present Bioconda (<jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsbioconda.github.io>httpsbioconda.github.io<jatsext-link>), a distribution of bioinformatics software for the lightweight, multiplatform and language-agnostic package manager Conda. Currently, Bioconda offers a collection of over 3000 software packages, which is continuously maintained, updated, and extended by a growing global community of more than 200 contributors. Bioconda improves analysis reproducibility by allowing users to define isolated environments with defined software versions, all of which are easily installed and managed without administrative privileges.
biorxiv bioinformatics 500+-users 2017Accurate Genomic Prediction Of Human Height, bioRxiv, 2017-09-19
AbstractWe construct genomic predictors for heritable and extremely complex human quan-titative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
biorxiv genomics 500+-users 2017Major flaws in “Identification of individuals by trait prediction using whole-genome sequencing data”, bioRxiv, 2017-09-07
SummaryGenetic privacy is an area of active research. While it is important to identify new risks, it is equally crucial to supply policymakers with accurate information based on scientific evidence. Recently, Lippert et al. (PNAS, 2017) investigated the status of genetic privacy using trait-predictions from whole genome sequencing. The authors sequenced a cohort of about 1000 individuals and collected a range of demographic, visible, and digital traits such as age, sex, height, face morphology, and a voice signature. They attempted to use the genetic features in order to predict those traits and re-identify the individuals from small pool using the trait predictions. Here, I report major flaws in the Lippert et al. manuscript. In short, the authors’ technique performs similarly to a simple baseline procedure, does not utilize the power of whole genome markers, uses technically wrong metrics, and finally does not really identify anyone.
biorxiv genomics 500+-users 2017