THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images, bioRxiv, 2019-02-11
In recent years, the use of a large number of object concepts and naturalistic object images has been growing enormously in cognitive neuroscience research. Classical databases of object concepts are based mostly on a manually-curated set of concepts. Further, databases of naturalistic object images typically consist of single images of objects cropped from their background, or a large number of uncontrolled naturalistic images of varying quality, requiring elaborate manual image curation. Here we provide a set of 1,854 diverse object concepts sampled systematically from concrete picturable and nameable nouns in the American English language. Using these object concepts, we conducted a large-scale web image search to compile a database of 26,107 high-quality naturalistic images of those objects, with 12 or more object images per concept and all images cropped to square size. Using crowdsourcing, we provide higher-level category membership for the 27 most common categories and validate them by relating them to representations in a semantic embedding derived from large text corpora. Finally, by feeding images through a deep convolutional neural network, we demonstrate that they exhibit high selectivity for different object concepts, while at the same time preserving variability of different object images within each concept. Together, the THINGS database provides a rich resource of object concepts and object images and offers a tool for both systematic and large-scale naturalistic research in the fields of psychology, neuroscience, and computer science.
biorxiv neuroscience 100-200-users 2019
Tissue structure accelerates evolution: premalignant sweeps precede neutral expansion, bioRxiv, 2019-02-11
Cancer has been hypothesized to be a caricature of the renewal process of the tissue of origin, arising from (and maintained by) small subpopulations capable of continuous growth [1]. The strong influence of tissue structure has been convincingly demonstrated in intestinal cancers, where adenomas grow by the fission of stem-cell-maintained glands influenced by early expression of abnormal cell mobility in cancer progenitors [2, 3]. So-called “born to be bad” tumors arise from progenitors which may already possess the necessary driver mutations for malignancy [4, 5] and metastasis [6]. These tumors subsequently evolve neutrally, thereby maximizing intratumoral heterogeneity and increasing the probability of therapeutic resistance. These findings have been nuanced by the advent of multi-region sequencing, which uses spatial and temporal patterns of genetic variation among competing tumor cell populations to shed light on the mode of tumor evolution (neutral or Darwinian) and also the tempo [4, 7–11]. Using a classic, well-studied model of tumor evolution (a passenger-driver mutation model [12–16]), we systematically alter spatial constraints and cell mixing rates to show how tissue structure influences functional (driver) mutations and genetic heterogeneity over time. This modeling approach explores a key mechanism behind both inter-patient and intratumoral heterogeneity: competition for space. Initial spatial constraints determine the emergent mode of evolution (Darwinian to neutral) without a change in cell-specific mutation rate or fitness effects. Driver acquisition during the Darwinian precancerous stage may be accelerated en route to neutral evolution by the combination of two factors: spatial constraints and limited cellular mixing.
biorxiv cancer-biology 0-100-users 2019
Socru: Typing of genome level order and orientation in bacteria, bioRxiv, 2019-02-10
Genome rearrangements occur in bacteria between repeat sequences and impact growth and gene expression. Homologous recombination can occur between ribosomal operons, which are found in multiple copies in many bacteria. Inversion between indirect repeats and excision/translocation between direct repeats enable structural genome rearrangement. To identify these rearrangements by sequencing, reads of several thousand bases are required to span the ribosomal operons. With long read sequencing aiding the routine generation of complete bacterial assemblies, we have developed socru, a typing method for the order and orientation of genome fragments between ribosomal operons, defined against species-specific baselines. It allows a single identifier to convey the order and orientation of genome level structure, and 434 of the most common bacterial species are supported. Additionally, socru can be used to identify large scale misassemblies. Availability and implementation: Socru is written in Python 3, runs on Linux and OSX systems and is available under the open source license GNU GPL 3 from https://github.com/quadram-institute-bioscience/socru.
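The idea of typing fragment order and orientation can be illustrated with a minimal sketch. This is not socru's implementation or output format: the `Fragment` type, the `invert` and `label` helpers, the seven-fragment baseline, and the prime notation for inverted fragments are all hypothetical choices made for illustration. It models an inversion between two ribosomal operons as a reversal of the affected fragments with each orientation flipped.

```python
from typing import List, Tuple

# Hypothetical representation: (fragment number, True if inverted vs. the baseline)
Fragment = Tuple[int, bool]

def invert(order: List[Fragment], start: int, end: int) -> List[Fragment]:
    """Invert fragments order[start..end]: reverse their order and flip each orientation."""
    segment = [(num, not inverted) for num, inverted in reversed(order[start:end + 1])]
    return order[:start] + segment + order[end + 1:]

def label(order: List[Fragment]) -> str:
    """Render an order/orientation pattern, marking inverted fragments with a prime."""
    return "-".join(f"{num}'" if inverted else str(num) for num, inverted in order)

# Hypothetical baseline: seven fragments, all in the reference orientation
baseline = [(n, False) for n in range(1, 8)]
# An inversion spanning the third to fifth fragments
rearranged = invert(baseline, 2, 4)

print(label(baseline))
print(label(rearranged))
```

Applying the same inversion twice restores the baseline, which is why a signed ordering of this kind is enough to describe such rearrangements.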
biorxiv bioinformatics 0-100-users 2019
Object Detection Networks and Augmented Reality for Cellular Detection in Fluorescence Microscopy Acquisition and Analysis, bioRxiv, 2019-02-09
In this paper we demonstrate the application of object detection networks for the classification and localization of cells in fluorescence microscopy. We benchmark two leading object detection algorithms across multiple challenging 2-D microscopy datasets, and we develop and demonstrate an algorithm which can localize and image cells in 3-D, in real-time. Furthermore, we exploit the fast processing of these algorithms to develop a simple and effective Augmented Reality (AR) system for fluorescence microscopy systems. Object detection networks are well-known high performance networks famously applied to the task of identifying and localizing objects in photographic images. Here we show their application and efficiency for localizing cells in fluorescence microscopy images. Object detection algorithms are typically trained on many thousands of images, which can be prohibitive within the biological sciences due to the cost of imaging and annotating large amounts of data. Taking different cell types and assays as examples, we show that with some careful considerations it is possible to achieve very high performance with datasets of as few as 26 images. Using our approach, it is possible for relatively non-skilled users to automate detection of cell classes with a variety of appearances and enable new avenues for automation of conventionally manual fluorescence microscopy acquisition pipelines.
biorxiv bioinformatics 0-100-users 2019
Where do our graduates go? A toolkit for retrospective and ongoing career outcomes data collection for biomedical PhD students and postdoctoral scholars, bioRxiv, 2019-02-09
Universities are at long last undertaking efforts to collect and disseminate information about student career outcomes, after decades of calls to action. Organizations such as Rescuing Biomedical Research and Future of Research brought this issue to the forefront of graduate education, and the second Future of Biomedical Graduate and Postdoctoral Training conference (FOBGAPT2) featured the collection of career outcomes data in its final recommendations, published in this journal (Hitchcock et al., 2017). More recently, 26 institutions assembled as the Coalition for Next Generation Life Science, committing to ongoing collection and dissemination of career data for both graduate and postdoc alumni. A few individual institutions have shared snapshots of the data in peer-reviewed publications (Mathur et al., 2018; Silva, des Jarlais, Lindstaedt, Rotman, Watkins, 2016) and on websites. As more and more institutions take up this call to action, they will now be looking for tools, protocols, and best practices for ongoing career outcomes data collection, management, and dissemination. Here, we describe UCSF's experiences in conducting a retrospective study, and in institutionalizing a methodology for annual data collection and dissemination. We describe and share all tools we have developed, and we provide calculations of the time and resources required to accomplish both retrospective studies and annual updates. We also include broader recommendations for implementation at your own institutions, increasing the feasibility of this endeavor.
biorxiv scientific-communication-and-education 100-200-users 2019
Characterising the loss-of-function impact of 5' untranslated region variants in whole genome sequence data from 15,708 individuals, bioRxiv, 2019-02-08
Upstream open reading frames (uORFs) are important tissue-specific cis-regulators of protein translation. Although isolated case reports have shown that variants that create or disrupt uORFs can cause disease, genetic sequencing approaches typically focus on protein-coding regions and ignore these variants. Here, we describe a systematic genome-wide study of variants that create and disrupt human uORFs, and explore their role in human disease using 15,708 whole genome sequences collected by the Genome Aggregation Database (gnomAD) project. We show that 14,897 variants that create new start codons upstream of the canonical coding sequence (CDS), and 2,406 variants disrupting the stop site of existing uORFs, are under strong negative selection. Furthermore, variants creating uORFs that overlap the CDS show signals of selection equivalent to coding loss-of-function variants, and uORF-perturbing variants are under strong selection when arising upstream of known disease genes and genes intolerant to loss-of-function variants. Finally, we identify specific genes where perturbation of uORFs is likely to represent an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in families with neurofibromatosis. Our results highlight uORF-perturbing variants as an important and under-recognised functional class that can contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data to study the deleteriousness of specific classes of non-coding variants.
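What it means for a variant to create a new upstream start codon can be shown with a minimal sketch. This is not the authors' pipeline: the function name `new_start_codons`, its parameters, and the toy 5' UTR sequence are hypothetical. It checks the three codon windows that overlap a single-nucleotide substitution and reports any window that reads ATG only after the change.

```python
from typing import List

def new_start_codons(utr: str, pos: int, alt: str) -> List[int]:
    """Return 0-based offsets of ATG codons created by substituting `alt` at `pos`.

    `utr` is a 5' UTR sequence on the coding strand; only the (up to) three
    trinucleotide windows overlapping `pos` can change, so only those are checked.
    """
    utr = utr.upper()
    mutated = utr[:pos] + alt.upper() + utr[pos + 1:]
    created = []
    for start in range(max(0, pos - 2), min(pos, len(utr) - 3) + 1):
        if mutated[start:start + 3] == "ATG" and utr[start:start + 3] != "ATG":
            created.append(start)
    return created

# Toy example (hypothetical sequence): C>T at offset 4 turns ACG into ATG
toy_utr = "GGCACGTTCAGC"
print(new_start_codons(toy_utr, 4, "T"))
```

A fuller analysis would also need the downstream stop codon and the reading frame relative to the CDS, which this sketch deliberately leaves out.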
biorxiv genomics 200-500-users 2019