THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images, bioRxiv, 2019-02-11

In recent years, the use of a large number of object concepts and naturalistic object images has been growing enormously in cognitive neuroscience research. Classical databases of object concepts are based mostly on a manually-curated set of concepts. Further, databases of naturalistic object images typically consist of single images of objects cropped from their background, or a large number of uncontrolled naturalistic images of varying quality, requiring elaborate manual image curation. Here we provide a set of 1,854 diverse object concepts sampled systematically from concrete picturable and nameable nouns in the American English language. Using these object concepts, we conducted a large-scale web image search to compile a database of 26,107 high-quality naturalistic images of those objects, with 12 or more object images per concept and all images cropped to square size. Using crowdsourcing, we provide higher-level category membership for the 27 most common categories and validate them by relating them to representations in a semantic embedding derived from large text corpora. Finally, by feeding images through a deep convolutional neural network, we demonstrate that they exhibit high selectivity for different object concepts, while at the same time preserving variability of different object images within each concept. Together, the THINGS database provides a rich resource of object concepts and object images and offers a tool for both systematic and large-scale naturalistic research in the fields of psychology, neuroscience, and computer science.

biorxiv neuroscience 100-200-users 2019

Where do our graduates go? A toolkit for retrospective and ongoing career outcomes data collection for biomedical PhD students and postdoctoral scholars, bioRxiv, 2019-02-09

Universities are at long last undertaking efforts to collect and disseminate information about student career outcomes, after decades of calls to action. Organizations such as Rescuing Biomedical Research and Future of Research brought this issue to the forefront of graduate education, and the second Future of Biomedical Graduate and Postdoctoral Training conference (FOBGAPT2) featured the collection of career outcomes data in its final recommendations, published in this journal (Hitchcock et al., 2017). More recently, 26 institutions assembled as the Coalition for Next Generation Life Science, committing to ongoing collection and dissemination of career data for both graduate and postdoc alumni. A few individual institutions have shared snapshots of the data in peer-reviewed publications (Mathur et al., 2018; Silva, des Jarlais, Lindstaedt, Rotman, Watkins, 2016) and on websites. As more and more institutions take up this call to action, they will now be looking for tools, protocols, and best practices for ongoing career outcomes data collection, management, and dissemination. Here, we describe UCSF's experiences in conducting a retrospective study, and in institutionalizing a methodology for annual data collection and dissemination. We describe and share all tools we have developed, and we provide calculations of the time and resources required to accomplish both retrospective studies and annual updates. We also include broader recommendations for implementation at your own institutions, increasing the feasibility of this endeavor.

biorxiv scientific-communication-and-education 100-200-users 2019

Performance of neural network basecalling tools for Oxford Nanopore sequencing, bioRxiv, 2019-02-08

Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rules consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model, and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences (‘polishing’) with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. Conclusions: Basecalling accuracy has seen significant improvements over the last two years. The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
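
The two accuracy levels named in the abstract, per-read identity and majority-rules consensus, can be illustrated with a toy sketch. This is not the authors' pipeline, which the abstract does not describe in detail. It assumes the reads have already been aligned to a common coordinate system and padded with "-" for gaps, and the function names majority_consensus and identity are hypothetical, chosen only for this example.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over equal-length, pre-aligned reads.

    Assumption: reads are already aligned and gap-padded; real pipelines
    derive this alignment from an assembler or aligner.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Pick the most frequent symbol in this alignment column.
        symbol, _count = Counter(column).most_common(1)[0]
        # A majority gap means most reads have no base at this position.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


def identity(aligned_a, aligned_b):
    """Fraction of alignment columns in which two aligned sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return matches / len(aligned_a)


if __name__ == "__main__":
    reads = ["ACGTA", "ACGTT", "AC-TA", "ACGTA"]
    reference = "ACGTA"
    print("consensus:", majority_consensus(reads))
    for read in reads:
        print(read, "identity to reference:", identity(read, reference))
```

In this toy input each read carries at most one error, yet the column-wise vote recovers the reference, which is why consensus accuracy can be much higher than read accuracy. Errors that recur at the same position across reads (the abstract mentions methylation motifs) survive the vote, which is the kind of error that retraining or polishing targets.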

biorxiv bioinformatics 100-200-users 2019

 
