Alevin efficiently estimates accurate gene abundances from dscRNA-seq data, bioRxiv, 2018-06-01
We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, which performs cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin’s approach to UMI deduplication accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools that discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates.
biorxiv bioinformatics 100-200-users 2018
Antimicrobial exposure in sexual networks drives divergent evolution in modern gonococci, bioRxiv, 2018-05-31
The sexually transmitted pathogen Neisseria gonorrhoeae is regarded as being on the way to becoming an untreatable superbug. Despite its clinical importance, little is known about its emergence and evolution, and how this corresponds with the introduction of antimicrobials. We present a genome-based phylogeographic analysis of 419 gonococcal isolates from across the globe. Results indicate that modern gonococci originated in Europe or Africa as late as the 16th century and subsequently disseminated globally. We provide evidence that the modern gonococcal population has been shaped by antimicrobial treatment of sexually transmitted and other infections, leading to the emergence of two major lineages with different evolutionary strategies. The well-described multi-resistant lineage is associated with high rates of homologous recombination and infection in high-risk sexual networks where antimicrobial treatment is frequent. A second, multi-susceptible lineage associated with heterosexual networks, where asymptomatic infection is more common, was also identified, with potential implications for infection control.
biorxiv genomics 0-100-users 2018
Evaluation of Deep Learning Strategies for Nucleus Segmentation in Fluorescence Images, bioRxiv, 2018-05-31
Identifying nuclei is often a critical first step in analyzing microscopy images of cells, and classical image processing algorithms are most commonly used for this task. Recent developments in deep learning can yield superior accuracy, but typical evaluation metrics for nucleus segmentation do not satisfactorily capture error modes that are relevant in cellular images. We present an evaluation framework to measure accuracy, types of errors, and computational efficiency; and use it to compare deep learning strategies and classical approaches. We publicly release a set of 23,165 manually annotated nuclei and source code to reproduce experiments and run the proposed evaluation methodology. Our evaluation framework shows that deep learning improves accuracy and can reduce the number of biologically relevant errors by half.
biorxiv bioinformatics 0-100-users 2018
Purification of Cross-linked RNA-Protein Complexes by Phenol-Toluol Extraction, bioRxiv, 2018-05-30
Recent methodological advances have allowed the identification of an increasing number of RNA-binding proteins (RBPs) and their RNA-binding sites. Most of those methods rely, however, on capturing proteins associated with polyadenylated RNAs, which neglects RBPs bound to non-adenylated RNA classes (tRNA, rRNA, pre-mRNA) as well as the vast majority of species that lack poly-A tails in their mRNAs (including all archaea and bacteria). To overcome these limitations, we have developed a novel protocol, Phenol Toluol extraction (PTex), that does not rely on a specific RNA sequence or motif for isolation of cross-linked ribonucleoproteins (RNPs), but rather purifies them based entirely on their physicochemical properties. PTex captures RBPs that bind to RNA as short as 30 nt and RNPs directly from animal tissue, and can be used to simplify complex workflows such as PAR-CLIP. Finally, we provide a first global RNA-bound proteome of human HEK293 cells and of Salmonella Typhimurium as a bacterial species.
biorxiv molecular-biology 0-100-users 2018
Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise, bioRxiv, 2018-05-28
We assembled the sequences from 9,795 RNA sequencing experiments, collected from 31 human tissues and hundreds of subjects as part of the GTEx project, to create a new, comprehensive catalog of human genes and transcripts. The new human gene database contains 43,162 genes, of which 21,306 are protein-coding and 21,856 are noncoding, and a total of 323,824 transcripts, for an average of 7.5 transcripts per gene. Our expanded gene list includes 4,998 novel genes (1,178 coding and 3,819 noncoding) and 97,511 novel splice variants of protein-coding genes as compared to the most recent human gene catalogs. We detected over 30 million additional transcripts at more than 650,000 sites, nearly all of which are likely to be nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells.
biorxiv genomics 500+-users 2018
Fast animal pose estimation using deep neural networks, bioRxiv, 2018-05-25
Recent work quantifying postural dynamics has attempted to define the repertoire of behaviors performed by an animal. However, a major drawback to these techniques has been their reliance on dimensionality reduction of images, which destroys information about which parts of the body are used in each behavior. To address this issue, we introduce a deep learning-based method for pose estimation, LEAP (LEAP Estimates Animal Pose). LEAP automatically predicts the positions of animal body parts using a deep convolutional neural network with as few as 10 frames of labeled data for training. This framework consists of a graphical interface for interactive labeling of body parts and software for training the network and fast prediction on new data (1 hr to train, 185 Hz predictions). We validate LEAP using videos of freely behaving fruit flies (Drosophila melanogaster) and track 32 distinct points on the body to fully describe the pose of the head, body, wings, and legs with an error rate of <3% of the animal’s body length. We recapitulate a number of reported findings on insect gait dynamics and show LEAP’s applicability as the first step in unsupervised behavioral classification. Finally, we extend the method to more challenging imaging situations (pairs of flies moving on a mesh-like background) and movies from freely moving mice (Mus musculus) where we track the full conformation of the head, body, and limbs.
biorxiv animal-behavior-and-cognition 500+-users 2018