immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking, bioRxiv, 2019-09-07

Abstract

Summary: B- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground truth immune receptor repertoires for immunoinformatics tool benchmarking. Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full-length variable region immune receptor sequences. immuneSIM enables the tuning of the immune receptor features (i) species and chain type (BCR, TCR, single, paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation, and (vi) sequence motifs. Each simulated sequence is annotated by the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis, and machine learning methods for motif detection.

Availability: The package is available via https://github.com/GreiffLab/immuneSIM and will also be available at CRAN (submitted). The documentation is hosted at https://immuneSIM.readthedocs.io.

Contact: victor.greiff@medisin.uio.no, sai.reddy@ethz.ch

biorxiv bioinformatics 100-200-users 2019

Deep learning for brains? Different linear and nonlinear scaling in UK Biobank brain images vs. machine-learning datasets, bioRxiv, 2019-09-06

Abstract

In recent years, deep learning has unlocked unprecedented success in various domains, especially in image, text, and speech processing. These breakthroughs may hold promise for neuroscience and especially for brain-imaging investigators who start to analyze thousands of participants. However, deep learning is only beneficial if the data have nonlinear relationships and if they are exploitable at currently available sample sizes. We systematically profiled the performance of deep models, kernel models, and linear models as a function of sample size on UK Biobank brain images against established machine learning references. On MNIST and Zalando Fashion, prediction accuracy consistently improved when escalating from linear models to shallow-nonlinear models, and further improved when switching to deep-nonlinear models. The more observations were available for model training, the greater the performance gain we saw. In contrast, using structural or functional brain scans, simple linear models performed on par with more complex, highly parameterized models in age/sex prediction across increasing sample sizes. In fact, linear models kept improving as the sample size approached ∼10,000 participants. Our results indicate that the increase in performance of linear models with additional data does not saturate at the limit of current feasibility. Yet, nonlinearities of common brain scans remain largely inaccessible to both kernel and deep learning methods at any examined scale.

biorxiv bioinformatics 100-200-users 2019

Single Cell RNA-seq reveals ectopic and aberrant lung resident cell populations in Idiopathic Pulmonary Fibrosis, bioRxiv, 2019-09-06

Abstract

We provide a single cell atlas of Idiopathic Pulmonary Fibrosis (IPF), a fatal interstitial lung disease, focusing on resident lung cell populations. By profiling 312,928 cells from 32 IPF, 29 healthy control and 18 chronic obstructive pulmonary disease (COPD) lungs, we demonstrate that IPF is characterized by changes in discrete subpopulations of cells in the three major parenchymal compartments: the epithelium, endothelium and stroma. Among epithelial cells, we identify a novel population of IPF-enriched aberrant basaloid cells that co-express basal epithelial markers, mesenchymal markers, senescence markers and developmental transcription factors, and are located at the edge of myofibroblast foci in the IPF lung. Among vascular endothelial cells in the IPF lung parenchyma, we identify an expanded cell population transcriptomically identical to vascular endothelial cells normally restricted to the bronchial circulation. We confirm the presence of both populations by immunohistochemistry and independent datasets. Among stromal cells, we identify fibroblasts and myofibroblasts in both control and IPF lungs and leverage manifold-based algorithms (diffusion maps and diffusion pseudotime) to infer the origins of the activated IPF myofibroblast. Our work provides a comprehensive catalogue of the aberrant cellular transcriptional programs in IPF, demonstrates a new framework for analyzing complex disease with scRNAseq, and provides the largest lung disease single-cell atlas to date.

biorxiv genomics 0-100-users 2019

