immuneSIM tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking, bioRxiv, 2019-09-07

AbstractSummaryB- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground truth immune receptor repertoires for immunoinformatics tool benchmarking. Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full length variable region immune receptor sequences. ImmuneSIM enables the tuning of the immune receptor features (i) species and chain type (BCR, TCR, single, paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation, and (vi) sequence motifs. Each simulated sequence is annotated by the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis, and machine learning methods for motif detection.AvailabilityThe package is available via <jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsgithub.comGreiffLabimmuneSIM>httpsgithub.comGreiffLabimmuneSIM<jatsext-link> and will also be available at CRAN (submitted). The documentation is hosted at <jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsimmuneSIM.readthedocs.io>httpsimmuneSIM.readthedocs.io<jatsext-link>.Contactvictor.greiff@medisin.uio.no, sai.reddy@ethz.ch

biorxiv bioinformatics 100-200-users 2019

Deep learning for brains? Different linear and nonlinear scaling in UK Biobank brain images vs. machine-learning datasets, bioRxiv, 2019-09-06

AbstractIn recent years, deep learning has unlocked unprecedented success in various domains, especially in image, text, and speech processing. These breakthroughs may hold promise for neuroscience and especially for brain-imaging investigators who start to analyze thousands of participants. However, deep learning is only beneficial if the data have nonlinear relationships and if they are exploitable at currently available sample sizes. We systematically profiled the performance of deep models, kernel models, and linear models as a function of sample size on UK Biobank brain images against established machine learning references. On MNIST and Zalando Fashion, prediction accuracy consistently improved when escalating from linear models to shallow-nonlinear models, and further improved when switching to deep-nonlinear models. The more observations were available for model training, the greater the performance gain we saw. In contrast, using structural or functional brain scans, simple linear models performed on par with more complex, highly parameterized models in agesex prediction across increasing sample sizes. In fact, linear models kept improving as the sample size approached ∼10,000 participants. Our results indicate that the increase in performance of linear models with additional data does not saturate at the limit of current feasibility. Yet, nonlinearities of common brain scans remain largely inaccessible to both kernel and deep learning methods at any examined scale.

biorxiv bioinformatics 100-200-users 2019

GeneWalk identifies relevant gene functions for a biological context using network representation learning, bioRxiv, 2019-09-05

AbstractThe primary bottleneck in high-throughput genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Existing methods such as Gene Ontology (GO) enrichment analysis provide insight at the gene set level. For individual genes, GO annotations are static and biological context can only be added by manual literature searches. Here, we introduce GeneWalk (<jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpgithub.comchurchmanlabgenewalk>github.comchurchmanlabgenewalk<jatsext-link>), a method that identifies individual genes and their relevant functions under a particular experimental condition. After automatic assembly of an experiment-specific gene regulatory network, GeneWalk quantifies the similarity between vector representations of each gene and its GO annotations through representation learning, yielding annotation significance scores that reflect their functional relevance for the experimental context. We demonstrate the use of GeneWalk analysis of RNA-seq and nascent transcriptome (NET-seq) data from human cells and mouse brains, validating the methodology. By performing gene- and condition-specific functional analysis that converts a list of genes into data-driven hypotheses, GeneWalk accelerates the interpretation of high-throughput genetics experiments.

biorxiv bioinformatics 200-500-users 2019

Robustness and applicability of functional genomics tools on scRNA-seq data, bioRxiv, 2019-09-01

AbstractMany tools have been developed to extract functional and mechanistic insight from bulk transcriptome profiling data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to do such an analysis for single cells. However, scRNA-seq data has characteristics such as drop-out events, low library sizes and a comparatively large number of samplescells. It is thus not clear if functional genomics tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way. To address this question, we performed benchmark studies on in silico and in vitro single-cell RNA-seq data. We included the bulk-RNA tools PROGENy, GO enrichment and DoRothEA that estimate pathway and transcription factor (TF) activities, respectively, and compared them against the tools AUCell and metaVIPER, designed for scRNA-seq. For the in silico study we simulated single cells from TFpathway perturbation bulk RNA-seq experiments. Our simulation strategy guarantees that the information of the original perturbation is preserved while resembling the characteristics of scRNA-seq data. We complemented the in silico data with in vitro scRNA-seq data upon CRISPR-mediated knock-out. Our benchmarks on both the simulated and real data revealed comparable performance to the original bulk data. Additionally, we showed that the TF and pathway activities preserve cell-type specific variability by analysing a mixture sample sequenced with 13 scRNA-seq different protocols. Our analyses suggest that bulk functional genomics tools can be applied to scRNA-seq data, outperforming dedicated single cell tools. Furthermore we provide a benchmark for further methods development by the community.

biorxiv bioinformatics 100-200-users 2019

 

Created with the audiences framework by Jedidiah Carlson

Powered by Hugo