centroFlye: Assembling Centromeres with Long Error-Prone Reads, bioRxiv, 2019-09-17
Abstract: Although variations in centromeres have been linked to cancer and infertility, centromeres still represent the “dark matter of the human genome” and remain an enigma for both biomedical and evolutionary studies. Since centromeres have withstood all previous attempts to develop an automated tool for their assembly and since their assembly using short reads is viewed as intractable, recent efforts attempted to manually assemble centromeres using long error-prone reads. We describe the centroFlye algorithm for centromere assembly using long error-prone reads, apply it for assembling the human X centromere, and use the constructed assembly to gain insights into centromere evolution. Our analysis reveals putative breakpoints in the previous manual reconstruction of the human X centromere and opens a possibility to automatically close the remaining multi-megabase gaps in the reference human genome.
biorxiv bioinformatics 100-200-users 2019
The Tolman-Eichenbaum Machine: Unifying space and relational memory through generalisation in the hippocampal formation, bioRxiv, 2019-09-17
The hippocampal-entorhinal system is important for spatial and relational memory tasks. We formally link these domains; provide a mechanistic understanding of the hippocampal role in generalisation; and offer unifying principles underlying many entorhinal and hippocampal cell-types. We propose medial entorhinal cells form a basis describing structural knowledge, and hippocampal cells link this basis with sensory representations. Adopting these principles, we introduce the Tolman-Eichenbaum machine (TEM). After learning, TEM entorhinal cells include grid, band, border and object-vector cells. Hippocampal cells include place and landmark cells, remapping between environments. Crucially, TEM also predicts empirically recorded representations in complex non-spatial tasks. TEM predicts hippocampal remapping is not random as previously believed. Rather structural knowledge is preserved across environments. We confirm this in simultaneously recorded place and grid cells.
One Sentence Summary: Simple principles of representation and generalisation unify spatial and non-spatial accounts of hippocampus and explain many cell representations.
biorxiv neuroscience 100-200-users 2019
Genetic “General Intelligence,” Objectively Determined and Measured, bioRxiv, 2019-09-13
Abstract: It has been known for 125 years that, in humans, diverse cognitive traits are positively intercorrelated; this forms the basis for the general factor of intelligence (g). We directly test for a genetic basis for g using data from seven different cognitive tests (N = 11,263 to N = 331,679) and genome-wide autosomal single nucleotide polymorphisms. A genetic g factor accounts for 58.4% (SE = 4.8%) of the genetic variance in the cognitive traits, with trait-specific genetic factors accounting for the remaining 41.6%. We distill genetic loci broadly relevant for many cognitive traits (g) from loci associated with only individual cognitive traits. These results elucidate the etiological basis for a long-known yet poorly-understood phenomenon, revealing a fundamental dimension of genetic sharing across diverse cognitive traits.
biorxiv genetics 100-200-users 2019
A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding, bioRxiv, 2019-09-10
Antibody-antigen binding relies on the specific interaction of amino acids at the paratope-epitope interface. It has been a long-standing question whether antibody-antigen binding is predictable. A fundamental premise for the predictability of paratope-epitope interactions is the existence of structural units that are universally shared among antibody-antigen binding complexes. Here, we screened the largest available set of non-redundant antibody-antigen structures for binding patterns and identified structural interaction motifs, which together compose a vocabulary of paratope-epitope interactions that is universally shared among investigated antibody-antigen structures. The vocabulary (i) is compact, less than 10^4 motifs, (ii) is immunity-specific and distinct from non-immune protein-protein interactions, (iii) mediates specific oligo- and polyreactive interactions between paratope-epitope pairs, and (iv) enables the machine learnability of paratope-epitope interactions. Collectively, our results demonstrate the predictability of antibody-antigen binding.
biorxiv immunology 100-200-users 2019
Direct-fit to nature: an evolutionary perspective on biological (and artificial) neural networks, bioRxiv, 2019-09-10
Abstract: Evolution is a blind fitting process by which organisms, over generations, adapt to the niches of an ever-changing environment. Does the mammalian brain use similar brute-force fitting processes to learn how to perceive and act upon the world? Recent advances in training deep neural networks have exposed the power of optimizing millions of synaptic weights to map millions of observations along ecologically relevant objective functions. This class of models has dramatically outstripped simpler, more intuitive models, operating robustly in real-life contexts spanning perception, language, and action coordination. These models do not learn an explicit, human-interpretable representation of the underlying structure of the data; rather, they use local computations to interpolate over task-relevant manifolds in a high-dimensional parameter space. Furthermore, counterintuitively, over-parameterized models, similarly to evolutionary processes, can be simple and parsimonious as they provide a versatile, robust solution for learning a diverse set of functions. In contrast to traditional scientific models, where the ultimate goal is interpretability, over-parameterized models eschew interpretability in favor of solving real-life problems or tasks. We contend that over-parameterized blind fitting presents a radical challenge to many of the underlying assumptions and practices in computational neuroscience and cognitive psychology. At the same time, this shift in perspective informs longstanding debates and establishes unexpected links with evolution, ecological psychology, and artificial life.
biorxiv neuroscience 100-200-users 2019
immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking, bioRxiv, 2019-09-07
Abstract
Summary: B- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground truth immune receptor repertoires for immunoinformatics tool benchmarking. Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full length variable region immune receptor sequences. ImmuneSIM enables the tuning of the immune receptor features (i) species and chain type (BCR, TCR, single, paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation, and (vi) sequence motifs. Each simulated sequence is annotated by the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis, and machine learning methods for motif detection.
Availability: The package is available via https://github.com/GreiffLab/immuneSIM and will also be available at CRAN (submitted). The documentation is hosted at https://immuneSIM.readthedocs.io.
Contact: victor.greiff@medisin.uio.no, sai.reddy@ethz.ch
biorxiv bioinformatics 100-200-users 2019