A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data, bioRxiv, 2018-10-17
Genome mining has become a key technology to explore and exploit natural product diversity through the identification and analysis of biosynthetic gene clusters (BGCs). Initially, this was performed on a single-genome basis; currently, the process is being scaled up to large-scale mining of pan-genomes of entire genera, complete strain collections and metagenomic datasets from which thousands of bacterial genomes can be extracted at once. However, no bioinformatic framework is currently available for the effective analysis of datasets of this size and complexity. Here, we provide a streamlined computational workflow, tightly integrated with antiSMASH and MIBiG, that consists of two new software tools, BiG-SCAPE and CORASON. BiG-SCAPE facilitates rapid calculation and interactive visual exploration of BGC sequence similarity networks, grouping gene clusters at multiple hierarchical levels, and includes a ‘glocal’ alignment mode that accurately groups both complete and fragmented BGCs. CORASON employs a phylogenomic approach to elucidate the detailed evolutionary relationships between gene clusters by computing high-resolution multi-locus phylogenies of all BGCs within and across gene cluster families (GCFs), and allows researchers to comprehensively identify all genomic contexts in which particular biosynthetic gene cassettes are found. We validate BiG-SCAPE by correlating its GCF output to metabolomic data across 403 actinobacterial strains. Furthermore, we demonstrate the discovery potential of the platform by using CORASON to comprehensively map the phylogenetic diversity of the large detoxin/rimosamide gene cluster clan, prioritizing three new detoxin families for subsequent characterization of six new analogs using isotopic labeling and analysis of tandem mass spectrometric data.
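To illustrate what "grouping gene clusters at multiple hierarchical levels" in a similarity network means, here is a minimal sketch. It assumes a precomputed pairwise BGC distance in [0, 1] and uses plain connected components (single linkage) at several cutoffs. BiG-SCAPE's own distance metric and family-calling procedure are not reproduced here. The function names, BGC identifiers and distances are all hypothetical.

```python
from itertools import combinations

def families_at_cutoff(bgcs, distance, cutoff):
    """Group BGCs into connected components of the network whose edges
    join pairs with distance <= cutoff (a simplified stand-in for GCF calling)."""
    parent = {b: b for b in bgcs}

    def find(x):
        # Union-find with path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(bgcs, 2):
        if distance(a, b) <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for b in bgcs:
        groups.setdefault(find(b), []).append(b)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pairwise distances (0 = identical, 1 = unrelated).
DIST = {
    frozenset({"bgcA", "bgcB"}): 0.20,
    frozenset({"bgcB", "bgcC"}): 0.45,
    frozenset({"bgcC", "bgcD"}): 0.70,
}

def toy_distance(a, b):
    # Unlisted pairs are treated as unrelated.
    return DIST.get(frozenset({a, b}), 1.0)

bgcs = ["bgcA", "bgcB", "bgcC", "bgcD"]
for cutoff in (0.3, 0.5, 0.8):
    print(cutoff, families_at_cutoff(bgcs, toy_distance, cutoff))
```

Because single linkage is monotone in the cutoff, each group found at a strict cutoff is contained in a group found at a looser one. That nesting is what makes the levels hierarchical: tight families at low cutoffs, broader clans at high ones.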
biorxiv bioinformatics 100-200-users 2018
Comparison of single-cell whole-genome amplification strategies, bioRxiv, 2018-10-16
Single-cell genomics is an alluring area that holds the potential to change the way we understand cell populations. Due to the small amount of DNA within a single cell, whole-genome amplification is a mandatory step in many single-cell applications. Unfortunately, single-cell whole-genome amplification (scWGA) strategies suffer from several technical biases that complicate the downstream interpretation of the data. Here we compared the performance of six different scWGA methods (GenomiPhi, REPLIg, TruePrime, Ampli1, MALBAC, and PicoPLEX) after amplifying and low-pass sequencing the complete genome of 230 healthy/tumoral human cells. Overall, REPLIg outperformed competing methods regarding DNA yield, amplicon size, amplification breadth, amplification uniformity (being the only method with a random amplification bias), and false single-nucleotide variant calls. On the other hand, methods not based on multiple displacement amplification (MDA), and in particular Ampli1, showed less allelic imbalance and allelic dropout (ADO), more reliable copy-number profiles, and fewer chimeric amplicons. While no single scWGA method performed optimally in every aspect, each clearly has distinct advantages. Our results provide a convenient guide for selecting a scWGA method depending on the question of interest, while revealing relevant weaknesses that should be considered during the analysis and interpretation of single-cell sequencing data.
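Two of the metrics named above, allelic dropout and amplification uniformity, can be made concrete with a minimal sketch. It uses common working definitions that are assumptions, not necessarily the exact ones used in the study. The ADO rate is taken as the fraction of bulk-heterozygous sites that the single cell calls as homozygous. Uniformity is taken as the coefficient of variation of read depth across genomic bins. Function names and the genotype format are hypothetical.

```python
import statistics

def allelic_dropout_rate(bulk_het_sites, cell_genotypes):
    """Fraction of bulk-heterozygous sites called homozygous in the cell.

    bulk_het_sites: positions known to be heterozygous (e.g. from bulk DNA).
    cell_genotypes: dict position -> unphased genotype such as "0/1" or "1/1".
    Sites with no call in the cell are skipped, not counted as dropout.
    """
    evaluated = dropped = 0
    for pos in bulk_het_sites:
        gt = cell_genotypes.get(pos)
        if gt is None:
            continue  # no coverage: dropout cannot be distinguished
        a, b = gt.split("/")
        evaluated += 1
        if a == b:
            dropped += 1  # one allele was lost during amplification
    return dropped / evaluated if evaluated else float("nan")

def coverage_cv(bin_depths):
    """Coefficient of variation of read depth across genomic bins;
    lower values indicate more uniform amplification."""
    mean = statistics.mean(bin_depths)
    return statistics.pstdev(bin_depths) / mean

bulk = [100, 200, 300, 400]
cell = {100: "0/1", 200: "1/1", 300: "0/0"}  # position 400 not covered
print(allelic_dropout_rate(bulk, cell))
print(coverage_cv([5, 15]))
```

Skipping uncovered sites keeps the ADO estimate separate from amplification breadth. Otherwise a method with poor breadth would be penalized twice.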
biorxiv genomics 100-200-users 2018
A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex, bioRxiv, 2018-10-13
How the neocortex works is a mystery. In this paper we propose a novel framework for understanding its function. Grid cells are neurons in the entorhinal cortex that represent the location of an animal in its environment. Recent evidence suggests that grid cell-like neurons may also be present in the neocortex. We propose that grid cells exist throughout the neocortex, in every region and in every cortical column. They define a location-based framework for how the neocortex functions. Whereas grid cells in the entorhinal cortex represent the location of one thing, the body relative to its environment, we propose that cortical grid cells simultaneously represent the location of many things. Cortical columns in somatosensory cortex track the location of tactile features relative to the object being touched and cortical columns in visual cortex track the location of visual features relative to the object being viewed. We propose that mechanisms in the entorhinal cortex and hippocampus that evolved for learning the structure of environments are now used by the neocortex to learn the structure of objects. Having a representation of location in each cortical column suggests mechanisms for how the neocortex represents object compositionality and object behaviors. It leads to the hypothesis that every part of the neocortex learns complete models of objects and that there are many models of each object distributed throughout the neocortex. The similarity of circuitry observed in all cortical regions is strong evidence that even high-level cognitive tasks are learned and represented in a location-based framework.
biorxiv neuroscience 100-200-users 2018
Estimation of allele-specific fitness effects across human protein-coding sequences and implications for disease, bioRxiv, 2018-10-13
A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features, and constructed a map of allele-specific selection coefficients across all protein-coding sequences in the human genome. We show that this map is informative about both human evolution and disease.
biorxiv genomics 100-200-users 2018
In situ and high-resolution Cryo-EM structure of the Type VI secretion membrane complex, bioRxiv, 2018-10-13
Bacteria have evolved macromolecular machineries that secrete effectors and toxins to survive and thrive in diverse environments. The type VI secretion system (T6SS) is a contractile machine that is related to Myoviridae phages. The T6SS is composed of a baseplate that contains a spike onto which an inner tube is built, surrounded by a contractile sheath. Unlike phages, which are released into and act in the extracellular medium, the T6SS is an intracellular machine inserted in the bacterial membranes by a trans-envelope complex. This membrane complex (MC) comprises three proteins: TssJ, TssL and TssM. We previously reported the low-resolution negative-stain electron microscopy structure of the enteroaggregative Escherichia coli MC and proposed a rotational 5-fold symmetry with a TssJ:TssL:TssM stoichiometry of 2:2:2. Here, cryo-electron tomography analysis of the T6SS MC confirmed the 5-fold symmetry in situ and identified the regions of the structure that insert into the bacterial membranes. A high-resolution model obtained by single-particle cryo-electron microscopy reveals its global architecture and highlights new features: five additional copies of TssJ, yielding a TssJ:TssL:TssM stoichiometry of 3:2:2; an 11-residue loop in TssM that protrudes into the lumen of the MC and constitutes a functionally important periplasmic gate; and hinge regions. Based on these data, we revisit the model of the mechanism of action of the MC during T6SS assembly and function.
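The step from 2:2:2 to 3:2:2 follows from simple counting if one assumes the stoichiometry counts copies per symmetric repeat of the 5-fold assembly. This is our reading of the abstract, not a statement from it. Under that assumption, five extra TssJ copies add exactly one TssJ per repeat:

```latex
\begin{aligned}
\text{earlier model:}\quad & 5 \times (2,\,2,\,2) = (10,\,10,\,10)\ \text{copies of (TssJ, TssL, TssM)} \\
\text{plus five TssJ:}\quad & (10 + 5,\,10,\,10) = (15,\,10,\,10) = 5 \times (3,\,2,\,2)
\end{aligned}
```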
biorxiv microbiology 100-200-users 2018
Measures of neural similarity, bioRxiv, 2018-10-12
One fundamental question is what makes two brain states similar. For example, what makes the activity in visual cortex elicited by viewing a robin similar to that elicited by viewing a sparrow? One common assumption in fMRI analysis is that neural similarity is described by Pearson correlation. However, there are a host of other possibilities, including Minkowski and Mahalanobis measures, each differing in its mathematical, theoretical, and neural computational assumptions. Moreover, the operative measures may vary across brain regions and tasks. Here, we evaluated which of several competing similarity measures best captured neural similarity. Our technique uses a decoding approach to assess the information present in a brain region, and the similarity measures that best correspond to the classifier's confusion matrix are preferred. Across two published fMRI datasets, we found that the preferred neural similarity measures were common across brain regions but differed across tasks. Moreover, Pearson correlation was consistently surpassed by alternatives.
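The competing measures and the idea of preferring the one that best matches a decoder's confusions can be sketched in a few lines. The three measures follow their standard definitions. The scoring rule is an assumption, not necessarily the authors' procedure: it correlates pairwise similarities between condition patterns with the symmetrized off-diagonal confusion rates. All function names, patterns and confusion values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two activity patterns (voxel vectors).
    Undefined (division by zero) if either pattern is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def minkowski(x, y, p=2):
    """Minkowski distance; p=1 is city-block, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance given an inverse covariance matrix (list of rows)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n)))

def agreement_with_confusions(patterns, confusion, similarity):
    """Correlate pairwise similarities with symmetrized off-diagonal
    confusion rates; a higher score means the measure better tracks
    which conditions the classifier mixes up."""
    sims, confs = [], []
    for i in range(len(patterns)):
        for j in range(i + 1, len(patterns)):
            sims.append(similarity(patterns[i], patterns[j]))
            confs.append((confusion[i][j] + confusion[j][i]) / 2)
    return pearson(sims, confs)

# Hypothetical mean patterns for three stimulus conditions and a decoder's confusion matrix.
patterns = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [3.0, 1.0, 0.0]]
confusion = [[0.80, 0.15, 0.05], [0.20, 0.75, 0.05], [0.05, 0.05, 0.90]]

measures = {
    "pearson": pearson,
    "euclidean": lambda x, y: -minkowski(x, y, 2),   # distances negated into similarities
    "city-block": lambda x, y: -minkowski(x, y, 1),
}
for name, sim in measures.items():
    print(name, round(agreement_with_confusions(patterns, confusion, sim), 3))
```

Distances are negated so that every measure is oriented the same way: larger means more similar. The Mahalanobis measure would need an inverse noise covariance estimated from the data, so it is not included in the toy comparison.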
biorxiv neuroscience 100-200-users 2018