Identifying loci under positive selection in complex population histories, bioRxiv, 2018-10-26
Abstract: Detailed modeling of a species’ history is of prime importance for understanding how natural selection operates over time. Most methods designed to detect positive selection along sequenced genomes, however, use simplified representations of past histories as null models of genetic drift. Here, we present the first method that can detect signatures of strong local adaptation across the genome using arbitrarily complex admixture graphs, which are typically used to describe the history of past divergence and admixture events among any number of populations. The method, called Graph-aware Retrieval of Selective Sweeps (GRoSS), has good power to detect loci in the genome with strong evidence for past selective sweeps and can also identify which branch of the graph was most affected by the sweep. As evidence of its utility, we apply the method to bovine, codfish and human population genomic data containing multiple population panels related in complex ways. We find new candidate genes for important adaptive functions, including immunity and metabolism in under-studied human populations, as well as muscle mass, milk production and tameness in specific bovine breeds. We are also able to pinpoint the emergence of large regions of differentiation due to inversions in the history of Atlantic codfish.
biorxiv evolutionary-biology 100-200-users 2018
Proximity RNA labeling by APEX-Seq Reveals the Organization of Translation Initiation Complexes and Repressive RNA Granules, bioRxiv, 2018-10-26
Abstract: Diverse ribonucleoprotein complexes control messenger RNA processing, translation, and decay. Transcripts in these complexes localize to specific regions of the cell and can condense into non-membrane-bound structures such as stress granules. It has proven challenging to map the RNA composition of these large and dynamic structures, however. We therefore developed an RNA proximity labeling technique, APEX-Seq, which uses the ascorbate peroxidase APEX2 to probe the spatial organization of the transcriptome. We show that APEX-Seq can resolve the localization of RNAs within the cell and determine their enrichment or depletion near key RNA-binding proteins. Matching the spatial transcriptome, as revealed by APEX-Seq, with the spatial proteome determined by APEX-mass spectrometry (APEX-MS) provides new insights into the organization of translation initiation complexes on active mRNAs, as well as exposing unanticipated complexity in stress granule composition, and provides a powerful and general approach to explore the spatial environment of macromolecules.
biorxiv genomics 100-200-users 2018
The art of using t-SNE for single-cell transcriptomics, bioRxiv, 2018-10-26
Abstract: Single-cell transcriptomics yields ever growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). It excels at revealing local structure in high-dimensional data, but naive applications often suffer from severe shortcomings, e.g. the global structure of the data is not represented accurately. Here we describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.
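The multi-scale similarity kernel mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "multi-scale" means averaging Gaussian affinities computed at a narrow and a wide bandwidth. The function name `multiscale_row` and the fixed bandwidths in `sigmas` are hypothetical choices for illustration. Real t-SNE implementations calibrate each bandwidth per point from a target perplexity, which this sketch skips.

```python
import math


def multiscale_row(points, i, sigmas=(1.0, 10.0)):
    """Affinities p_{j|i} of point i to every other point.

    Each Gaussian kernel is normalised on its own and the normalised
    rows are then averaged, so the result still sums to 1.
    Assumption: fixed bandwidths instead of perplexity calibration.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    row = [0.0] * n
    for sigma in sigmas:
        weights = []
        for j in range(n):
            if j == i:
                weights.append(0.0)  # a point is not its own neighbour
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        total = sum(weights)
        for j in range(n):
            row[j] += weights[j] / total / len(sigmas)
    return row


# Three nearby points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
narrow_only = multiscale_row(points, 0, sigmas=(1.0,))
multi = multiscale_row(points, 0)
```

With the narrow kernel alone, the distant point receives an affinity that is numerically close to zero, so it exerts almost no attraction on the first point. Adding the wide kernel gives it a small but non-negligible weight, which is one way an embedding can retain some information about global structure.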
biorxiv bioinformatics 100-200-users 2018
Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and allow analysis of large datasets, bioRxiv, 2018-10-25
Abstract: Accurate and comprehensive extraction of information from high-dimensional single cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires multiple heuristics and fails to produce clear representations of datasets when millions of cells are projected. We developed opt-SNE, an automated toolkit for t-SNE parameter selection that utilizes Kullback-Leibler divergence evaluation in real time to tailor the early exaggeration and overall number of gradient descent iterations in a dataset-specific manner. The precise calibration of early exaggeration together with opt-SNE adjustment of gradient descent learning rate dramatically improves computation time and enables high-quality visualization of large cytometry and transcriptomics datasets, overcoming limitations of analysis tools with hard-coded parameters that often produce poorly resolved or misleading maps of fluorescent and mass cytometry data. In summary, opt-SNE enables superior data resolution in t-SNE space and thereby more accurate data interpretation.
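The idea of watching the Kullback-Leibler divergence during optimisation can be sketched as follows. This is not opt-SNE's actual criterion, which the abstract does not specify. The stopping rule `exaggeration_should_stop` and its `rel_tol` threshold are hypothetical, chosen only to show how a divergence trace could drive the decision to end a phase such as early exaggeration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two discrete distributions given as sequences.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to
    avoid division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def exaggeration_should_stop(kl_history, rel_tol=0.01):
    """Hypothetical rule: stop once the relative KL improvement between
    the last two recorded iterations falls below rel_tol."""
    if len(kl_history) < 2:
        return False  # not enough evidence yet
    prev, curr = kl_history[-2], kl_history[-1]
    if prev <= 0:
        return True  # nothing left to improve
    return (prev - curr) / prev < rel_tol


# KL is zero for identical distributions and log(2) in this simple case.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
one_sided = kl_divergence([1.0, 0.0], [0.5, 0.5])
```

In a real optimiser the divergence would be computed between the high-dimensional affinities and the low-dimensional similarities at each iteration and appended to `kl_history`. A dataset-specific rule of this kind replaces a hard-coded iteration count.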
biorxiv bioinformatics 100-200-users 2018
Inference of recombination maps from a single pair of genomes and its application to archaic samples, bioRxiv, 2018-10-25
Abstract: Understanding the causes and consequences of recombination rate evolution is a fundamental goal in genetics that requires recombination maps from across the tree of life. Since statistical inference of recombination maps typically depends on large samples, extending such studies to non-model organisms requires alternative tools. Here we extend the sequentially Markovian coalescent model to jointly infer demography and the variation in recombination along a pair of genomes. Using extensive simulations and sequence data from humans, fruit flies and a fungal pathogen, we demonstrate that iSMC accurately infers recombination maps under a wide range of scenarios and, remarkably, even from a single pair of unphased genomes. We exploit this possibility and reconstruct the recombination maps of archaic hominids. We report that the evolution of the recombination landscape follows the established phylogeny of Neandertals, Denisovans and modern human populations, as expected if the genomic distribution of crossovers in hominids is largely neutral.
biorxiv evolutionary-biology 0-100-users 2018
Independent domestication events in the blue-cheese fungus Penicillium roqueforti, bioRxiv, 2018-10-24
Abstract: Domestication provides an excellent framework for studying adaptive divergence. Using population genomics and phenotypic assays, we reconstructed the domestication history of the blue cheese mold Penicillium roqueforti. We showed that this fungus was domesticated twice independently. The population used in Roquefort originated from an old domestication event associated with weak bottlenecks and exhibited traits beneficial for pre-industrial cheese production (slower growth in cheese and greater spore production on bread, the traditional multiplication medium). The other cheese population originated more recently from the selection of a single clonal lineage, was associated with all types of blue cheese worldwide except Roquefort, and displayed phenotypes more suited to industrial cheese production (high lipolytic activity, efficient cheese cavity colonization ability and salt tolerance). We detected genomic regions affected by recent positive selection and putative horizontal gene transfers. This study sheds light on the processes of rapid adaptation and raises questions about genetic resource conservation.
biorxiv evolutionary-biology 0-100-users 2018