Organization and Regulation of Chromatin by Liquid-Liquid Phase Separation, bioRxiv, 2019-01-18

Genomic DNA is highly compacted in the nucleus of eukaryotic cells as a nucleoprotein assembly called chromatin. The basic unit of chromatin is the nucleosome, where ~146 base pair increments of the genome are wrapped and compacted around the core histone proteins. Further genomic organization and compaction occur through higher order assembly of nucleosomes. This organization regulates many nuclear processes, and is controlled in part by histone post-translational modifications and chromatin-binding proteins. Mechanisms that regulate the assembly and compaction of the genome remain unclear. Here we show that in the presence of physiologic concentrations of mono- and divalent salts, histone tail-driven interactions drive liquid-liquid phase separation (LLPS) of nucleosome arrays, resulting in substantial condensation. Phase separation of nucleosomal arrays is inhibited by histone acetylation, whereas histone H1 promotes phase separation, further compaction, and decreased dynamics within droplets, mirroring the relationship between these modulators and the accessibility of the genome in cells. These results indicate that under physiologically relevant conditions, LLPS is an intrinsic behavior of the chromatin polymer, and suggest a model in which the condensed phase reflects a genomic 'ground state' that can produce chromatin organization and compaction in vivo. The dynamic nature of this state could enable known modulators of chromatin structure, such as post-translational modifications and chromatin-binding proteins, to act upon it and consequently control nuclear processes such as transcription and DNA repair. Our data suggest an important role for LLPS of chromatin in the organization of the eukaryotic genome.

biorxiv biophysics 100-200-users 2019

BEHST: genomic set enrichment analysis enhanced through integration of chromatin long-range interactions, bioRxiv, 2019-01-16

Transforming data from genome-scale assays into knowledge of affected molecular functions and pathways is a key challenge in biomedical research. Using vocabularies of functional terms and databases annotating genes with these terms, pathway enrichment methods can identify terms enriched in a gene list. With data that can refer to intergenic regions, however, one must first connect the regions to the terms, which are usually annotated only to genes. To make these connections, existing pathway enrichment approaches apply unwarranted assumptions such as annotating non-coding regions with the terms from adjacent genes. We developed a computational method that instead links genomic regions to annotations using data on long-range chromatin interactions. Our method, Biological Enrichment of Hidden Sequence Targets (BEHST), finds Gene Ontology (GO) terms enriched in genomic regions more precisely and accurately than existing methods. We demonstrate BEHST's ability to retrieve more pertinent and less ambiguous GO terms associated with results of in vivo mouse enhancer screens or enhancer RNA assays for multiple tissue types. BEHST will accelerate the discovery of affected pathways mediated through long-range interactions that explain non-coding hits in genome-wide association study (GWAS) or genome editing screens. BEHST is free software with a command-line interface for Linux or macOS and a web interface (http://behst.hoffmanlab.org).
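The general idea in this abstract, linking regions to genes through long-range interactions and then testing a term for enrichment, can be illustrated with a minimal Python sketch. This is not BEHST's actual implementation: the function names, the tuple layout `(chrom, start, end)`, the use of transcription start sites to place genes, and the one-sided hypergeometric test are all assumptions made for illustration.

```python
from math import comb

# Hypothetical data layout (an assumption, not BEHST's format):
#   region / anchor: (chrom, start, end), half-open coordinates
#   interactions:    list of (anchor_a, anchor_b) pairs
#   gene_tss:        dict mapping gene name -> (chrom, position)

def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def contains(anchor, chrom, pos):
    """True if a single position falls inside an anchor."""
    return anchor[0] == chrom and anchor[1] <= pos < anchor[2]

def genes_linked_to_regions(regions, interactions, gene_tss):
    """Collect genes whose TSS lies in the partner anchor of any
    interaction whose other anchor overlaps a query region."""
    linked = set()
    for left, right in interactions:
        for near, far in ((left, right), (right, left)):
            if any(overlaps(r, near) for r in regions):
                linked.update(
                    g for g, (c, p) in gene_tss.items() if contains(far, c, p)
                )
    return linked

def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_hits):
    """One-sided hypergeometric p-value: probability of drawing at least
    n_hits annotated genes when selecting n_selected genes out of
    n_background, of which n_annotated carry the term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total

if __name__ == "__main__":
    regions = [("chr1", 100, 200)]
    interactions = [(("chr1", 150, 250), ("chr1", 5000, 6000))]
    gene_tss = {"geneA": ("chr1", 5500), "geneB": ("chr1", 9000)}
    print(genes_linked_to_regions(regions, interactions, gene_tss))
    print(hypergeom_enrichment_p(10, 3, 2, 2))
```

In this toy setup a non-coding region is connected to a distal gene only through an observed interaction, rather than to whichever gene happens to be adjacent, which is the contrast the abstract draws with existing approaches.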

biorxiv bioinformatics 100-200-users 2019

Killer whale genomes reveal a complex history of recurrent admixture and vicariance, bioRxiv, 2019-01-16

Reconstruction of the demographic and evolutionary history of populations assuming a consensus tree-like relationship can mask more complex scenarios, which are prevalent in nature. An emerging genomic toolset, which has been most comprehensively harnessed in the reconstruction of human evolutionary history, enables molecular ecologists to elucidate complex population histories. Killer whales have limited extrinsic barriers to dispersal and have radiated globally, and are therefore a good candidate model for the application of such tools. Here, we analyse a global dataset of killer whale genomes in a rare attempt to elucidate global population structure in a non-human species. We identify a pattern of genetic homogenisation at lower latitudes and the greatest differentiation at high latitudes, even between currently sympatric lineages. The processes underlying the major axis of structure include high drift at the edge of species' range, likely associated with founder effects and allelic surfing during post-glacial range expansion. Divergence between Antarctic and non-Antarctic lineages is further driven by ancestry segments with up to four-fold older coalescence time than the genome-wide average; relicts of a previous vicariance during an earlier glacial cycle. Our study further underpins that episodic gene flow is ubiquitous in natural populations, and can occur across great distances and after substantial periods of isolation between populations. Thus, understanding the evolutionary history of a species requires comprehensive geographic sampling and genome-wide data to sample the variation in ancestry within individuals.

biorxiv evolutionary-biology 100-200-users 2019

Probabilistic cell type assignment of single-cell transcriptomic data reveals spatiotemporal microenvironment dynamics in human cancers, bioRxiv, 2019-01-16

Single-cell RNA sequencing (scRNA-seq) has transformed biomedical research, enabling decomposition of complex tissues into disaggregated, functionally distinct cell types. For many applications, investigators wish to identify cell types with known marker genes. Typically, such cell type assignments are performed through unsupervised clustering followed by manual annotation based on these marker genes, or via mapping procedures to existing data. However, the manual interpretation required in the former case scales poorly to large datasets, which are also often prone to batch effects, while existing data for purified cell types must be available for the latter. Furthermore, unsupervised clustering can be error-prone, leading to under- and over-clustering of the cell types of interest. To overcome these issues, we present CellAssign, a probabilistic model that leverages prior knowledge of cell type marker genes to annotate scRNA-seq data into pre-defined and de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while simultaneously controlling for batch and patient effects. We demonstrate the analytical advantages of CellAssign through extensive simulations and exemplify real-world utility to profile the spatial dynamics of high-grade serous ovarian cancer and the temporal dynamics of follicular lymphoma. Our analysis reveals subclonal malignant phenotypes and points towards an evolutionary interplay between immune and cancer cell populations with cancer cells escaping immune recognition.

biorxiv bioinformatics 100-200-users 2019

 
