An improved pig reference genome sequence to enable pig genetics and genomics research, bioRxiv, 2019-06-13
Abstract: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model with high anatomical and immunological similarity to humans. The draft reference genome (Sscrofa10.2) represented a purebred female pig from a commercial pork production breed (Duroc) and was established using older clone-based sequencing methods. The Sscrofa10.2 assembly was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. We present two highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy: one for the same Duroc female (Sscrofa11.1) and one for an outbred, composite-breed male animal commonly used for commercial pork production (USMARCv1.0). Both assemblies have substantially higher (>90-fold) continuity and accuracy than the earlier reference, and the availability of two independent assemblies provided an opportunity to identify large-scale variants and to error-check the accuracy of representation of the genome. We propose that the improved Duroc breed assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.
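Fold gains in assembly continuity such as the >90-fold figure above are often summarized with contig N50: the largest length L such that contigs of length at least L together cover half of the assembly. The abstract does not name the metric it used, so the sketch below assumes N50, and the contig lengths in the usage example are made-up toy values, not data from either assembly. The helper names `n50` and `fold_improvement` are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (bp).

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running reaches total on the last contig")


def fold_improvement(old_lengths, new_lengths):
    """Ratio of new to old N50, one way to express a 'fold' continuity gain."""
    return n50(new_lengths) / n50(old_lengths)


if __name__ == "__main__":
    # Hypothetical toy contig lengths, not real assembly data.
    old = [100, 80, 60, 40, 20]
    new = [5000, 3000, 1000]
    print(n50(old), n50(new), fold_improvement(old, new))
```

Here the toy "old" assembly has N50 = 80 and the "new" one 5000, a 62.5-fold gain; real comparisons would also weigh scaffold N50 and accuracy, which N50 alone does not capture.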
biorxiv genomics 100-200-users 2019
Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains, bioRxiv, 2019-06-13
Abstract: CTCF binding contributes to the establishment of higher-order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). We have carried out an experimental and computational study that exploits the natural genetic variation across five closely related species to assess how CTCF binding patterns stably fixed by evolution in each species contribute to the establishment and evolutionary dynamics of TAD boundaries. We performed CTCF ChIP-seq in multiple mouse species to create genome-wide binding profiles and associated them with TAD boundaries. Our analyses reveal that CTCF binding is maintained at TAD boundaries by an equilibrium of selective constraints and dynamic evolutionary processes. Regardless of their conservation across species, CTCF binding sites at TAD boundaries are subject to stronger sequence and functional constraints than other CTCF sites. TAD boundaries frequently harbor rapidly evolving clusters containing both evolutionarily old and young CTCF sites, as a result of repeated acquisition of new species-specific sites close to conserved ones. The overwhelming majority of clustered CTCF sites colocalize with cohesin and are significantly closer to gene transcription start sites than nonclustered CTCF sites, suggesting that CTCF clusters particularly contribute to cohesin stabilization and transcriptional regulation. Overall, CTCF site clusters appear to be an important feature of CTCF binding evolution and are critical to the functional stability of higher-order chromatin structure.
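The abstract contrasts clustered and nonclustered CTCF sites and compares their distances to transcription start sites (TSSs), but it does not state how clusters were defined. The sketch below assumes a simple rule, which is not necessarily the authors': on one chromosome, consecutive sites no more than a hypothetical `max_gap` bp apart belong to the same cluster, and a cluster needs at least two sites. All function names and coordinates are hypothetical.

```python
from bisect import bisect_left


def cluster_sites(positions, max_gap):
    """Group site positions (one chromosome) into clusters in which
    consecutive sites are at most max_gap bp apart (assumed rule)."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def split_clustered(positions, max_gap):
    """Return (clustered, nonclustered) sites; a cluster has >= 2 sites."""
    clustered, single = [], []
    for group in cluster_sites(positions, max_gap):
        (clustered if len(group) > 1 else single).extend(group)
    return clustered, single


def nearest_tss_distance(pos, tss_sorted):
    """Distance (bp) from pos to the closest TSS in a sorted, non-empty list."""
    i = bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])
    return min(candidates)


if __name__ == "__main__":
    # Hypothetical CTCF peak summits and TSSs on a single chromosome.
    sites = [1000, 1200, 1350, 5000, 9000, 9100]
    tss = [1100, 20000]
    clustered, single = split_clustered(sites, max_gap=300)
    print(clustered, single)
    print([nearest_tss_distance(p, tss) for p in clustered])
    print([nearest_tss_distance(p, tss) for p in single])
```

Comparing the two distance distributions (for example with a rank-based test) is one way to check a claim like "clustered sites are closer to TSSs"; the result depends on the chosen `max_gap`.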
biorxiv genomics 100-200-users 2019
Characterizing the temporal dynamics of gene expression in single cells with sci-fate, bioRxiv, 2019-06-11
Abstract: Gene expression programs are dynamic (e.g., the cell cycle, responses to stimuli, and normal differentiation and development). However, nearly all techniques for profiling gene expression in single cells fail to directly capture the dynamics of transcriptional programs, which limits the scope of biology that can be effectively investigated. Towards addressing this, we developed sci-fate, a new technique that combines S4U labeling of newly synthesized mRNA with single-cell combinatorial indexing (sci-) to concurrently profile the whole and the newly synthesized transcriptome in each of many single cells. As a proof of concept, we applied sci-fate to a model system of cortisol response and characterized expression dynamics in over 6,000 single cells. From these data, we quantify the dynamics of the cell cycle and glucocorticoid receptor activation, while also exploring their intersection. We furthermore use these data to develop a framework for inferring the distribution of cell state transitions. We anticipate that sci-fate will be broadly applicable to quantitatively characterizing transcriptional dynamics in diverse systems.
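To make "concurrently profile the whole and newly synthesized transcriptome" concrete: S4U-labeled transcripts are commonly detected after chemical conversion as reads carrying T>C mismatches, a readout the abstract itself does not describe and which is assumed here. The sketch tallies, per cell and gene, total reads and reads with at least a hypothetical `min_tc` T>C conversions, then reports the fraction new. The input format `(cell_barcode, gene, n_tc)` and all function names are hypothetical.

```python
from collections import defaultdict


def count_new_and_total(reads, min_tc=1):
    """Tally total and newly synthesized reads per (cell, gene).

    reads: iterable of (cell_barcode, gene, n_tc), where n_tc is the number
    of T>C mismatches seen in the read (hypothetical input format).
    A read counts as 'new' if n_tc >= min_tc (an assumed threshold).
    """
    total = defaultdict(int)
    new = defaultdict(int)
    for cell, gene, n_tc in reads:
        key = (cell, gene)
        total[key] += 1
        if n_tc >= min_tc:
            new[key] += 1
    return dict(total), dict(new)


def fraction_new(total, new):
    """Fraction of reads per (cell, gene) classified as newly synthesized."""
    return {key: new.get(key, 0) / n for key, n in total.items()}


if __name__ == "__main__":
    # Toy reads for two hypothetical cells.
    reads = [
        ("c1", "GeneA", 0),
        ("c1", "GeneA", 2),
        ("c1", "GeneB", 0),
        ("c2", "GeneA", 1),
    ]
    total, new = count_new_and_total(reads)
    print(fraction_new(total, new))
```

This ignores background T>C mismatches from sequencing or PCR errors, which a real analysis would need to model or subtract before interpreting the new-RNA fraction.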
biorxiv genomics 100-200-users 2019
A robust benchmark for germline structural variant detection, bioRxiv, 2019-06-10
Abstract: New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12,745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp), discovered by at least 2 technologies or 5 callsets and genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9,641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing that it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
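The logic of "any extra calls are putative false positives" inside the Tier 1 regions can be sketched as follows: restrict both the callset and the benchmark to Tier 1, match each call to at most one benchmark SV, count unmatched calls as false positives and unmatched benchmark SVs as false negatives. The matching rule here (same chromosome and type, start within a hypothetical `max_dist` bp, size ratio at least a hypothetical `min_size_ratio`) is a simplification; dedicated comparison tools such as Truvari apply richer criteria. The `SV` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # start coordinate (bp)
    svtype: str   # "INS" or "DEL"
    size: int     # length in bp


def matches(call, truth, max_dist=500, min_size_ratio=0.7):
    """Simplified match rule with hypothetical thresholds."""
    if call.chrom != truth.chrom or call.svtype != truth.svtype:
        return False
    if abs(call.pos - truth.pos) > max_dist:
        return False
    small, large = sorted((call.size, truth.size))
    return small / large >= min_size_ratio


def in_regions(sv, regions):
    """True if sv.pos lies in any half-open (chrom, start, end) interval."""
    return any(sv.chrom == c and s <= sv.pos < e for c, s, e in regions)


def evaluate(calls, truth, tier1_regions):
    """Return (TP, FP, FN) restricted to the Tier 1 regions.

    Each benchmark SV is matched by at most one call (greedy, in call order).
    """
    calls = [c for c in calls if in_regions(c, tier1_regions)]
    truth = [t for t in truth if in_regions(t, tier1_regions)]
    used = set()
    tp = fp = 0
    for call in calls:
        hit = next((i for i, t in enumerate(truth)
                    if i not in used and matches(call, t)), None)
        if hit is None:
            fp += 1  # extra call inside Tier 1: putative false positive
        else:
            used.add(hit)
            tp += 1
    fn = len(truth) - len(used)  # benchmark SVs no call matched
    return tp, fp, fn
```

For example, with benchmark SVs `SV("chr1", 10000, "DEL", 300)` and `SV("chr1", 50000, "INS", 120)` in a Tier 1 region `("chr1", 0, 100000)`, the calls `SV("chr1", 10100, "DEL", 280)` and `SV("chr1", 70000, "DEL", 60)` yield one true positive, one false positive, and one false negative.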
biorxiv genomics 100-200-users 2019
Multiparametric phenotyping of compound effects on patient derived organoids, bioRxiv, 2019-06-07
Abstract: Patient-derived organoids (PDOs) closely resemble individual tumor biology and allow testing of small molecules ex vivo. To systematically dissect compound effects on 3D organoids, we developed a high-throughput imaging and quantitative analysis approach. We generated PDOs from colorectal cancer patients, treated them with >500 small molecules, and captured >3 million images by confocal microscopy. We developed the software framework SCOPE to measure compound-induced reorganization of PDOs. We found diverse but recurring phenotypes that clustered by compound mode of action. Complex phenotypes were not congruent with PDO viability, and many were specific to subsets of PDO lines or were influenced by recurrent mutations. We further analyzed specific phenotypes induced by compound classes and found that GSK3 inhibitors disassemble PDOs via focal adhesion signaling and that MEK inhibition leads to bloating of PDOs by enhancing stemness. Finally, by viability classification, we show heterogeneous susceptibilities of PDOs to clinical anticancer drugs.
biorxiv genomics 0-100-users 2019
The murine transcriptome reveals global aging nodes with organ-specific phase and amplitude, bioRxiv, 2019-06-07
Aging is the single greatest cause of disease and death worldwide, so understanding the associated processes could vastly improve quality of life. While the field has identified major categories of aging damage, such as altered intercellular communication, loss of proteostasis, and eroded mitochondrial function[1], these deleterious processes interact with extraordinary complexity within and between organs. Yet a comprehensive analysis of aging dynamics organism-wide is lacking. Here we performed RNA sequencing of 17 organs and plasma proteomics at 10 ages across the mouse lifespan. We uncover previously unknown linear and non-linear expression shifts during aging, which cluster in strikingly consistent trajectory groups with coherent biological functions, including extracellular matrix regulation, unfolded protein binding, mitochondrial function, and inflammatory and immune response. Remarkably, these gene sets are expressed similarly across tissues, differing merely in age of onset and amplitude. Especially pronounced is widespread immune cell activation, detectable first in white adipose depots in middle age. Single-cell RNA sequencing confirms the accumulation of adipose T and B cells, including immunoglobulin J-expressing plasma cells, which also accrue concurrently across diverse organs. Finally, we show that expression shifts in distinct tissues are highly correlated with corresponding protein levels in plasma, thus potentially contributing to aging of the systemic circulation. Together, these data demonstrate a similar yet asynchronous inter- and intra-organ progression of aging, providing a foundation to track systemic sources of declining health in old age.
biorxiv genomics 100-200-users 2019