Towards a gold standard for benchmarking gene set enrichment analysis, bioRxiv, 2019-06-19

Abstract. Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations. Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each data set is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated. Conclusion: We carried out a systematic assessment of existing enrichment methods and identified the best-performing methods, but also general shortcomings in how gene set analysis is currently conducted. We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods. Availability: http://bioconductor.org/packages/GSEABenchmarkeR
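The abstract refers to the "type of null hypothesis tested" without naming a specific test. As a rough illustration only, the sketch below shows over-representation analysis, a widely used test of the competitive null hypothesis (genes in the set are no more often differentially expressed than genes outside it), computed as an upper-tail hypergeometric probability. The function name `ora_pvalue` and the toy numbers are hypothetical and are not taken from GSEABenchmarkeR or from the paper.

```python
from math import comb


def ora_pvalue(universe_size, set_size, de_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes measured (hypothetical background)
    set_size:      number of genes in the gene set
    de_size:       number of differentially expressed (DE) genes
    overlap:       number of DE genes that fall inside the gene set
    """
    max_overlap = min(set_size, de_size)
    # Number of ways to choose the DE genes from the whole universe.
    total = comb(universe_size, de_size)
    # Sum the probabilities of seeing `overlap` or more set genes among the DE genes.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, de_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 1000 measured genes, a 50-gene set, 100 DE genes, 12 in the set.
p = ora_pvalue(universe_size=1000, set_size=50, de_size=100, overlap=12)
print(f"ORA p-value: {p:.4g}")
```

A self-contained test would instead ask whether any gene in the set is differentially expressed, regardless of genes outside the set, so the same data can yield a very different fraction of enriched gene sets depending on which null hypothesis is used.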

biorxiv bioinformatics 100-200-users 2019

Immune, Autonomic, and Endocrine Dysregulation in Autism and Ehlers-Danlos Syndrome/Hypermobility Spectrum Disorders Versus Unaffected Controls, bioRxiv, 2019-06-14

Abstract. Background: A growing body of literature suggests etiological overlap between Ehlers-Danlos syndrome (EDS)/hypermobility spectrum disorders (HSD) and some cases of autism, although this relationship is poorly delineated. In addition, immune, autonomic, and endocrine dysregulation are reported in both conditions and may be relevant to their respective etiologies. Aims: To study symptom overlap in these two comorbid spectrum conditions. Methods and Procedures: We surveyed 702 adults aged 25+ years on a variety of EDS/HSD-related health topics, comparing individuals with EDS/HSD, individuals with autism, and unaffected controls. Outcomes and Results: The autism group reported symptomatology similar to, though less severe than, that of the EDS/HSD group, especially in areas of immune/autonomic/endocrine dysregulation, connective tissue abnormalities (i.e., skin, bruising/bleeding), and chronic pain. EDS/HSD mothers with autistic children reported more immune symptoms than EDS/HSD mothers without, suggesting the maternal immune system could play a heritable role in these conditions (p = 0.0119). Conclusions and Implications: These data suggest that EDS/HSD and autism share aspects of immune/autonomic/endocrine dysregulation, pain, and some tissue fragility, which is typically more severe in the former. This overlap, as well as documented comorbidity, suggests some forms of autism may be hereditary connective tissue disorders (HCTD).

biorxiv neuroscience 100-200-users 2019

Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains, bioRxiv, 2019-06-13

Abstract. CTCF binding contributes to the establishment of higher order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). We have carried out an experimental and computational study that exploits the natural genetic variation across five closely related species to assess how CTCF binding patterns stably fixed by evolution in each species contribute to the establishment and evolutionary dynamics of TAD boundaries. We performed CTCF ChIP-seq in multiple mouse species to create genome-wide binding profiles and associated them with TAD boundaries. Our analyses reveal that CTCF binding is maintained at TAD boundaries by an equilibrium of selective constraints and dynamic evolutionary processes. Regardless of their conservation across species, CTCF binding sites at TAD boundaries are subject to stronger sequence and functional constraints than other CTCF sites. TAD boundaries frequently harbor rapidly evolving clusters containing both evolutionarily old and young CTCF sites as a result of repeated acquisition of new species-specific sites close to conserved ones. The overwhelming majority of clustered CTCF sites colocalize with cohesin and are significantly closer to gene transcription start sites than nonclustered CTCF sites, suggesting that CTCF clusters particularly contribute to cohesin stabilization and transcriptional regulation. Overall, CTCF site clusters are an apparently important feature of CTCF binding evolution that is critical for the functional stability of higher order chromatin structure.

biorxiv genomics 100-200-users 2019
