Towards a gold standard for benchmarking gene set enrichment analysis, bioRxiv, 2019-06-19
Abstract. Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations. Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each data set is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated. Conclusion: We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted.
We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods. Availability: http://bioconductor.org/packages/GSEABenchmarkeR
biorxiv bioinformatics 100-200-users 2019
Comprehensive ecosystem-specific 16S rRNA gene databases with automated taxonomy assignment (AutoTax) provide species-level resolution in microbial ecology, bioRxiv, 2019-06-18
Abstract. High-throughput 16S rRNA gene amplicon sequencing is an indispensable method for studying the diversity and dynamics of microbial communities. However, this method is presently hampered by the lack of high-identity reference sequences for many environmental microbes in the public 16S rRNA gene reference databases, and by the lack of a systematic and comprehensive taxonomic classification for most environmental bacteria. Here we combine high-quality and high-throughput full-length 16S rRNA gene sequencing with a novel sequence identity-based approach for automated taxonomy assignment (AutoTax) to create robust, near-complete 16S rRNA gene databases for complex environmental ecosystems. To demonstrate the benefit of the approach, we created an ecosystem-specific database for wastewater treatment systems and anaerobic digesters. The novel approach allows consistent species-level classification of 16S rRNA amplicon sequence variants (ASVs) and the design of highly specific oligonucleotide probes for fluorescence in situ hybridization, which can reveal in situ properties of microbes at unprecedented taxonomic resolution.
biorxiv microbiology 100-200-users 2019
Immune, Autonomic, and Endocrine Dysregulation in Autism and Ehlers-Danlos Syndrome/Hypermobility Spectrum Disorders Versus Unaffected Controls, bioRxiv, 2019-06-14
Abstract. Background: A growing body of literature suggests etiological overlap between Ehlers-Danlos syndrome (EDS)/hypermobility spectrum disorders (HSD) and some cases of autism, although this relationship is poorly delineated. In addition, immune, autonomic, and endocrine dysregulation are reported in both conditions and may be relevant to their respective etiologies. Aims: To study symptom overlap in these two comorbid spectrum conditions. Methods and Procedures: We surveyed 702 adults aged 25+ years on a variety of EDS/HSD-related health topics, comparing individuals with EDS/HSD, individuals with autism, and unaffected controls. Outcomes and Results: The autism group reported symptomology similar to, though less severe than, that of the EDS/HSD group, especially in areas of immune/autonomic/endocrine dysregulation, connective tissue abnormalities (i.e., skin, bruising/bleeding), and chronic pain. EDS/HSD mothers with autistic children reported more immune symptoms than EDS/HSD mothers without, suggesting the maternal immune system could play a heritable role in these conditions (p = 0.0119). Conclusions and Implications: These data suggest that EDS/HSD and autism share aspects of immune/autonomic/endocrine dysregulation, pain, and some tissue fragility, which are typically more severe in the former. This overlap, as well as documented comorbidity, suggests some forms of autism may be hereditary connective tissue disorders (HCTD).
biorxiv neuroscience 100-200-users 2019
An improved pig reference genome sequence to enable pig genetics and genomics research, bioRxiv, 2019-06-13
Abstract. The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model with high anatomical and immunological similarity to humans. The draft reference genome (Sscrofa10.2) represented a purebred female pig from a commercial pork production breed (Duroc), and was established using older clone-based sequencing methods. The Sscrofa10.2 assembly was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. We present two highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, one for the same Duroc female (Sscrofa11.1) and one for an outbred, composite breed male animal commonly used for commercial pork production (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy compared to the earlier reference, and the availability of two independent assemblies provided an opportunity to identify large-scale variants and to error-check the accuracy of representation of the genome. We propose that the improved Duroc breed assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.
biorxiv genomics 100-200-users 2019
Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains, bioRxiv, 2019-06-13
Abstract. CTCF binding contributes to the establishment of higher order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). We have carried out an experimental and computational study that exploits the natural genetic variation across five closely related species to assess how CTCF binding patterns stably fixed by evolution in each species contribute to the establishment and evolutionary dynamics of TAD boundaries. We performed CTCF ChIP-seq in multiple mouse species to create genome-wide binding profiles and associated them with TAD boundaries. Our analyses reveal that CTCF binding is maintained at TAD boundaries by an equilibrium of selective constraints and dynamic evolutionary processes. Regardless of their conservation across species, CTCF binding sites at TAD boundaries are subject to stronger sequence and functional constraints compared to other CTCF sites. TAD boundaries frequently harbor rapidly evolving clusters containing both evolutionarily old and young CTCF sites as a result of repeated acquisition of new species-specific sites close to conserved ones. The overwhelming majority of clustered CTCF sites colocalize with cohesin and are significantly closer to gene transcription start sites than nonclustered CTCF sites, suggesting that CTCF clusters particularly contribute to cohesin stabilization and transcriptional regulation. Overall, CTCF site clusters are an apparently important feature of CTCF binding evolution and are critical to the functional stability of higher order chromatin structure.
biorxiv genomics 100-200-users 2019
Brain-wide mapping of contextual fear memory engram ensembles supports the dispersed engram complex hypothesis, bioRxiv, 2019-06-12
Neuronal ensembles that hold specific memories (memory engrams) have been identified in the hippocampus, amygdala, and cortex. It has been hypothesized that engrams for a specific memory are distributed among multiple brain regions that are functionally connected. Here, we report the hitherto most extensive engram map for contextual fear memory by characterizing activity-tagged neurons in 409 regions using SHIELD-based tissue phenotyping. The mapping was aided by a novel engram index, which identified cFos+ brain regions holding engrams with a high probability. Optogenetic manipulations confirmed previously known engrams and revealed new engrams. Many of these engram-holding regions were functionally connected to the CA1 or amygdala engrams. Simultaneous chemogenetic reactivation of multiple engrams, which mimics natural memory recall, conferred a greater level of memory recall than reactivation of a single engram ensemble. Overall, our study supports the hypothesis that a memory is stored in functionally connected engrams distributed across multiple brain regions.
biorxiv neuroscience 100-200-users 2019