Minimum epistasis interpolation for sequence-function relationships, bioRxiv, 2019-06-02
Abstract: Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While these assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes are not directly assayed. Here we present a method based on the idea of inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction in which mutational effects change as little as possible across adjacent genetic backgrounds. Although this method is highly conservative and has no tunable parameters, it also makes no assumptions about the form that genetic interactions take, resulting in predictions that can behave in a very complicated manner where the data require it but which are nearly additive where data is sparse or absent. We apply this method to analyze a fitness landscape for protein G, showing that our technique can provide a substantially less epistatic fit to the landscape than standard methods with little loss in predictive power. Moreover, our analysis reveals that the complex structure of epistasis observed in this dataset can be well-understood in terms of a simple qualitative model consisting of three fitness peaks where the landscape is locally additive in the vicinity of each peak.
biorxiv bioinformatics 0-100-users 2019
Evidence for rapid phenotypic and behavioral change in a recently established cavefish population, bioRxiv, 2019-05-28
Abstract: Substantial morphological and behavioral shifts often accompany rapid environmental change, yet little is known about the early stages of cave colonization. Relative to surface streams, caves are extreme environments with perpetual darkness and low nutrient availability. The Mexican tetra (Astyanax mexicanus) has repeatedly colonized caves throughout Mexico, suggesting an ability to adapt to these conditions. Here, we survey for phenotypic and behavioral differences between a surface population and a cave population of A. mexicanus that has recently colonized Honey Creek Cave, Comal County, Texas, likely within the last century. We found that fish from Honey Creek Cave and fish from Honey Creek surface populations differ significantly in morphological traits including length, coloration, body condition, eye size, and dorsal fin placement. Cavefish also exhibit an increased number of superficial neuromasts relative to surface fish. Behaviorally, cavefish consume fewer worms when trials are performed in both lighted and darkened conditions. Cavefish are more aggressive than surface fish and exhibit fewer behaviors associated with stress. Further, in contrast to surface fish, cavefish prefer the edges to the center of an arena and are qualitatively more likely to investigate a novel object placed in the tank. While cavefish and surface fish were wild-caught and developmental environment likely plays a role in shaping these differences, our work demonstrates morphological and behavioral shifts for Texas cavefish and offers an exciting opportunity for future work to explore the genetic and environmental contributions to early cave colonization.
biorxiv animal-behavior-and-cognition 0-100-users 2019
Fate mapping via Ms4a3 expression history traces monocyte-derived cells, bioRxiv, 2019-05-28
Summary: Most tissue-resident macrophage (RTM) populations are seeded by waves of embryonic hematopoiesis and are self-maintained independently of a bone-marrow contribution during adulthood. A proportion of RTMs, however, is constantly replaced by blood monocytes, and their functions compared to embryonic RTMs remain unclear. The kinetics and extent of the contribution of circulating monocytes to RTM replacement during homeostasis, inflammation and disease are highly debated. Here, we identified Ms4a3 as a specific marker expressed by granulocyte-monocyte progenitors (GMPs) and subsequently generated Ms4a3TdT reporter and Ms4a3Cre-RosaTdT fate mapper models to follow monocytes and their progenies. Our Ms4a3Cre-RosaTdT model efficiently traced blood monocytes (97%) and granulocytes (100%), but no lymphocytes or tissue dendritic cells. Using this model, we precisely quantified the contribution of monocytes to the RTM pool during homeostasis and inflammation. The unambiguous identification of monocyte-derived cells will permit future studies of their function under any condition.
biorxiv immunology 0-100-users 2019
EpiScanpy integrated single-cell epigenomic analysis, bioRxiv, 2019-05-25
Abstract: Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels by orthogonal epigenetic information.
biorxiv bioinformatics 0-100-users 2019
The genetic makeup of the electrocardiogram, bioRxiv, 2019-05-25
Abstract: Since its original description in 1893 by Willem van Einthoven, the electrocardiogram (ECG) has been instrumental in the recognition of a wide array of cardiac disorders [1,2]. Although many electrocardiographic patterns have been well described, the underlying biology is incompletely understood. Genetic associations of particular features of the ECG have been identified by genome-wide studies. This snapshot approach only provides fragmented information of the underlying genetic makeup of the ECG. Here, we follow the effects of individual genetic variants through the complete cardiac cycle the ECG represents. We found that genetic variants have unique morphological signatures not identified by previous analyses. By exploiting identified aberrations of these morphological signatures, we show that novel genetic loci can be identified for cardiac disorders. Our results demonstrate how an integrated approach to analyse high-dimensional data can further our understanding of the ECG, adding to the earlier undertaken snapshot analyses of individual ECG components. We anticipate that our comprehensive resource will fuel in silico explorations of the biological mechanisms underlying cardiac traits and disorders represented on the ECG. For example, known disease-causing variants can be used to identify novel morphological ECG signatures, which in turn can be utilized to prioritize genetic variants or genes for functional validation. Furthermore, the ECG plays a major role in the development of drugs, and a genetic assessment of the entire ECG can drive such developments.
biorxiv genetics 0-100-users 2019
Novel Rhabdovirus and an almost complete drain fly transcriptome recovered from two independent contaminations of clinical samples, bioRxiv, 2019-05-24
Abstract: Metagenomic approaches enable an open exploration of microbial communities without requiring a priori knowledge of a sample’s composition by shotgun sequencing the total RNA or DNA of the sample. Such an approach is valuable for exploratory diagnostics of novel pathogens in clinical practice. Yet, one may also identify surprising off-target findings. Here we report a mostly complete transcriptome from a drain fly (likely Psychoda alternata) as well as a novel Rhabdovirus-like virus recovered from two independent contaminations of RNA sequencing libraries from clinical samples of cerebrospinal fluid (CSF) and serum, out of a total of 724 libraries sequenced at the same laboratory during a 2-year time span. This drain fly genome shows a considerable divergence from previously sequenced insects, which may obscure common clinical metagenomic analyses not expecting such contaminations. The classification of these contaminant sequences allowed us to identify infected drain flies as the likely origin of the novel Rhabdovirus-like sequence, which could have been erroneously linked to human pathology, had they been ignored.
biorxiv bioinformatics 0-100-users 2019