Alcohol consumption and mate choice in UK Biobank: comparing observational and Mendelian randomization estimates, bioRxiv, 2018-09-16
Abstract: Alcohol use is correlated within spouse-pairs, but it is difficult to disentangle the effects of alcohol consumption on mate selection from social factors or cohabitation leading to spouses becoming more similar over time. We hypothesised that genetic variants related to alcohol consumption may, via their effect on alcohol behaviour, influence mate selection. Therefore, in a sample of over 47,000 spouse-pairs in the UK Biobank, we utilised a well-characterised alcohol-related variant, rs1229984 in ADH1B, as a genetic proxy for alcohol use. We compared the phenotypic concordance between spouses for self-reported alcohol use with the association between an individual’s self-reported alcohol use and their partner’s rs1229984 genotype using Mendelian randomization. This was followed up by an exploration of the spousal genotypic concordance for the variant and an analysis of whether relationship length is related to spousal similarity in alcohol behaviour. We found strong evidence that both an individual’s self-reported alcohol consumption and rs1229984 genotype are associated with their partner’s self-reported alcohol use. The Mendelian randomization analysis found that each unit increase in an individual’s weekly alcohol consumption increased their partner’s alcohol consumption by 0.26 units (95% C.I. 0.15, 0.38; P=1.10×10⁻⁵). Furthermore, the rs1229984 genotype was concordant within spouse-pairs, suggesting that some spousal concordance for alcohol consumption existed prior to cohabitation. Although the SNP is strongly associated with ancestry, our results suggest that this concordance is unlikely to be explained by population stratification. Overall, our findings suggest that alcohol behaviour directly influences mate selection.
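The abstract does not say which Mendelian randomization estimator was used. With a single genetic instrument such as rs1229984, one common choice is the Wald ratio: the SNP's association with the outcome (partner's alcohol use) divided by its association with the exposure (the individual's own alcohol use). The sketch below assumes that estimator and a first-order standard error that ignores uncertainty in the SNP-exposure estimate. The function name `wald_ratio` and all input numbers are hypothetical and are not taken from the study.

```python
def wald_ratio(beta_zx, beta_zy, se_zy, z=1.96):
    """Wald ratio estimate for a single-instrument Mendelian randomization.

    beta_zx: SNP -> exposure association (e.g. units/week per allele)
    beta_zy: SNP -> outcome association (same units, measured in the partner)
    se_zy:   standard error of beta_zy
    z:       normal quantile for the confidence interval (1.96 gives ~95%)
    """
    if beta_zx == 0:
        raise ValueError("instrument has no association with the exposure")
    estimate = beta_zy / beta_zx
    # First-order approximation: treats beta_zx as known without error.
    se = se_zy / abs(beta_zx)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical inputs for illustration only (not values from the paper).
est, se, ci = wald_ratio(beta_zx=-2.0, beta_zy=-0.5, se_zy=0.1)
print(f"estimate={est:.3f}, se={se:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The ratio is only interpretable as a causal effect under the usual instrumental-variable assumptions; a weak SNP-exposure association (small `beta_zx`) makes both the estimate and this simplified standard error unreliable.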
biorxiv genetics 0-100-users 2018
Co-option and Detoxification of a Phage Lysin for Housekeeping Function, bioRxiv, 2018-09-16
Summary: Temperate phages constitute a potentially beneficial genetic reservoir for bacterial innovation despite being selfish entities encoding an infection cycle inherently at odds with bacterial fitness. These phages integrate their genomes into the bacterial host during infection, donating new, but deleterious, genetic material: the phage genome encodes toxic genes, such as lysins, that kill the bacterium during the phage infection cycle. Remarkably, some bacteria have exploited the destructive properties of phage genes for their own benefit by co-opting them as toxins for functions related to bacterial warfare, virulence, and secretion. However, do toxic phage genes ever become raw material for functional innovation? Here we report on a toxic phage gene whose product has lost its toxicity and has become a domain of a core cellular factor, SpmX, throughout the bacterial order Caulobacterales. Using a combination of phylogenetics, bioinformatics, structural biology, cell biology, and biochemistry, we have investigated the origin and function of SpmX and determined that its occurrence is the result of the detoxification of a phage peptidoglycan hydrolase gene. We show that the retained, attenuated activity of the phage-derived domain plays an important role in proper cell morphology and developmental regulation in representatives of this large bacterial clade. To our knowledge, this is the first observation of phage gene domestication in which a toxic phage gene has been co-opted for a housekeeping function.
biorxiv microbiology 0-100-users 2018
Evaluating the evidence for biotypes of depression: attempted replication of Drysdale et al. 2017, bioRxiv, 2018-09-16
Abstract
Background: Psychiatric disorders are highly heterogeneous, defined based on symptoms with little connection to potential underlying biological mechanisms. A possible approach to dissect biological heterogeneity is to look for biologically meaningful subtypes. A recent study by Drysdale et al. (2017) showed promising results along this line by simultaneously using resting state fMRI and clinical data, and identified four distinct subtypes of depression with different clinical profiles and abnormal resting state fMRI connectivity. These subtypes were predictive of treatment response to transcranial magnetic stimulation therapy.
Objective: Here, we attempted to replicate the procedure followed in the Drysdale et al. study and their findings in an independent dataset of a clinically more heterogeneous sample of 187 participants with depression and anxiety. We aimed to answer the following questions: 1) Using the same procedure, can we find a statistically significant and reliable relationship between brain connectivity and clinical symptoms? 2) Is the observed relationship similar to the one found in the original study? 3) Can we identify distinct and reliable subtypes? 4) Do they have similar clinical profiles to the subtypes identified in the original study?
Methods: We followed the original procedure as closely as possible, including a canonical correlation analysis to find a low dimensional representation of clinically relevant resting state fMRI features, followed by hierarchical clustering to identify subtypes. We extended the original procedure with additional statistical tests of the significance of the relationship between resting state fMRI and clinical data, and of the existence of distinct subtypes. Furthermore, we examined the stability of the whole procedure using resampling.
Results and Conclusion: We were not able to replicate the findings of the original study. Relationships between brain connectivity and clinical symptoms were not statistically significant, and we also did not find clearly distinct subtypes of depression. Based on our rigorous approach and in-depth review of the original results, we argue that the evidence for the existence of distinct resting state connectivity based subtypes of depression is weak and should be interpreted with caution.
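The abstract mentions testing whether the canonical correlation between fMRI features and clinical data is statistically significant, without specifying the test. One plausible way to do this is a permutation test: shuffle the rows of one data block to break the subject pairing, recompute the first canonical correlation, and compare the observed value with that null distribution. The sketch below is an illustration under that assumption, not the authors' procedure; the function names are hypothetical, and it assumes more subjects than features so that both matrices have full column rank.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Largest canonical correlation between the columns of X and Y.

    Uses the QR-based formulation: the canonical correlations are the
    singular values of Qx.T @ Qy, where Qx and Qy are orthonormal bases
    of the column-centred data matrices.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny floating-point overshoot


def permutation_p_value(X, Y, n_perm=200, seed=0):
    """Permutation p-value for the first canonical correlation."""
    rng = np.random.default_rng(seed)
    observed = first_canonical_correlation(X, Y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling the rows of Y breaks the subject-wise pairing with X.
        shuffled = Y[rng.permutation(len(Y))]
        if first_canonical_correlation(X, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Synthetic data for illustration only: two unrelated random blocks.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
Y = rng.normal(size=(100, 3))
r, p = permutation_p_value(X, Y)
print(f"first canonical correlation={r:.3f}, permutation p={p:.3f}")
```

Even for unrelated blocks the first canonical correlation is well above zero, because CCA maximises it over all linear combinations; this is why a null distribution, rather than the raw correlation, is needed to judge significance.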
biorxiv neuroscience 100-200-users 2018
A comprehensive analysis of RNA sequences reveals macroscopic somatic clonal expansion across normal tissues, bioRxiv, 2018-09-14
Cancer genome studies have significantly advanced our knowledge of somatic mutations. However, how these mutations accumulate in normal cells and whether they promote pre-cancerous lesions remains poorly understood. Here we perform a comprehensive analysis of normal tissues by utilizing RNA sequencing data from ~6,700 samples across 29 normal tissues collected as part of the Genotype-Tissue Expression (GTEx) project. We identify somatic mutations using a newly developed pipeline, RNA-MuTect, for calling somatic mutations directly from RNA-seq samples and their matched-normal DNA. When applied to the GTEx dataset, we detect multiple variants across different tissues and find that mutation burden is associated with both the age of the individual and tissue proliferation rate. We also detect hotspot cancer mutations that share tissue specificity with their matched cancer type. This study is the first to analyze a large number of samples across multiple normal tissues, identifying clones with genomic aberrations observed in cancer.
biorxiv genomics 200-500-users 2018
Dating genomic variants and shared ancestry in population-scale sequencing data, bioRxiv, 2018-09-14
Abstract: The origin and fate of new mutations within species is the fundamental process underlying evolution. However, while previous efforts have focused on characterizing the presence, frequency, and phenotypic impact of genetic variation, the evolutionary histories of most variants are largely unexplored. We have developed a non-parametric approach for estimating the date of origin of genetic variants that can be applied to large-scale genomic variation data sets. We demonstrate the accuracy and robustness of the approach through simulation and apply it to over 16 million single nucleotide polymorphisms (SNPs) from two publicly available human genomic diversity resources. We characterize the differential relationship between variant frequency and age in different geographical regions and demonstrate the value of allele age in interpreting variants of known functional and selective importance. Finally, we use allele age estimates to power a rapid approach for inferring the genealogical history of a single genome or a group of individuals.
biorxiv genomics 100-200-users 2018
Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks, bioRxiv, 2018-09-14
Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here we sought to apply deep convolutional neural networks towards this goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, which we call Xpresso, more than doubles the accuracy of alternative sequence-based models, and isolates rules as predictive as models relying on ChIP-seq data. Xpresso recapitulates genome-wide patterns of transcriptional activity and predicts the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose the accurate prediction of cell type-specific gene expression based solely on primary sequence as a grand challenge for the field.
biorxiv genomics 200-500-users 2018