Not just one p: Multivariate GWAS of psychiatric disorders and their cardinal symptoms reveal two dimensions of cross-cutting genetic liabilities, bioRxiv, 2019-04-10

Abstract. A single dimension of general psychopathology, p, has been hypothesized to represent a general liability that spans multiple types of psychiatric disorders and non-clinical variation in psychiatric symptoms across the lifespan. We conducted genome-wide association analyses of lifetime symptoms of mania, psychosis, and irritability in 124,952 to 208,315 individuals from UK Biobank, and then applied Genomic SEM to model the genetic relationships between these psychiatric symptoms and clinically-defined psychiatric disorders (schizophrenia, bipolar disorder, major depressive disorder). Two dimensions of cross-cutting genetic liability emerged: general vulnerability to self-reported symptoms (p_self) versus transdiagnostic vulnerability to clinically-diagnosed disease (p_clinician). These were only modestly correlated (rg = .344). Multivariate GWAS identified 145 and 11 independent and genome-wide significant loci for p_clinician and p_self, respectively, and improved polygenic prediction, relative to univariate GWAS, in hold-out samples. Despite the severe impairments in occupational and educational functioning seen in patients with schizophrenia and bipolar disorder, p_self showed stronger and more pervasive genetic correlations with facets of socioeconomic disadvantage (educational attainment, income, and neighborhood deprivation), whereas p_clinician was more strongly associated with medical disorders unrelated to the brain. Genetic variance in p_clinician that was unrelated to general vulnerability to psychiatric symptoms was associated with less socioeconomic disadvantage, suggesting positive selection biases in clinical samples used in psychiatric GWAS. These findings inform criticisms of psychiatric nosology by suggesting that cross-disorder genetic liabilities identified in GWASs of clinician-defined psychiatric disease are relatively distinct from genetic liabilities operating on self-reported symptom variation in the general population.

biorxiv genetics 100-200-users 2019

Basal Contamination of Bulk Sequencing: Lessons from the GTEx dataset, bioRxiv, 2019-04-09

Abstract. Background: One of the challenges of next generation sequencing (NGS) is contaminating reads from other samples. We used the Genotype-Tissue Expression (GTEx) project, a large, diverse, and robustly generated dataset, as a useful resource to understand the factors that contribute to contamination. Results: We obtained 11,340 RNA-Seq samples, DNA variant call files (VCF) of 635 individuals, and technical metadata from GTEx as well as read count data from the Human Protein Atlas (HPA) and a pharmacogenetics study. We analyzed 48 tissues in GTEx. Of these, 24 had variant co-expression clusters of four known highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and CELA3A). Fifteen additional highly expressed genes from other tissues were also indicative of contamination (KRT4, KRT13, PGC, CPA1, GP2, PRL, LIPF, CTRB2, FGA, HP, CKM, FGG, MYBPC1, MYH2, ZG16B). Sample contamination by non-native genes was highly associated with a sample being sequenced on the same day as a tissue that natively has high levels of those genes. This was highly significant for both pancreas genes (p = 2.7E-75) and esophagus genes (p = 8.9E-154). We used genetic polymorphism differences between individuals as validation of the contamination. Specifically, 11 SNPs in five genes shown to contaminate non-native tissues demonstrated allelic differences between DNA-based genotypes and contaminated sample RNA-based genotypes. Low-level contamination affected 1,841 (15.8%) samples (defined as ≥500 PRSS1 read counts). It also led to eQTL assignments in inappropriate tissues among these 19 genes. In support of this type of contamination occurring widely, pancreas gene contamination (PRSS1) was also observed in the HPA dataset, where pancreas samples were sequenced, but not in the pharmacogenomics dataset, where they were not. Conclusions: Highly expressed, tissue-enriched genes basally contaminate the GTEx dataset, impacting some downstream GTEx data analyses. This type of contamination is not unique to GTEx, being shared with other datasets. Awareness of this process will reduce the misattribution of variable, contaminating low-level gene expression to disease processes.

biorxiv genomics 100-200-users 2019

Basal Contamination of Sequencing: Lessons from the GTEx dataset, bioRxiv, 2019-04-09

Abstract. One of the challenges of next generation sequencing (NGS) is read contamination. We used the Genotype-Tissue Expression (GTEx) project, a large, diverse, and robustly generated dataset, to understand the factors that contribute to contamination. We obtained GTEx datasets and technical metadata, along with validating RNA-Seq data from other studies. Of 48 analyzed tissues in GTEx, 26 had variant co-expression clusters of four known highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicated contamination. Sample contamination by non-native genes was associated with a sample being sequenced on the same day as a tissue that natively expressed those genes. This was highly significant for pancreas and esophagus genes (linear model, p = 9.5e-237 and p = 5e-260, respectively). Nine SNPs in four genes shown to contaminate non-native tissues demonstrated allelic differences between DNA-based genotypes and contaminated sample RNA-based genotypes, validating the contamination. Low-level contamination affected 4,497 (39.6%) samples (defined as ≥10 PRSS1 TPM). It also led to eQTL assignments in inappropriate tissues among these 18 genes. We note this type of contamination occurs widely, impacting bulk and single cell data set analysis. In conclusion, highly expressed, tissue-enriched genes basally contaminate GTEx and other datasets, impacting analyses. Awareness of this process is necessary to avoid assigning inaccurate importance to low-level gene expression in inappropriate tissues and cells.
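The PRSS1 cutoff of 10 TPM mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `flag_contaminated`, the input layout, the toy values, and the choice to skip pancreas samples (where PRSS1 is natively expressed) are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List


def flag_contaminated(
    tpm_by_sample: Dict[str, Dict[str, float]],
    tissue_by_sample: Dict[str, str],
    gene: str = "PRSS1",
    threshold: float = 10.0,
    native_tissues: FrozenSet[str] = frozenset({"Pancreas"}),
) -> List[str]:
    """Return sorted IDs of non-native-tissue samples with gene TPM >= threshold.

    Hypothetical helper: skipping native tissues is an assumption of this
    sketch, not something stated in the abstract.
    """
    flagged = []
    for sample_id, tpm in tpm_by_sample.items():
        # High expression in the gene's native tissue is expected, not contamination.
        if tissue_by_sample[sample_id] in native_tissues:
            continue
        # A gene missing from the table is treated as 0 TPM.
        if tpm.get(gene, 0.0) >= threshold:
            flagged.append(sample_id)
    return sorted(flagged)


# Toy, made-up expression values (TPM); these are not GTEx data.
toy_tpm = {
    "S1": {"PRSS1": 25.0},
    "S2": {"PRSS1": 3.2},
    "S3": {"PRSS1": 9000.0},
    "S4": {"PRSS1": 10.0},
}
toy_tissue = {"S1": "Lung", "S2": "Lung", "S3": "Pancreas", "S4": "Heart"}

flagged = flag_contaminated(toy_tpm, toy_tissue)
print(flagged, f"{len(flagged) / len(toy_tpm):.1%} of samples flagged")
```

The comparison is inclusive (`>=`), matching the "≥10 TPM" wording, so a sample sitting exactly at 10 TPM is flagged.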

biorxiv genomics 100-200-users 2019

 
