chromoMap: An R package for Interactive Visualization and Annotation of Chromosomes, bioRxiv, 2019-04-11
Abstract. Summary: chromoMap is an R package for constructing interactive visualizations of chromosomes or chromosomal regions of any living organism, and for mapping chromosomal elements (such as genes) onto them. The package takes separate tab-delimited (BED-like) files that specify the genomic coordinates of the chromosomes and of the elements to annotate. Each rendered chromosome is composed of continuous loci of specific ranges, and each locus, on hover, displays detailed information about the elements annotated within its range. By adjusting the parameters of a single function, users can generate a variety of plots that can be saved as static images or shared as HTML documents. Prominent features of chromoMap include, but are not limited to, visualizing polyploidy, creating chromosome heatmaps, mapping groups of elements, adding hyperlinks to elements, and multi-species chromosome visualization. Availability and implementation: The R package chromoMap is available under the GPL-3 open source license. It includes a vignette that explains its various features in detail, and is freely available from https://CRAN.R-project.org/package=chromoMap. Contact: lakshayanand15@gmail.com. Supplementary information: Supplementary data are available online.
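The locus idea in the summary above can be sketched in a few lines of Python. This is not chromoMap's implementation or API: the function names, the column order (chromosome name, start, end; element name, chromosome, start, end), the fixed locus width, and the toy coordinates are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def read_tsv(text):
    """Parse tab-delimited text into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def bin_elements(chrom_rows, element_rows, locus_width):
    """Assign each element to the fixed-width locus that contains its start.

    Assumed (hypothetical) column layout:
      chrom_rows:   name, start, end
      element_rows: element name, chromosome name, start, end
    """
    bounds = {name: (int(start), int(end)) for name, start, end in chrom_rows}
    loci = defaultdict(list)
    for elem, chrom, start, _end in element_rows:
        chrom_start, chrom_end = bounds[chrom]
        start = int(start)
        if not chrom_start <= start <= chrom_end:
            continue  # ignore elements that fall outside the chromosome
        index = (start - chrom_start) // locus_width
        loci[(chrom, index)].append(elem)
    return dict(loci)


# Toy input, invented for this example.
chromosomes = read_tsv("chr1\t1\t1000\nchr2\t1\t500\n")
elements = read_tsv(
    "geneA\tchr1\t10\t90\n"
    "geneB\tchr1\t60\t120\n"
    "geneC\tchr1\t450\t600\n"
    "geneD\tchr2\t20\t40\n"
)
loci = bin_elements(chromosomes, elements, locus_width=100)
for (chrom, index), names in sorted(loci.items()):
    print(chrom, index, names)
```

Each key in `loci` corresponds to one hoverable segment; the list of names is what a tooltip for that segment would show.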
biorxiv bioinformatics 100-200-users 2019
Comparing within- and between-family polygenic score prediction, bioRxiv, 2019-04-11
Abstract. Polygenic scores are a popular tool for prediction of complex traits. However, prediction estimates in samples of unrelated participants can include effects of population stratification, assortative mating and environmentally mediated parental genetic effects, a form of genotype-environment correlation (rGE). Comparing genome-wide polygenic score (GPS) predictions in unrelated individuals with predictions between siblings in a within-family design is a powerful approach to identify these different sources of prediction. Here, we compared within- to between-family GPS predictions of eight life outcomes (anthropometric, cognitive, personality and health) for eight corresponding GPSs. The outcomes were assessed in up to 2,366 dizygotic (DZ) twin pairs from the Twins Early Development Study from age 12 to age 21. To account for family clustering, we used mixed-effects modelling, simultaneously estimating within- and between-family effects for target- and cross-trait GPS prediction of the outcomes. There were three main findings: (1) DZ twin GPS differences predicted DZ differences in height, BMI, intelligence, educational achievement and ADHD symptoms; (2) target and cross-trait analyses indicated that GPS prediction estimates for cognitive traits (intelligence and educational achievement) were on average 60% greater between families than within families, but this was not the case for non-cognitive traits; and (3) this within- and between-family difference for cognitive traits disappeared after controlling for family socio-economic status (SES), suggesting that SES is a source of between-family prediction through rGE mechanisms. These results provide novel insights into the patterns by which rGE contributes to GPS prediction, while ruling out confounding due to population stratification and assortative mating.
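The within- versus between-family contrast can be illustrated with a small Python sketch. It is a simplification, not the study's analysis: the authors fitted mixed-effects models, whereas this uses plain least-squares slopes, and all numbers are invented toy values. Each twin's score is split into the pair mean (between-family component) and the deviation from that mean (within-family component). The toy outcome is built so that the between-family slope is 60% larger than the within-family slope, only to mirror the kind of gap the abstract describes.

```python
def decompose(pairs):
    """Split each twin's score into a family mean and a deviation from it."""
    between, within = [], []
    for a, b in pairs:
        mean = (a + b) / 2
        between += [mean, mean]          # shared by both twins
        within += [a - mean, b - mean]   # sums to zero inside each pair
    return between, within


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Invented polygenic scores for four DZ twin pairs.
gps_pairs = [(1.0, 3.0), (0.0, 1.0), (2.0, 2.5), (-1.0, 0.5)]
between, within = decompose(gps_pairs)

# Toy outcome with a known within effect (0.5) and between effect (0.8).
outcome = [0.8 * b + 0.5 * w for b, w in zip(between, within)]

within_slope = slope(within, outcome)
between_slope = slope(between, outcome)
print(within_slope, between_slope, between_slope / within_slope)
```

Because the within-pair deviations sum to zero in every pair, the two components are uncorrelated, so separate simple regressions recover the same slopes as a joint fit in this toy setting.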
biorxiv genomics 100-200-users 2019
In the Body’s Eye: The Computational Anatomy of Interoceptive Inference, bioRxiv, 2019-04-10
Abstract. A growing body of evidence highlights the intricate linkage of exteroceptive perception to the rhythmic activity of the visceral body. In parallel, interoceptive inference theories of emotion and self-consciousness are on the rise in cognitive science. However, thus far no formal theory has emerged to integrate these twin domains; instead most extant work is conceptual in nature. Here, we introduce a formal model of cardiac active inference, which explains how ascending cardiac signals entrain exteroceptive sensory perception and confidence. Through simulated psychophysics, we reproduce the defensive startle reflex and commonly reported effects linking the cardiac cycle to fear perception. We further show that simulated ‘interoceptive lesions’ blunt fear expectations, induce psychosomatic hallucinations, and exacerbate metacognitive biases. Through synthetic heart-rate variability analyses, we illustrate how the balance of arousal-priors and visceral prediction errors produces idiosyncratic patterns of physiological reactivity. Our model thus offers the possibility to computationally phenotype disordered brain-body interaction.
biorxiv neuroscience 200-500-users 2019
Measuring and Mitigating PCR Bias in Microbiome Data, bioRxiv, 2019-04-10
Abstract. PCR amplification plays a central role in the measurement of mixed microbial communities via high-throughput sequencing. Yet PCR is also known to be a common source of bias in microbiome data. Here we present a paired modeling and experimental approach to characterize and mitigate PCR bias in microbiome studies. We use experimental data from mock bacterial communities to validate our approach and human gut microbiota samples to characterize PCR bias under real-world conditions. Our results suggest that PCR can bias estimates of microbial relative abundances by a factor of 2-4 but that this bias can be mitigated using simple Bayesian multinomial logistic-normal linear models. Author summary: High-throughput sequencing is often used to profile host-associated microbial communities. Many processing steps are required to transform a community of bacteria into a pool of DNA suitable for sequencing. One important step is amplification where, to create enough DNA for sequencing, DNA from many different bacteria is repeatedly copied using a technique called Polymerase Chain Reaction (PCR). However, PCR is known to introduce bias because DNA from some bacteria is copied more efficiently than DNA from others. Here we introduce an experimental procedure that allows this bias to be measured and computational techniques that allow this bias to be mitigated in sequencing data.
biorxiv genomics 0-100-users 2019
Not just one p: Multivariate GWAS of psychiatric disorders and their cardinal symptoms reveal two dimensions of cross-cutting genetic liabilities, bioRxiv, 2019-04-10
Abstract. A single dimension of general psychopathology, p, has been hypothesized to represent a general liability that spans multiple types of psychiatric disorders and non-clinical variation in psychiatric symptoms across the lifespan. We conducted genome-wide association analyses of lifetime symptoms of mania, psychosis, and irritability in 124,952 to 208,315 individuals from UK Biobank, and then applied Genomic SEM to model the genetic relationships between these psychiatric symptoms and clinically-defined psychiatric disorders (schizophrenia, bipolar disorder, major depressive disorder). Two dimensions of cross-cutting genetic liability emerged: general vulnerability to self-reported symptoms (p_self) versus transdiagnostic vulnerability to clinically-diagnosed disease (p_clinician). These were only modestly correlated (rg = .344). Multivariate GWAS identified 145 and 11 independent and genome-wide significant loci for p_clinician and p_self, respectively, and improved polygenic prediction, relative to univariate GWAS, in hold-out samples. Despite the severe impairments in occupational and educational functioning seen in patients with schizophrenia and bipolar disorder, p_self showed stronger and more pervasive genetic correlations with facets of socioeconomic disadvantage (educational attainment, income, and neighborhood deprivation), whereas p_clinician was more strongly associated with medical disorders unrelated to the brain. Genetic variance in p_clinician that was unrelated to general vulnerability to psychiatric symptoms was associated with less socioeconomic disadvantage, suggesting positive selection biases in clinical samples used in psychiatric GWAS. These findings inform criticisms of psychiatric nosology by suggesting that cross-disorder genetic liabilities identified in GWASs of clinician-defined psychiatric disease are relatively distinct from genetic liabilities operating on self-reported symptom variation in the general population.
biorxiv genetics 100-200-users 2019
Basal Contamination of Bulk Sequencing: Lessons from the GTEx dataset, bioRxiv, 2019-04-09
Abstract. Background: One of the challenges of next generation sequencing (NGS) is contaminating reads from other samples. We used the Genotype-Tissue Expression (GTEx) project, a large, diverse, and robustly generated dataset, as a useful resource to understand the factors that contribute to contamination. Results: We obtained 11,340 RNA-Seq samples, DNA variant call files (VCF) of 635 individuals, and technical metadata from GTEx as well as read count data from the Human Protein Atlas (HPA) and a pharmacogenetics study. We analyzed 48 tissues in GTEx. Of these, 24 had variant co-expression clusters of four known highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and CELA3A). Fifteen additional highly expressed genes from other tissues were also indicative of contamination (KRT4, KRT13, PGC, CPA1, GP2, PRL, LIPF, CTRB2, FGA, HP, CKM, FGG, MYBPC1, MYH2, ZG16B). Sample contamination by non-native genes was highly associated with a sample being sequenced on the same day as a tissue that natively has high levels of those genes. This was highly significant for both pancreas genes (p = 2.7E-75) and esophagus genes (p = 8.9E-154). We used genetic polymorphism differences between individuals as validation of the contamination. Specifically, 11 SNPs in five genes shown to contaminate non-native tissues demonstrated allelic differences between DNA-based genotypes and contaminated sample RNA-based genotypes. Low-level contamination affected 1,841 (15.8%) samples (defined as ≥500 PRSS1 read counts). It also led to eQTL assignments in inappropriate tissues among these 19 genes. In support of this type of contamination occurring widely, pancreas gene contamination (PRSS1) was also observed in the HPA dataset, where pancreas samples were sequenced, but not in the pharmacogenomics dataset, where they were not. Conclusions: Highly expressed, tissue-enriched genes basally contaminate the GTEx dataset, impacting some downstream GTEx data analyses. This type of contamination is not unique to GTEx, being shared with other datasets. Awareness of this process will reduce assigning variable, contaminating low-level gene expression to disease processes.
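The read-count definition of low-level contamination quoted above (≥500 PRSS1 reads) lends itself to a short Python sketch. Only the threshold comes from the abstract. The record layout, the field names, the rule that pancreas samples are exempt because PRSS1 is native there, and the sample values are assumptions for illustration, and this is not the authors' pipeline.

```python
PRSS1_THRESHOLD = 500  # read-count cutoff stated in the abstract


def flag_contaminated(samples, native_tissue="pancreas", threshold=PRSS1_THRESHOLD):
    """Return IDs of samples whose PRSS1 count meets the threshold.

    Samples from the native tissue are skipped, on the assumption that
    high PRSS1 expression is expected there rather than a contaminant.
    """
    return [
        s["id"]
        for s in samples
        if s["tissue"] != native_tissue and s["prss1_reads"] >= threshold
    ]


# Invented records; field names are hypothetical.
samples = [
    {"id": "s1", "tissue": "pancreas", "prss1_reads": 250000},
    {"id": "s2", "tissue": "lung", "prss1_reads": 812},
    {"id": "s3", "tissue": "heart", "prss1_reads": 499},
    {"id": "s4", "tissue": "skin", "prss1_reads": 500},
]
flagged = flag_contaminated(samples)
print(flagged)
```

A fuller check along the lines the abstract describes would also compare sequencing dates against those of pancreas samples, which this sketch leaves out.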
biorxiv genomics 100-200-users 2019