Feature Selection and Dimension Reduction for Single Cell RNA-Seq based on a Multinomial Model, bioRxiv, 2019-03-12
Abstract
Single cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero-inflation. Current normalization procedures, such as taking the log of counts per million, and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.
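The deviance-based feature selection mentioned above can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes a small cells-by-genes UMI count matrix stored as nested Python lists and uses the binomial approximation to the multinomial model, in which each gene j is assumed to take the same fraction π_j of every cell's UMIs. The function names `binomial_deviance` and `select_top_genes` are hypothetical. Genes whose counts depart most from a constant per-cell fraction receive the highest deviance and are kept, with no log-CPM transform involved.

```python
import math


def binomial_deviance(counts):
    """Per-gene binomial deviance for a cells x genes UMI count matrix.

    counts[i][j] is the UMI count of gene j in cell i (hypothetical layout).
    Null model (assumption): gene j takes the same fraction pi_j of every
    cell's total UMIs n_i. Higher deviance means a worse fit to that null,
    i.e. a more informative gene.
    """
    n_cells, n_genes = len(counts), len(counts[0])
    totals = [sum(row) for row in counts]   # n_i: total UMIs in cell i
    grand_total = sum(totals)
    devs = []
    for j in range(n_genes):
        y = [counts[i][j] for i in range(n_cells)]
        pi = sum(y) / grand_total           # MLE of gene j's fraction under the null
        d = 0.0
        for yi, ni in zip(y, totals):
            # Terms with a zero count contribute 0 (convention 0 * log 0 = 0).
            if yi > 0:
                d += yi * math.log(yi / (ni * pi))
            if ni - yi > 0:
                d += (ni - yi) * math.log((ni - yi) / (ni * (1 - pi)))
        devs.append(2.0 * d)
    return devs


def select_top_genes(counts, k):
    """Return indices of the k genes with the largest deviance."""
    devs = binomial_deviance(counts)
    return sorted(range(len(devs)), key=lambda j: devs[j], reverse=True)[:k]
```

For example, with `counts = [[5, 5, 0, 0], [5, 0, 5, 0], [10, 5, 5, 0]]`, gene 0 keeps a constant half of each cell's UMIs and gene 3 is never observed, so both have deviance 0, while genes 1 and 2 vary across cells and are the two selected by `select_top_genes(counts, 2)`.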
biorxiv genomics 200-500-users 2019
Genetic analysis identifies molecular systems and biological pathways associated with household income, bioRxiv, 2019-03-12
Abstract
Socio-economic position (SEP) is a multi-dimensional construct reflecting (and influencing) multiple socio-cultural, physical, and environmental factors. Previous genome-wide association studies (GWAS) using household income as a marker of SEP have shown that common genetic variants account for 11% of its variation. Here, in a sample of 286,301 participants from UK Biobank, we identified 30 independent genome-wide significant loci, 29 of them novel, that are associated with household income. Using a recently developed meta-analysis method that leverages power from genetically correlated traits, we identified an additional 120 income-associated loci. These loci showed clear evidence of functional enrichment, with transcriptional differences identified across multiple cortical tissues, in addition to links with GABAergic and serotonergic neurotransmission. We identified neurogenesis and the components of the synapse as candidate biological systems that are linked with income. By combining our GWAS on income with data from eQTL studies and chromatin interactions, we prioritized 24 genes for follow-up, 18 of which were previously associated with cognitive ability. Using Mendelian Randomization, we identified cognitive ability as one of the causal, partly heritable phenotypes that bridge the gap between molecular genetic inheritance and phenotypic consequence in terms of income differences. Significant differences between genetic correlations indicated that the genetic variants associated with income are related to better mental health than those linked to educational attainment (another commonly used marker of SEP). Finally, we were able to predict 2.5% of income differences using genetic data alone in an independent sample. These results are important for understanding the observed socioeconomic inequalities in Great Britain today.
biorxiv genetics 200-500-users 2019
The Life History Of Human Foraging: Cross-Cultural And Individual Variation, bioRxiv, 2019-03-12
Abstract
Human adaptation depends upon the integration of slow life history, complex production skills, and extensive sociality. Refining and testing models of the evolution of human life history and cultural learning will benefit from increasingly accurate measurement of knowledge, skills, and rates of production with age. We pursue this goal by inferring how individual hunters gain and lose hunting skill, using approximately 23,000 hunting records generated by more than 1,800 individuals at 40 locations. The model provides an improved picture of ages of peak productivity as well as variation within and among ages. The data reveal an average age of peak productivity between 30 and 35 years, though high skill is maintained throughout much of adulthood. In addition, there is substantial variation both among individuals and among sites. Within study sites, variation among individuals depends more upon heterogeneity in rates of decline than in rates of increase. This analysis sharpens questions about the co-evolution of human life history and cultural adaptation. It also demonstrates new statistical algorithms and models that expand the potential inferences drawn from detailed quantitative data collected in the field.
biorxiv animal-behavior-and-cognition 100-200-users 2019
Clades of huge phage from across Earth’s ecosystems, bioRxiv, 2019-03-11
Phage typically have small genomes and depend on their bacterial hosts for replication. DNA sequenced from many diverse ecosystems revealed hundreds of huge phage genomes, between 200 kbp and 716 kbp in length. Thirty-four genomes were manually curated to completion, including the largest phage genomes yet reported. Expanded genetic repertoires include diverse and new CRISPR-Cas systems, tRNAs, tRNA synthetases, tRNA modification enzymes, translation initiation and elongation factors, and ribosomal proteins. Phage CRISPR-Cas systems have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phage may repurpose bacterial CRISPR-Cas systems to eliminate competing phage. We phylogenetically define major clades of huge phage from human and other animal microbiomes, oceans, lakes, sediments, soils and the built environment. We conclude that their large gene inventories reflect a conserved biological strategy, observed over a broad bacterial host range and across Earth’s ecosystems.
biorxiv microbiology 200-500-users 2019
Convergent gene loss in aquatic plants predicts new components of plant immunity and drought response, bioRxiv, 2019-03-11
Abstract
The transition of plants from sea to land sparked an arms race with pathogens. The increased susceptibility of land plants is largely thought to be due to their dependence on micro-organisms for nutrients; the ensuing co-evolution has shaped the plant immune system. By profiling the immune receptors across flowering plants, we identified species with low numbers of NLR immune receptors. Four of these species represent distinct lineages of monocots and dicots that returned to an aquatic lifestyle. Both aquatic monocot and dicot species lost the same well-known downstream immune signalling complex (EDS1-PAD4). This observation inspired us to look for other genes with a similar loss pattern and allowed us to predict putative new components of plant immunity. Gene expression analyses confirmed that a group of these genes was differentially expressed under pathogen infection. Another subset of these genes was differentially expressed upon drought. Collectively, our study reveals the minimal plant immune system required for life under water, and highlights additional components required for the life of land plants.
Author summary
Plant resistance to pathogens is commonly mediated by a complex gene family, known as NLRs. Upon pathogen infection, changes in the cellular environment trigger NLR activation and subsequent defence responses. Despite the dependence of agricultural practices on NLR genes to control pathogen load, relatively little is known about this gene family outside of model crop species. In this study, we identified a convergent reduction in the NLR gene family among two lineages of aquatic plants. Furthermore, we established that NLR reduction occurred in conjunction with the loss of a common immune signalling pathway. Subsequently, we identified other genes convergently lost in aquatic species and propose these as candidate components of the plant immune signalling pathway. In addition, we found that components of the agronomically important drought response were lost in aquatic plants. This study adds to our understanding of the complex interactions between environment and response to biotic stress, widely known as the disease triangle. The pathways identified in this study shed further light on the link between responses to drought and disease.
biorxiv plant-biology 100-200-users 2019
Convergent loss of an EDS1/PAD4 signalling pathway in several plant lineages predicts new components of plant immunity and drought response, bioRxiv, 2019-03-11
Abstract
Plant innate immunity relies on NLR receptors that recognize pathogen-derived molecules and activate downstream signalling pathways. We analyzed the variation in copy number of NLR genes across flowering plants, and identified a number of species with a low number of NLRs relative to sister species. Two distinct lineages, one monocot (Alismatales) and one dicot (Lentibulariaceae), together include four species with particularly few NLR genes. In these lineages, loss of NLRs coincided with loss of the well-known downstream immune signalling complex (EDS1-PAD4). When we expanded our analysis to whole proteomes, we identified other characterized immune genes absent only in Lentibulariaceae and Alismatales. Additionally, we identified a small subset of genes with unknown function convergently lost in all four species. We predicted that some of these genes may have a role in plant immunity. Gene expression analyses confirmed that a group of these genes was differentially expressed under pathogen infection. Another subset of these genes was differentially expressed upon drought, providing further evidence of a link between drought response and plant immunity.
biorxiv plant-biology 100-200-users 2019