Salmonella enterica genomes recovered from victims of a major 16th century epidemic in Mexico, bioRxiv, 2017-02-09
Abstract: Indigenous populations of the Americas experienced high mortality rates during the early contact period as a result of infectious diseases, many of which were introduced by Europeans. Most of the pathogenic agents that caused these outbreaks remain unknown. Using a metagenomic tool called MALT to search for traces of ancient pathogen DNA, we were able to identify Salmonella enterica in individuals buried in an early contact era epidemic cemetery at Teposcolula-Yucundaa, Oaxaca in southern Mexico. This cemetery is linked to the 1545-1550 CE epidemic locally known as “cocoliztli”, the cause of which has been debated for over a century. Here we present two reconstructed ancient genomes for Salmonella enterica subsp. enterica serovar Paratyphi C, a bacterial cause of enteric fever. We propose that S. Paratyphi C contributed to the population decline during the 1545 cocoliztli outbreak in Mexico.
One Sentence Summary: Genomic evidence of enteric fever identified in an indigenous population from early contact period Mexico.
biorxiv genomics 0-100-users 2017
Quantitative analysis of population-scale family trees using millions of relatives, bioRxiv, 2017-02-08
Abstract: Family trees have vast applications in multiple fields from genetics to anthropology and economics. However, the collection of extended family trees is tedious and usually relies on resources with limited geographical scope and complex data usage restrictions. Here, we collected 86 million profiles from publicly-available online data from genealogy enthusiasts. After extensive cleaning and validation, we obtained population-scale family trees, including a single pedigree of 13 million individuals. We leveraged the data to partition the genetic architecture of longevity by inspecting millions of relative pairs and to provide insights to population genetics theories on the dispersion of families. We also report a simple digital procedure to overlay other datasets with our resource in order to empower studies with population-scale genealogical data.
One Sentence Summary: Using massive crowd-sourced genealogy data, we created a population-scale family tree resource for scientific studies.
biorxiv genomics 100-200-users 2017
Genomic analysis of family data reveals additional genetic effects on intelligence and personality, bioRxiv, 2017-02-07
Abstract: Pedigree-based analyses of intelligence have reported that genetic differences account for 50-80% of the phenotypic variation. For personality traits these effects are smaller, with 34-48% of the variance being explained by genetic differences. However, molecular genetic studies using unrelated individuals typically report a heritability estimate of around 30% for intelligence and between 0% and 15% for personality variables. Pedigree-based estimates and molecular genetic estimates may differ because current genotyping platforms are poor at tagging causal variants, variants with low minor allele frequency, copy number variants, and structural variants. Using ∼20 000 individuals in the Generation Scotland family cohort genotyped for ∼700 000 single nucleotide polymorphisms (SNPs), we exploit the high levels of linkage disequilibrium (LD) found in members of the same family to quantify the total effect of genetic variants that are not tagged in GWASs of unrelated individuals. In our models, genetic variants in low LD with genotyped SNPs explain over half of the genetic variance in intelligence, education, and neuroticism. By capturing these additional genetic effects our models closely approximate the heritability estimates from twin studies for intelligence and education, but not for neuroticism and extraversion. We then replicated our finding using imputed molecular genetic data from unrelated individuals to show that ∼50% of differences in intelligence, and ∼40% of the differences in education, can be explained by genetic effects when a larger number of rare SNPs are included. From an evolutionary genetic perspective, a substantial contribution of rare genetic variants to individual differences in intelligence and education is consistent with mutation-selection balance.
biorxiv genetics 200-500-users 2017
Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing, bioRxiv, 2017-02-03
Abstract: Conventional methods for profiling the molecular content of biological samples fail to resolve heterogeneity that is present at the level of single cells. In the past few years, single cell RNA sequencing has emerged as a powerful strategy for overcoming this challenge. However, its adoption has been limited by a paucity of methods that are at once simple to implement and cost effective to scale massively. Here, we describe a combinatorial indexing strategy to profile the transcriptomes of large numbers of single cells or single nuclei without requiring the physical isolation of each cell (Single cell Combinatorial Indexing RNA-seq or sci-RNA-seq). We show that sci-RNA-seq can be used to efficiently profile the transcriptomes of tens-of-thousands of single cells per experiment, and demonstrate that we can stratify cell types from these data. Key advantages of sci-RNA-seq over contemporary alternatives such as droplet-based single cell RNA-seq include sublinear cost scaling, a reliance on widely available reagents and equipment, the ability to concurrently process many samples within a single workflow, compatibility with methanol fixation of cells, cell capture based on DNA content rather than cell size, and the flexibility to profile either cells or nuclei. As a demonstration of sci-RNA-seq, we profile the transcriptomes of 42,035 single cells from C. elegans at the L2 stage, effectively 50-fold “shotgun cellular coverage” of the somatic cell composition of this organism at this stage. We identify 27 distinct cell types, including rare cell types such as the two distal tip cells of the developing gonad, estimate consensus expression profiles and define cell-type specific and selective genes. Given that C. elegans is the only organism with a fully mapped cellular lineage, these data represent a rich resource for future methods aimed at defining cell types and states. They will advance our understanding of developmental biology, and constitute a major step towards a comprehensive, single-cell molecular atlas of a whole animal.
biorxiv genomics 200-500-users 2017
Developmental diversification of cortical inhibitory interneurons, bioRxiv, 2017-02-03
Abstract: Diverse subsets of cortical interneurons play a particularly important role in the stability of the neural circuits underlying cognitive and higher order brain functions, yet our understanding of how this diversity is generated is far from complete. We applied massively parallel single-cell RNA-seq to profile a developmental time course of interneuron development, measuring the transcriptomes of over 60,000 progenitors during their maturation in the ganglionic eminences and embryonic migration into the cortex. While diversity within mitotic progenitors is largely driven by cell cycle and differentiation state, we observed sparse eminence-specific transcription factor expression, which seeds the emergence of later cell diversity. Upon becoming postmitotic, cells from all eminences pass through one of three precursor states, one of which represents a cortical interneuron ground state. By integrating datasets across developmental timepoints, we identified transcriptomic heterogeneity in interneuron precursors representing the emergence of four cardinal classes (Pvalb, Sst, Id2 and Vip), which further separate into subtypes at different timepoints during development. Our analysis revealed that the ASD-associated transcription factor Mef2c discriminates early Pvalb-precursors in E13.5 cells, and removal of Mef2c confirms its essential role for Pvalb interneuron development. These findings shed new light on the molecular diversification of early inhibitory precursors, and suggest gene modules that may link developmental specification with the etiology of neuropsychiatric disorders.
biorxiv neuroscience 100-200-users 2017
Scaling single cell transcriptomics through split pool barcoding, bioRxiv, 2017-02-03
Constructing an atlas of cell types in complex organisms will require a collective effort to characterize billions of individual cells. Single cell RNA sequencing (scRNA-seq) has emerged as the main tool for characterizing cellular diversity, but current methods use custom microfluidics or microwells to compartmentalize single cells, limiting scalability and widespread adoption. Here we present Split Pool Ligation-based Transcriptome sequencing (SPLiT-seq), a scRNA-seq method that labels the cellular origin of RNA through combinatorial indexing. SPLiT-seq is compatible with fixed cells, scales exponentially, uses only basic laboratory equipment, and costs one cent per cell. We used this approach to analyze 109,069 single cell transcriptomes from an entire postnatal day 5 mouse brain, providing the first global snapshot at this stage of development. We identified 13 main populations comprising different types of neurons, glia, immune cells, endothelia, as well as types in the blood-brain-barrier. Moreover, we resolve substructure within these clusters corresponding to cells at different stages of development. As sequencing capacity increases, SPLiT-seq will enable profiling of billions of cells in a single experiment.
biorxiv genomics 100-200-users 2017
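Note on the SPLiT-seq entry above: the claim that combinatorial indexing "scales exponentially" can be illustrated with a small calculation. The sketch below is not taken from the preprint. It assumes a hypothetical plate of 96 wells per barcoding round and uniform random assignment of cells to wells, and it treats all cells as one pool; only the figure of 109,069 cells comes from the abstract. It shows how the number of distinct barcode combinations grows with the number of rounds, and how likely a given cell is to share its combination with another cell.

```python
def barcode_space(wells_per_round: int, rounds: int) -> int:
    """Number of distinct barcode combinations after `rounds` split-pool rounds."""
    return wells_per_round ** rounds


def expected_collision_fraction(n_cells: int, n_barcodes: int) -> float:
    """Probability that a given cell shares its barcode combination with at
    least one of the other cells, assuming every cell lands on one of
    `n_barcodes` combinations uniformly and independently (an idealisation)."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


if __name__ == "__main__":
    wells = 96          # assumption: hypothetical plate size, not from the abstract
    n_cells = 109_069   # cell count reported in the SPLiT-seq abstract
    for rounds in (1, 2, 3, 4):
        space = barcode_space(wells, rounds)
        frac = expected_collision_fraction(n_cells, space)
        print(f"rounds={rounds} combinations={space} collision_fraction={frac:.4f}")
```

Under these assumptions each extra round multiplies the barcode space by the number of wells, so the collision fraction falls sharply while the bench work grows only linearly with the number of rounds. Real protocols differ in well counts, pooling, and cell loss, so the printed values are illustrative only.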