Genomic analysis of family data reveals additional genetic effects on intelligence and personality, bioRxiv, 2017-02-07
Abstract: Pedigree-based analyses of intelligence have reported that genetic differences account for 50-80% of the phenotypic variation. For personality traits these effects are smaller, with 34-48% of the variance being explained by genetic differences. However, molecular genetic studies using unrelated individuals typically report a heritability estimate of around 30% for intelligence and between 0% and 15% for personality variables. Pedigree-based estimates and molecular genetic estimates may differ because current genotyping platforms are poor at tagging causal variants, variants with low minor allele frequency, copy number variants, and structural variants. Using ∼20 000 individuals in the Generation Scotland family cohort genotyped for ∼700 000 single nucleotide polymorphisms (SNPs), we exploit the high levels of linkage disequilibrium (LD) found in members of the same family to quantify the total effect of genetic variants that are not tagged in GWASs of unrelated individuals. In our models, genetic variants in low LD with genotyped SNPs explain over half of the genetic variance in intelligence, education, and neuroticism. By capturing these additional genetic effects our models closely approximate the heritability estimates from twin studies for intelligence and education, but not for neuroticism and extraversion. We then replicated our finding using imputed molecular genetic data from unrelated individuals to show that ∼50% of differences in intelligence, and ∼40% of the differences in education, can be explained by genetic effects when a larger number of rare SNPs are included. From an evolutionary genetic perspective, a substantial contribution of rare genetic variants to individual differences in intelligence and education is consistent with mutation-selection balance.
biorxiv genetics 200-500-users 2017
Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing, bioRxiv, 2017-02-03
Abstract: Conventional methods for profiling the molecular content of biological samples fail to resolve heterogeneity that is present at the level of single cells. In the past few years, single cell RNA sequencing has emerged as a powerful strategy for overcoming this challenge. However, its adoption has been limited by a paucity of methods that are at once simple to implement and cost effective to scale massively. Here, we describe a combinatorial indexing strategy to profile the transcriptomes of large numbers of single cells or single nuclei without requiring the physical isolation of each cell (Single cell Combinatorial Indexing RNA-seq or sci-RNA-seq). We show that sci-RNA-seq can be used to efficiently profile the transcriptomes of tens-of-thousands of single cells per experiment, and demonstrate that we can stratify cell types from these data. Key advantages of sci-RNA-seq over contemporary alternatives such as droplet-based single cell RNA-seq include sublinear cost scaling, a reliance on widely available reagents and equipment, the ability to concurrently process many samples within a single workflow, compatibility with methanol fixation of cells, cell capture based on DNA content rather than cell size, and the flexibility to profile either cells or nuclei. As a demonstration of sci-RNA-seq, we profile the transcriptomes of 42,035 single cells from C. elegans at the L2 stage, effectively 50-fold “shotgun cellular coverage” of the somatic cell composition of this organism at this stage. We identify 27 distinct cell types, including rare cell types such as the two distal tip cells of the developing gonad, estimate consensus expression profiles and define cell-type specific and selective genes. Given that C. elegans is the only organism with a fully mapped cellular lineage, these data represent a rich resource for future methods aimed at defining cell types and states. They will advance our understanding of developmental biology, and constitute a major step towards a comprehensive, single-cell molecular atlas of a whole animal.
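The idea of labelling cells by a combination of barcodes, rather than isolating each cell physically, can be illustrated with a toy simulation. This is a minimal sketch, not the authors' protocol: it assumes two indexing rounds, assumes each cell lands in a uniformly random well in each round, and uses a hypothetical function name and well counts. It reports the fraction of cells whose barcode pair is shared with another cell, which is what would make two cells indistinguishable.

```python
import random


def simulate_two_round_indexing(n_cells, wells_round1, wells_round2, seed=0):
    """Toy model of two-round combinatorial indexing (illustrative only).

    Each cell receives a (round-1 well, round-2 well) barcode pair chosen
    uniformly at random. Returns the fraction of cells whose pair is also
    carried by at least one other cell.
    """
    rng = random.Random(seed)
    counts = {}
    labels = []
    for _ in range(n_cells):
        # The barcode pair stands in for the cell's identity downstream.
        label = (rng.randrange(wells_round1), rng.randrange(wells_round2))
        labels.append(label)
        counts[label] = counts.get(label, 0) + 1
    # A cell is "collided" if its barcode pair is not unique.
    collided = sum(1 for label in labels if counts[label] > 1)
    return collided / n_cells


if __name__ == "__main__":
    # Hypothetical plate sizes; the collision fraction grows with cell number.
    for n in (100, 1000, 5000):
        print(n, simulate_two_round_indexing(n, 96, 96))
```

Under these assumptions, keeping the number of cells small relative to the number of barcode pairs keeps most labels unique, which is why no per-cell compartment is needed.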
biorxiv genomics 200-500-users 2017
Developmental diversification of cortical inhibitory interneurons, bioRxiv, 2017-02-03
Abstract: Diverse subsets of cortical interneurons play a particularly important role in the stability of the neural circuits underlying cognitive and higher order brain functions, yet our understanding of how this diversity is generated is far from complete. We applied massively parallel single-cell RNA-seq to profile a developmental time course of interneuron development, measuring the transcriptomes of over 60,000 progenitors during their maturation in the ganglionic eminences and embryonic migration into the cortex. While diversity within mitotic progenitors is largely driven by cell cycle and differentiation state, we observed sparse eminence-specific transcription factor expression, which seeds the emergence of later cell diversity. Upon becoming postmitotic, cells from all eminences pass through one of three precursor states, one of which represents a cortical interneuron ground state. By integrating datasets across developmental timepoints, we identified transcriptomic heterogeneity in interneuron precursors representing the emergence of four cardinal classes (Pvalb, Sst, Id2 and Vip), which further separate into subtypes at different timepoints during development. Our analysis revealed that the ASD-associated transcription factor Mef2c discriminates early Pvalb-precursors in E13.5 cells, and removal of Mef2c confirms its essential role for Pvalb interneuron development. These findings shed new light on the molecular diversification of early inhibitory precursors, and suggest gene modules that may link developmental specification with the etiology of neuropsychiatric disorders.
biorxiv neuroscience 100-200-users 2017
Scaling single cell transcriptomics through split pool barcoding, bioRxiv, 2017-02-03
Constructing an atlas of cell types in complex organisms will require a collective effort to characterize billions of individual cells. Single cell RNA sequencing (scRNA-seq) has emerged as the main tool for characterizing cellular diversity, but current methods use custom microfluidics or microwells to compartmentalize single cells, limiting scalability and widespread adoption. Here we present Split Pool Ligation-based Transcriptome sequencing (SPLiT-seq), a scRNA-seq method that labels the cellular origin of RNA through combinatorial indexing. SPLiT-seq is compatible with fixed cells, scales exponentially, uses only basic laboratory equipment, and costs one cent per cell. We used this approach to analyze 109,069 single cell transcriptomes from an entire postnatal day 5 mouse brain, providing the first global snapshot at this stage of development. We identified 13 main populations comprising different types of neurons, glia, immune cells, endothelia, as well as types in the blood-brain-barrier. Moreover, we resolve substructure within these clusters corresponding to cells at different stages of development. As sequencing capacity increases, SPLiT-seq will enable profiling of billions of cells in a single experiment.
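The statement that the method "scales exponentially" can be made concrete with a short calculation. This is a rough sketch under stated assumptions, not the SPLiT-seq implementation: it assumes every round uses the same number of wells, that cells are assigned to wells uniformly at random, and the function names and the 96-well figure are hypothetical. The number of distinct barcode combinations is the well count raised to the number of rounds, and the expected fraction of cells sharing a combination with another cell follows from that.

```python
def barcode_combinations(wells_per_round, rounds):
    """Distinct barcode combinations after `rounds` of split-pool labelling."""
    if wells_per_round < 1 or rounds < 1:
        raise ValueError("wells_per_round and rounds must be >= 1")
    return wells_per_round ** rounds


def expected_collision_fraction(n_cells, n_combinations):
    """Expected fraction of cells sharing a barcode with at least one other cell.

    Assumes each cell picks one of `n_combinations` barcodes uniformly and
    independently: a given cell is unique only if all other n_cells - 1
    cells avoid its barcode.
    """
    if n_cells < 1 or n_combinations < 1:
        raise ValueError("n_cells and n_combinations must be >= 1")
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)


if __name__ == "__main__":
    # Hypothetical 96-well plates; each added round multiplies the space by 96.
    for rounds in (1, 2, 3, 4):
        combos = barcode_combinations(96, rounds)
        frac = expected_collision_fraction(100_000, combos)
        print(rounds, combos, round(frac, 4))
```

With these hypothetical plates, three rounds give 96**3 = 884,736 combinations, and each extra round shrinks the expected collision fraction for a fixed number of cells.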
biorxiv genomics 100-200-users 2017
High-throughput annotation of full-length long noncoding RNAs with Capture Long-Read Sequencing, bioRxiv, 2017-02-02
Abstract: Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), combining targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the GENCODE intergenic lncRNA population in matched human and mouse tissues, resulting in novel transcript models for 3574 and 561 gene loci, respectively. CLS approximately doubles the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enable us to definitively characterize the genomic features of lncRNAs, including promoter- and gene-structure, and protein-coding potential. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales.
Abbreviations: bp, base pair; FL, full length; nt, nucleotide; ROI, read of insert, i.e. PacBio read; SJ, splice junction; SMRT, single-molecule real-time; TM, transcript model.
biorxiv genomics 0-100-users 2017
SvABA: Genome-wide detection of structural variants and indels by local assembly, bioRxiv, 2017-02-02
Abstract: Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA’s performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs, and substantially improved detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (< 1,000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types, and found that templated-sequence insertions occur in ~4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized SVs.
biorxiv genomics 0-100-users 2017