Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain by scGESTALT, bioRxiv, 2017-10-20
Abstract: Hundreds of cell types are generated during development, but their lineage relationships are largely elusive. Here we report a technology, scGESTALT, which combines cell type identification by single-cell RNA sequencing with lineage recording by cumulative barcode editing. We sequenced ~60,000 transcriptomes from the juvenile zebrafish brain and identified more than 100 cell types and marker genes. We engineered an inducible system that combines early and late barcode editing and isolated thousands of single-cell transcriptomes and their associated barcodes. The large diversity of edited barcodes and cell types enabled the generation of lineage trees with hundreds of branches. Inspection of lineage trajectories identified restrictions at the level of cell types and brain regions and helped uncover gene expression cascades during differentiation. These results establish scGESTALT as a new and widely applicable tool to simultaneously characterize the molecular identities and lineage histories of thousands of cells during development and disease.
biorxiv developmental-biology 100-200-users 2017
Strong binding activity of few transcription factors is a major determinant of open chromatin, bioRxiv, 2017-10-18
Abstract: It is well established that transcription factors (TFs) play crucial roles in determining cell identity, and that a large fraction of all TFs are expressed in most cell types. In order to globally characterize activities of TFs in cells, we have developed a novel massively parallel protein activity assay, Active TF Identification (ATI) that measures DNA-binding activity of all TFs from any species or tissue type. In contrast to previous studies based on mRNA expression or protein abundance, we found that a set of TFs binding to only around ten distinct motifs display strong DNA-binding activity in any given cell or tissue type. Mass spectrometric identification of TFs revealed that within these highly active TFs, there were both housekeeping TFs, which were universally found in all cell types, and specific TFs, which were highly enriched in known factors that determine the fate of the analyzed tissue or cell type. The importance of a small subset of TFs for determining the overall accessible chromatin landscape of a cell suggests that gene regulatory logic may be simpler than what has previously been appreciated.
biorxiv cell-biology 100-200-users 2017
A Major Role for Common Genetic Variation in Anxiety Disorders, bioRxiv, 2017-10-17
Abstract: Anxiety disorders are common, complex psychiatric disorders with twin heritabilities of 30-60%. We conducted a genome-wide association study of Lifetime Anxiety Disorder (n = 83 565) and an additional Current Anxiety Symptoms (n = 77 125) analysis. The liability scale common variant heritability estimate for Lifetime Anxiety Disorder was 26%, and for Current Anxiety Symptoms was 31%. Five novel genome-wide significant loci were identified including an intergenic region on chromosome 9 that has previously been associated with neuroticism, and a locus overlapping the BDNF receptor gene, NTRK2. Anxiety showed significant genetic correlations with depression and insomnia as well as coronary artery disease, mirroring findings from epidemiological studies. We conclude that common genetic variation accounts for a substantive proportion of the genetic architecture underlying anxiety.
biorxiv genetics 100-200-users 2017
Cluster Headache: Comparing Clustering Tools for 10X Single Cell Sequencing Data, bioRxiv, 2017-10-17
Abstract: The commercially available 10X Genomics protocol to generate droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or same cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regards to which method offers most accuracy. Answering this question is complicated by the fact that 10X Genomics data lack cell labels that would allow a direct performance evaluation. Thus in this review, we focused on comparing clustering solutions of a dozen methods for three datasets on human peripheral mononuclear cells generated with the 10X Genomics technology. While clustering solutions appeared robust, we found that solutions produced by different methods have little in common with each other. They also failed to replicate cell type assignment generated with supervised labeling approaches. Furthermore, we demonstrate that all clustering methods tested clustered cells to a large degree according to the amount of genes coding for ribosomal protein genes in each cell.
biorxiv bioinformatics 100-200-users 2017
Efficient generation of targeted large insertions in mouse embryos using 2C-HR-CRISPR, bioRxiv, 2017-10-17
Rapid and efficient generation of large fragment targeted knock-in mouse models is still a major hurdle in mouse genetics. Here we developed 2C-HR-CRISPR, a highly efficient gene editing method based on introducing CRISPR reagents into mouse embryos at the 2-cell stage, taking advantage of the likely increase in HR efficiency during the long G2 phase and open chromatin structure of the 2-cell embryo. With 2C-HR-CRISPR and a modified biotin-streptavidin approach to localize repair templates to target sites, we rapidly targeted 20 endogenous genes that are expressed in mouse blastocysts with fluorescent reporters and generated reporter mouse lines. We showcase the first live triple-color blastocyst with all three lineages differentially reported. Additionally, we demonstrated efficient double targeting, enabling rapid assessment of the auxin-inducible degradation system for probing protein function in mouse embryos. These methods open up exciting avenues for exploring cell fate decisions in the blastocyst and later stages of development. We also suggest that 2C-HR-CRISPR can be a better alternative to random transgenesis by ensuring transgene insertions at defined ‘safe harbor’ sites.
biorxiv genetics 100-200-users 2017
Genomics in healthcare: GA4GH looks to 2022, bioRxiv, 2017-10-16
Abstract: The Global Alliance for Genomics and Health (GA4GH), the standards-setting body in genomics for healthcare, aims to accelerate biomedical advancement globally. We describe the differences between healthcare- and research-driven genomics, discuss the implications of global, population-scale collections of human data for research, and outline mission-critical considerations in ethics, regulation, technology, data protection, and society. We present a crude model for estimating the rate of healthcare-funded genomes worldwide that accounts for the preparedness of each country for genomics, and infers a progression of cancer-related sequencing over time. We estimate that over 60 million patients will have their genome sequenced in a healthcare context by 2025. This represents a large technical challenge for healthcare systems, and a huge opportunity for research. We identify eight major practical, principled arguments to support the position that virtual cohorts of 100 million people or more would have tangible research benefits.
biorxiv genomics 100-200-users 2017