Distinguishing genetic correlation from causation across 52 diseases and complex traits, bioRxiv, 2017-10-19
Abstract: Mendelian randomization (MR) is widely used to identify causal relationships among heritable traits, but it can be confounded by genetic correlations reflecting shared etiology. We propose a model in which a latent causal variable mediates the genetic correlation between two traits. Under the latent causal variable (LCV) model, trait 1 is fully genetically causal for trait 2 if it is perfectly genetically correlated with the latent causal variable, implying that the entire genetic component of trait 1 is causal for trait 2; it is partially genetically causal for trait 2 if it has a high genetic correlation with the latent variable, implying that part of the genetic component of trait 1 is causal for trait 2. To quantify the degree of partial genetic causality, we define the genetic causality proportion (gcp). We fit this model using mixed fourth moments E(α1²α1α2) and E(α2²α1α2) of marginal effect sizes for each trait, exploiting the fact that if trait 1 is causal for trait 2 then SNPs affecting trait 1 (large α1²) will have correlated effects on trait 2 (large α1α2), but not vice versa. We performed simulations under a wide range of genetic architectures and determined that LCV, unlike state-of-the-art MR methods, produced well-calibrated false positive rates and reliable gcp estimates in the presence of genetic correlations and asymmetric genetic architectures; we also determined that LCV is well-powered to detect a causal effect.
We applied LCV to GWAS summary statistics for 52 traits (average N=331k), identifying partially or fully genetically causal effects (1% FDR) for 59 pairs of traits, including 30 pairs of traits with high gcp estimates (gĉp > 0.6). Results consistent with the published literature included genetically causal effects on myocardial infarction (MI) for LDL, triglycerides and BMI. Novel findings included a genetically causal effect of LDL on bone mineral density, consistent with clinical trials of statins in osteoporosis. These results demonstrate that it is possible to distinguish between genetic correlation and causation using genetic data.
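The asymmetry in the mixed fourth moments described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' estimator: it uses noiseless per-SNP effects with no linkage disequilibrium, makes trait 1 fully causal for trait 2 (the latent variable equals the trait 1 effect), and draws sparse effects so that their distribution is heavy-tailed. The function names `simulate_effects` and `mixed_fourth_moments` and the parameters `q2` and `p_causal` are hypothetical.

```python
import random


def simulate_effects(n_snps=200_000, q2=0.5, p_causal=0.1, seed=0):
    """Toy per-SNP effects where trait 1 is fully causal for trait 2.

    Assumptions (not from the paper's pipeline): effects are observed
    without noise, SNPs are independent, and both traits have unit variance.
    """
    rng = random.Random(seed)
    scale = p_causal ** -0.5          # gives the sparse effect unit variance
    resid = (1.0 - q2 ** 2) ** 0.5    # trait-2-specific share, keeps Var(a2) = 1
    a1, a2 = [], []
    for _ in range(n_snps):
        # Sparse, heavy-tailed effect on trait 1 (this plays the latent variable).
        pi = rng.gauss(0.0, 1.0) * scale if rng.random() < p_causal else 0.0
        a1.append(pi)
        # Trait 2 inherits a fraction q2 of that effect plus independent effects.
        a2.append(q2 * pi + resid * rng.gauss(0.0, 1.0))
    return a1, a2


def mixed_fourth_moments(a1, a2):
    """Return sample estimates of E(a1^2 * a1*a2) and E(a2^2 * a1*a2)."""
    n = len(a1)
    m1 = sum(x * x * x * y for x, y in zip(a1, a2)) / n
    m2 = sum(y * y * x * y for x, y in zip(a1, a2)) / n
    return m1, m2


a1, a2 = simulate_effects()
m1, m2 = mixed_fourth_moments(a1, a2)
print(f"E(a1^2 * a1*a2) = {m1:.2f}, E(a2^2 * a1*a2) = {m2:.2f}")
```

With these settings the first moment is expected to be about 15 and the second about 4.9, so the moment weighted by the squared trait 1 effect is clearly larger. Swapping the roles of the two traits reverses the ordering, which is the signal the abstract describes. Working with real GWAS summary statistics would additionally require handling sampling noise and linkage disequilibrium, which this sketch ignores.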
biorxiv genetics 200-500-users 2017
Strong binding activity of few transcription factors is a major determinant of open chromatin, bioRxiv, 2017-10-18
Abstract: It is well established that transcription factors (TFs) play crucial roles in determining cell identity, and that a large fraction of all TFs are expressed in most cell types. In order to globally characterize activities of TFs in cells, we have developed a novel massively parallel protein activity assay, Active TF Identification (ATI), that measures the DNA-binding activity of all TFs from any species or tissue type. In contrast to previous studies based on mRNA expression or protein abundance, we found that a set of TFs binding to only around ten distinct motifs display strong DNA-binding activity in any given cell or tissue type. Mass spectrometric identification of TFs revealed that these highly active TFs included both housekeeping TFs, which were found in all cell types, and specific TFs, which were highly enriched in known factors that determine the fate of the analyzed tissue or cell type. The importance of a small subset of TFs for determining the overall accessible chromatin landscape of a cell suggests that gene regulatory logic may be simpler than previously appreciated.
biorxiv cell-biology 100-200-users 2017
A Major Role for Common Genetic Variation in Anxiety Disorders, bioRxiv, 2017-10-17
Abstract: Anxiety disorders are common, complex psychiatric disorders with twin heritabilities of 30-60%. We conducted a genome-wide association study of Lifetime Anxiety Disorder (n = 83 565) and an additional analysis of Current Anxiety Symptoms (n = 77 125). The liability-scale common-variant heritability estimate was 26% for Lifetime Anxiety Disorder and 31% for Current Anxiety Symptoms. Five novel genome-wide significant loci were identified, including an intergenic region on chromosome 9 that has previously been associated with neuroticism, and a locus overlapping the BDNF receptor gene, NTRK2. Anxiety showed significant genetic correlations with depression and insomnia as well as coronary artery disease, mirroring findings from epidemiological studies. We conclude that common genetic variation accounts for a substantive proportion of the genetic architecture underlying anxiety.
biorxiv genetics 100-200-users 2017
Amplification-free, CRISPR-Cas9 Targeted Enrichment and SMRT Sequencing of Repeat-Expansion Disease Causative Genomic Regions, bioRxiv, 2017-10-17
Abstract: Targeted sequencing has proven to be an economical means of obtaining sequence information for one or more defined regions of a larger genome. However, most target enrichment methods require amplification. Some genomic regions, such as those with extreme GC content and repetitive sequences, are recalcitrant to faithful amplification. Yet, many human genetic disorders are caused by repeat expansions, including difficult-to-sequence tandem repeats. We have developed a novel, amplification-free enrichment technique that employs the CRISPR-Cas9 system for specific targeting of multiple genomic loci. This method, in conjunction with long reads generated through Single Molecule, Real-Time (SMRT) sequencing and unbiased coverage, enables enrichment and sequencing of complex genomic regions that cannot be investigated with other technologies. Using human genomic DNA samples, we demonstrate successful targeting of causative loci for Huntington’s disease (HTT; CAG repeat), Fragile X syndrome (FMR1; CGG repeat), amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (C9orf72; GGGGCC repeat), and spinocerebellar ataxia type 10 (SCA10) (ATXN10; variable ATTCT repeat). The method, amenable to multiplexing across multiple genomic loci, uses an amplification-free approach that facilitates the isolation of hundreds of individual on-target molecules in a single SMRT Cell and accurate sequencing through long repeat stretches, regardless of extreme GC content or sequence complexity. Our novel targeted sequencing method opens new doors to genomic analyses independent of PCR amplification that will facilitate the study of repeat expansion disorders.
biorxiv genomics 0-100-users 2017
Cluster Headache: Comparing Clustering Tools for 10X Single Cell Sequencing Data, bioRxiv, 2017-10-17
Abstract: The commercially available 10X Genomics protocol to generate droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or identical cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regard to which method is most accurate. Answering this question is complicated by the fact that 10X Genomics data lack cell labels that would allow a direct performance evaluation. Thus, in this review, we focused on comparing clustering solutions of a dozen methods for three datasets on human peripheral mononuclear cells generated with the 10X Genomics technology. While clustering solutions appeared robust, we found that solutions produced by different methods have little in common with each other. They also failed to replicate cell type assignment generated with supervised labeling approaches. Furthermore, we demonstrate that all clustering methods tested clustered cells to a large degree according to the number of genes coding for ribosomal proteins in each cell.
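Agreement between two clustering solutions of the same cells, as compared in the abstract above, is often summarized with the adjusted Rand index (ARI). The following is a minimal sketch of that index in plain Python; the abstract does not state which agreement measure the authors used, so ARI is an assumption here, and the function name `adjusted_rand_index` and the example labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (up to label names); values near 0
    mean agreement no better than expected by chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0  # a single cell (or none) cannot be partitioned differently

    # Number of cell pairs placed together by both, by A, and by B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # chance-level agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in one cluster in both
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels for six cells from three clustering methods.
method_1 = [0, 0, 1, 1, 2, 2]
method_2 = [1, 1, 0, 0, 2, 2]   # same partition, different label names
method_3 = [0, 1, 2, 0, 1, 2]   # splits every pair that method_1 groups

print(adjusted_rand_index(method_1, method_2))
print(adjusted_rand_index(method_1, method_3))
```

Because the index depends only on which cells are grouped together, it is unaffected by how each method names its clusters, which is what makes it usable across tools with unrelated labeling schemes.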
biorxiv bioinformatics 100-200-users 2017
Efficient generation of targeted large insertions in mouse embryos using 2C-HR-CRISPR, bioRxiv, 2017-10-17
Rapid and efficient generation of large fragment targeted knock-in mouse models is still a major hurdle in mouse genetics. Here we developed 2C-HR-CRISPR, a highly efficient gene editing method based on introducing CRISPR reagents into mouse embryos at the 2-cell stage, taking advantage of the likely increase in HR efficiency during the long G2 phase and open chromatin structure of the 2-cell embryo. With 2C-HR-CRISPR and a modified biotin-streptavidin approach to localize repair templates to target sites, we rapidly targeted 20 endogenous genes that are expressed in mouse blastocysts with fluorescent reporters and generated reporter mouse lines. We showcase the first live triple-color blastocyst with all three lineages differentially reported. Additionally, we demonstrated efficient double targeting, enabling rapid assessment of the auxin-inducible degradation system for probing protein function in mouse embryos. These methods open up exciting avenues for exploring cell fate decisions in the blastocyst and later stages of development. We also suggest that 2C-HR-CRISPR can be a better alternative to random transgenesis by ensuring transgene insertions at defined ‘safe harbor’ sites.
biorxiv genetics 100-200-users 2017