Minimal phenotyping yields GWAS hits of reduced specificity for major depression, bioRxiv, 2018-10-11
Abstract: Minimal phenotyping refers to reliance on a small number of self-report items to identify disease cases. This strategy has been applied to genome-wide association studies (GWAS) of major depressive disorder (MDD). Here we report that the genotype-derived heritability (h2SNP) of depression defined by minimal phenotyping (14%, SE = 0.8%) is lower than that of strictly defined MDD (26%, SE = 2.2%). This cannot be explained by differences in prevalence between definitions, or by the inclusion of cases with lower liability to MDD in minimal phenotyping definitions of depression. It can, however, be explained by misdiagnosis of people without depression, or with related conditions, as cases of depression. Depression defined by minimal phenotyping is as genetically correlated with strictly defined MDD (rG = 0.81, SE = 0.03) as it is with the personality trait neuroticism (rG = 0.84, SE = 0.05), a trait not defined by the cardinal symptoms of depression. While both definitions show similar shared genetic liability with neuroticism, a greater proportion of the genome contributes to minimal phenotyping definitions of depression (80.2%, SE = 0.6%) than to strictly defined MDD (65.8%, SE = 0.6%). We find that GWAS loci identified in minimal phenotyping definitions of depression are not specific to MDD: they also predispose to other psychiatric conditions. Finally, while highly predictive polygenic risk scores can be generated from minimal phenotyping definitions of MDD, their predictive power can be explained entirely by the sample size used to generate the polygenic risk score rather than by specificity for MDD. Our results reveal that genetic analysis of minimal phenotyping definitions of depression identifies non-specific genetic factors shared between MDD and other psychiatric conditions. Reliance on results from minimal phenotyping for MDD may thus bias views of the genetic architecture of MDD and impede our ability to identify pathways specific to MDD.
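For readers less familiar with polygenic risk scores: in their simplest additive form, a person's score is the sum of their effect-allele dosages (0, 1 or 2 copies) at each variant, weighted by per-variant effect sizes estimated in a GWAS. The sketch below assumes that textbook form only. It is not the authors' scoring pipeline, which the abstract does not describe, and the function name, variant IDs, weights and genotypes are hypothetical placeholders.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float],
    effect_sizes: Mapping[str, float],
) -> float:
    """Additive score: sum of GWAS effect size x effect-allele dosage.

    Variants with a weight but no genotype for this person are skipped
    (a simplifying assumption; real pipelines often impute them instead).
    """
    return sum(
        beta * dosages[variant]
        for variant, beta in effect_sizes.items()
        if variant in dosages
    )


# Hypothetical GWAS weights (e.g. log odds ratios) and one person's dosages.
weights = {"rs0001": 0.05, "rs0002": -0.02, "rs0003": 0.10}
person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
score = polygenic_risk_score(person, weights)  # 0.05*2 - 0.02*1 + 0.10*0
```

Because the score is a sum over many weighted variants, the precision of each weight, and therefore the GWAS sample size, directly affects how well the score predicts, which is the point the abstract makes about sample size versus specificity.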
biorxiv genetics 100-200-users 2018
Population-scale proteome variation in human induced pluripotent stem cells, bioRxiv, 2018-10-11
Abstract: Realising the potential of human induced pluripotent stem cell (iPSC) technology for drug discovery, disease modelling and cell therapy requires an understanding of variability across iPSC lines. While previous studies have characterised iPSC lines genetically and transcriptionally, little is known about the variability of the iPSC proteome. Here, we present the first comprehensive proteomic iPSC dataset, analysing 202 iPSC lines derived from 151 donors. We characterise the major genetic determinants of proteome and transcriptome variation across iPSC lines and identify key regulatory mechanisms affecting variation in protein abundance. Our data identified >700 human iPSC protein quantitative trait loci (pQTLs). We mapped trans regulatory effects, identifying an important role for protein-protein interactions. We discovered that pQTLs show stronger enrichment for disease-linked GWAS variants than RNA-based eQTLs do.
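At its core, a pQTL test asks whether a protein's abundance changes with the number of copies of an allele that a cell line carries. A minimal sketch, assuming a simple additive model fitted by ordinary least squares for a single variant-protein pair; real pQTL mapping typically also adjusts for covariates and for the many tests performed, which this illustration omits. The function name and toy data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def additive_qtl_effect(dosages: Sequence[float], abundances: Sequence[float]) -> float:
    """OLS slope of protein abundance on allele dosage (0, 1 or 2 copies)."""
    if len(dosages) != len(abundances):
        raise ValueError("need one abundance value per genotyped sample")
    mean_x, mean_y = mean(dosages), mean(abundances)
    # Sum of squares of dosage; zero means every sample has the same genotype.
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, abundances))
    return sxy / sxx


# Hypothetical toy data: six cell lines genotyped at one variant.
genotypes = [0, 0, 1, 1, 2, 2]
protein_levels = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
beta = additive_qtl_effect(genotypes, protein_levels)  # about one unit per allele copy
```

A full test would also compute a standard error and p-value for the slope; the effect estimate alone is shown to keep the sketch short.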
biorxiv genomics 0-100-users 2018
Selene: a PyTorch-based deep learning library for biological sequence-level data, bioRxiv, 2018-10-10
Abstract: To enable the application of deep learning in biology, we present Selene (https://selene.flatironinstitute.org), a PyTorch-based deep learning library for fast and easy development, training, and application of deep learning model architectures for any biological sequence. We demonstrate how Selene allows researchers to easily train a published architecture on new data, develop and evaluate a new architecture, and use a trained model to answer biological questions of interest.
biorxiv bioinformatics 100-200-users 2018
Whole genome sequencing enables definitive diagnosis of Cystic Fibrosis and Primary Ciliary Dyskinesia, bioRxiv, 2018-10-10
Abstract: Understanding the genomic basis of inherited respiratory disorders can assist in the clinical management of individuals with these rare disorders. We apply whole genome sequencing to discover disease-causing variants in the non-coding regions of known disease genes in two individuals with inherited respiratory disorders. We describe analysis strategies to pinpoint candidate variants within the non-coding genome and demonstrate aberrant RNA splicing caused by deep intronic variants in DNAH11 and CFTR. These findings confirm clinical diagnoses of primary ciliary dyskinesia and cystic fibrosis, respectively.
biorxiv genomics 0-100-users 2018
Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION, bioRxiv, 2018-10-09
Abstract: Tandem repeats (TRs) can cause disease through their length, sequence motif interruptions, and nucleotide modifications. For many TRs, however, these features are very difficult, if not impossible, to assess, requiring low-throughput and labor-intensive assays. One example is a VNTR in ABCA7, for which we recently discovered that expanded alleles strongly increase the risk of Alzheimer’s disease. Here, we investigated the potential of long-read whole genome sequencing to surmount these challenges, using the high-throughput PromethION platform from Oxford Nanopore Technologies. To overcome the limitations of conventional base calling and alignment, we developed an algorithm to study TR size and sequence directly on raw PromethION current data. We report the long-read sequencing of multiple human genomes (n = 11) using only a single sequencing run and flow cell per individual. With fresh DNA extractions, DNA shearing to approximately 20 kb and size selection, we obtained an average output of 70 gigabases (Gb) per flow cell, corresponding to 21x genome coverage, and a maximum yield of 98 Gb (30x genome coverage). All ABCA7 VNTR alleles, including expansions of up to 10,000 bases, were spanned by long sequencing reads, as validated by Southern blotting. Classical approaches to TR length estimation suffered from low accuracy, low precision, DNA strand effects and/or an inability to call pathogenic repeat expansions. In contrast, our novel NanoSatellite algorithm, which circumvents base calling by applying dynamic time warping to raw PromethION current data, achieved more than 90% accuracy and high precision (5.6% relative standard deviation) in TR length estimation, and detected all clinically relevant repeat expansions.
In addition, we identified alternative TR sequence motifs with high consistency, allowing determination of TR sequence and distinction between VNTR alleles of identical length. In conclusion, we validated the robustness of single-experiment whole genome long-read sequencing on PromethION, a prerequisite for applying long-read sequencing in the clinic. Our approach also outperformed Southern blotting, enabling improved characterization of the role of expanded ABCA7 VNTR alleles in Alzheimer’s disease and opening new opportunities for TR research.
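NanoSatellite is described as applying dynamic time warping (DTW) to raw current data instead of base calling. DTW suits nanopore signals because the pore samples each stretch of DNA for a variable number of time points, so the same sequence can produce current traces of different lengths. The following is a minimal textbook DTW distance, not NanoSatellite's implementation; the absolute-difference cost, the function name and the toy current values are assumptions for illustration.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a, reuse b[j - 1]
                cost[i][j - 1],      # advance in b, reuse a[i - 1]
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical expected current levels for one repeat unit, and an observed
# trace where the pore dwelt longer on some levels (same sequence, slower).
template = [1.0, 3.0, 2.0]
stretched = [1.0, 1.0, 3.0, 3.0, 3.0, 2.0]
```

With these toy values, `dtw_distance(template, stretched)` is 0.0 because the stretched trace can be aligned level-for-level, whereas a plain point-by-point comparison would not even be defined for sequences of different lengths. Turning such alignments into a repeat-unit count needs further machinery that this sketch does not attempt.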
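The coverage figures in the PromethION abstract follow from dividing the number of sequenced bases by the genome size. The abstract does not state which genome size was assumed; the sketch below uses 3.3 Gb, a hypothetical choice that reproduces both reported values (21x and 30x) after rounding. The function name is likewise illustrative.

```python
def mean_coverage(yield_bases: float, genome_size_bases: float = 3.3e9) -> float:
    """Average sequencing depth: total sequenced bases / genome size.

    The 3.3 Gb default is an assumption, not a value taken from the study.
    """
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return yield_bases / genome_size_bases


average_run = mean_coverage(70e9)  # about 21.2x for the average 70 Gb flow cell
best_run = mean_coverage(98e9)     # about 29.7x for the best 98 Gb flow cell
```

Mean coverage says nothing about how evenly reads are spread, so a region such as an expanded VNTR can still be under-covered in an otherwise 21x genome.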
biorxiv genomics 0-100-users 2018
Altered chromatin localization of hybrid lethality proteins in Drosophila, bioRxiv, 2018-10-09
Abstract: Understanding hybrid incompatibilities is a fundamental pursuit in evolutionary genetics. In crosses between Drosophila melanogaster females and Drosophila simulans males, the interaction of at least three genes is necessary for hybrid male lethality: Hmr^mel, Lhr^sim, and gfzf^sim. All three hybrid incompatibility genes encode chromatin-associated factors. While HMR and LHR physically bind each other and function together in a single complex, the connection between either of these proteins and gfzf remains unclear. Here, we investigate the allele-specific chromatin binding patterns of gfzf. First, our cytological analyses show little difference in GFZF protein localization between the two species except at telomeric sequences. In particular, GFZF binds the telomeric retrotransposon repeat arrays, and its differential binding at telomeres reflects the rapid changes in telomeric sequence composition between D. melanogaster and D. simulans. Second, we investigate the patterns of GFZF and HMR co-localization and find that the two proteins do not normally co-localize in D. melanogaster. However, in interspecies hybrids, HMR shows extensive mis-localization to GFZF sites, and this altered localization requires the presence of gfzf^sim. Third, we find by ChIP-seq that over-expression of HMR and LHR within species is sufficient to cause HMR to mis-localize to GFZF binding sites, indicating that HMR has a natural low affinity for GFZF sites. Together, these studies provide the first insights into the differing properties of gfzf between D. melanogaster and D. simulans, as well as a molecular interaction between gfzf and Hmr in the form of altered protein localization.
biorxiv molecular-biology 0-100-users 2018