Population-scale proteome variation in human induced pluripotent stem cells, bioRxiv, 2018-10-11
Abstract: Realising the potential of human induced pluripotent stem cell (iPSC) technology for drug discovery, disease modelling and cell therapy requires an understanding of variability across iPSC lines. While previous studies have characterized iPS cell lines genetically and transcriptionally, little is known about the variability of the iPSC proteome. Here, we present the first comprehensive proteomic iPSC dataset, analysing 202 iPSC lines derived from 151 donors. We characterise the major genetic determinants affecting proteome and transcriptome variation across iPSC lines and identify key regulatory mechanisms affecting variation in protein abundance. Our data identified >700 human iPSC protein quantitative trait loci (pQTLs). We mapped trans regulatory effects, identifying an important role for protein-protein interactions. We discovered that pQTLs show increased enrichment in disease-linked GWAS variants, compared with RNA-based eQTLs.
biorxiv genomics 0-100-users 2018
Whole genome sequencing enables definitive diagnosis of Cystic Fibrosis and Primary Ciliary Dyskinesia, bioRxiv, 2018-10-10
Abstract: Understanding the genomic basis of inherited respiratory disorders can assist in the clinical management of individuals with these rare disorders. We apply whole genome sequencing for the discovery of disease-causing variants in the non-coding regions of known disease genes for two individuals with inherited respiratory disorders. We describe analysis strategies to pinpoint candidate non-coding variants within the non-coding genome and demonstrate aberrant RNA splicing as a result of deep intronic variants in DNAH11 and CFTR. These findings confirm clinical diagnoses of primary ciliary dyskinesia and cystic fibrosis, respectively.
biorxiv genomics 0-100-users 2018
Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION, bioRxiv, 2018-10-09
Abstract: Tandem repeats (TRs) can cause disease through their length, sequence motif interruptions, and nucleotide modifications. For many TRs, however, these features are very difficult - if not impossible - to assess, requiring low-throughput and labor-intensive assays. One example is a VNTR in ABCA7 for which we recently discovered that expanded alleles strongly increase risk of Alzheimer’s disease. Here, we investigated the potential of long-read whole genome sequencing to surmount these challenges, using the high-throughput PromethION platform from Oxford Nanopore Technologies. To overcome the limitations of conventional base calling and alignment, we developed an algorithm to study the TR size and sequence directly on raw PromethION current data.
We report the long-read sequencing of multiple human genomes (n = 11) using only a single sequencing run and flow cell per individual. With the use of fresh DNA extractions, DNA shearing to approximately 20 kb and size selection, we obtained an average output of 70 gigabases (Gb) per flow cell, corresponding to a 21x genome coverage, and a maximum yield of 98 Gb (30x genome coverage). All ABCA7 VNTR alleles, including expansions up to 10,000 bases, were spanned by long sequencing reads, validated by Southern blotting. Classical approaches of TR length estimation suffered from low accuracy, low precision, DNA strand effects and/or inability to call pathogenic repeat expansions. In contrast, our novel NanoSatellite algorithm, which circumvents base calling by using dynamic time warping on raw PromethION current data, achieved more than 90% accuracy and high precision (5.6% relative standard deviation) of TR length estimation, and detected all clinically relevant repeat expansions. In addition, we identified alternative TR sequence motifs with high consistency, allowing determination of TR sequence and distinction of VNTR alleles with homozygous length.
In conclusion, we validated the robustness of single-experiment whole genome long-read sequencing on PromethION, a prerequisite for application of long-read sequencing in the clinic. In addition, we outperformed Southern blotting, enabling improved characterization of the role of expanded ABCA7 VNTR alleles in Alzheimer’s disease, and opening new opportunities for TR research.
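The idea of estimating repeat length with dynamic time warping, without base calling, can be illustrated with a minimal sketch. This is not the NanoSatellite implementation: the function names `dtw_distance` and `estimate_repeat_count`, the toy signal values, and the brute-force search over candidate copy numbers are all assumptions made for illustration. Real nanopore current traces would also need normalization and a model of the expected current per repeat unit.

```python
import math


def dtw_distance(signal, reference):
    """Classic O(n*m) dynamic time warping cost between two 1-D sequences."""
    n, m = len(signal), len(reference)
    # cost[i][j] = minimal cumulative cost of aligning signal[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(signal[i - 1] - reference[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # signal advances, reference sample is reused
                cost[i][j - 1],      # reference advances, signal sample is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def estimate_repeat_count(signal, unit_template, max_units):
    """Hypothetical helper: pick the copy number whose concatenated
    template aligns to the signal with the lowest DTW cost."""
    best_units, best_cost = 0, math.inf
    for k in range(1, max_units + 1):
        c = dtw_distance(signal, unit_template * k)
        if c < best_cost:
            best_units, best_cost = k, c
    return best_units


if __name__ == "__main__":
    unit = [1.0, 3.0, 2.0]                        # toy "current levels" of one repeat unit
    stretched = [1.0, 1.0, 3.0, 3.0, 2.0, 2.0] * 4  # four units, each sample dwelling twice
    print(estimate_repeat_count(stretched, unit, max_units=6))
```

Because warping absorbs variable dwell times, the time-stretched toy signal still aligns perfectly to four copies of the template, which is the property that makes the approach attractive for raw current data.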
biorxiv genomics 0-100-users 2018
Analyses of Neanderthal introgression suggest that Levantine and southern Arabian populations have a shared population history, bioRxiv, 2018-10-09
Abstract
Objectives: Modern humans are thought to have interbred with Neanderthals in the Near East soon after modern humans dispersed out of Africa. This introgression event likely took place in either the Levant or southern Arabia, depending on which dispersal route out of Africa was followed. In this study, we compare Neanderthal introgression in contemporary Levantine and southern Arabian populations to investigate Neanderthal introgression and to study Near Eastern population history.
Materials and Methods: We analyzed genotyping data on >400,000 autosomal SNPs from seven Levantine and five southern Arabian populations and compared those data to populations from around the world including Neanderthal and Denisovan genomes. We used f4 and D statistics to estimate and compare levels of Neanderthal introgression between Levantine, southern Arabian, and comparative global populations. We also identified 1,581 putative Neanderthal-introgressed SNPs within our dataset and analyzed their allele frequencies as a means to compare introgression patterns in Levantine and southern Arabian genomes.
Results: We find that Levantine and southern Arabian populations have similar levels of Neanderthal introgression to each other but lower levels than other non-Africans. Furthermore, we find that introgressed SNPs have very similar allele frequencies in the Levant and southern Arabia, which indicates that Neanderthal introgression is similarly distributed in Levantine and southern Arabian genomes.
Discussion: We infer that the ancestors of contemporary Levantine and southern Arabian populations received Neanderthal introgression prior to separating from each other and that there has been extensive gene flow between these populations.
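The D statistic named in the Materials and Methods can be sketched in its common allele-frequency (ABBA-BABA) form. The abstract does not say which software or population ordering the authors used, so everything here is an assumption for illustration: the function name `d_statistic`, the tree ((P1, P2), P3), Outgroup, the treatment of the outgroup allele as ancestral, and the toy frequencies. Sign conventions also differ between tools.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Frequency-based ABBA-BABA D statistic for the tree ((P1, P2), P3), O.

    Each argument is a list of derived-allele frequencies, one per SNP.
    Under this convention, D > 0 indicates excess allele sharing between
    P2 and P3, and D < 0 indicates excess sharing between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        abba += (1 - a) * b * c * (1 - o)  # P2 and P3 share the derived allele
        baba += a * (1 - b) * c * (1 - o)  # P1 and P3 share the derived allele
    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites: ABBA + BABA is zero")
    return (abba - baba) / total


if __name__ == "__main__":
    # Toy frequencies at three SNPs; not data from the study.
    p1 = [0.0, 0.2, 0.1]
    p2 = [0.3, 0.4, 0.1]
    p3 = [1.0, 1.0, 0.5]
    out = [0.0, 0.0, 0.0]
    print(d_statistic(p1, p2, p3, out))
```

In practice a block jackknife over the genome is typically used to attach a standard error to D; that step is omitted from this sketch.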
biorxiv genomics 0-100-users 2018
Building gene regulatory networks from scATAC-seq and scRNA-seq using Linked Self-Organizing Maps, bioRxiv, 2018-10-09
Abstract: Rapid advances in single-cell assays have outpaced methods for analysis of those data types. Different single-cell assays show extensive variation in sensitivity and signal-to-noise levels. In particular, scATAC-seq generates extremely sparse and noisy datasets. Existing methods developed to analyze these data require cells amenable to pseudo-time analysis or require datasets with drastically different cell types. We describe a novel approach using self-organizing maps (SOM) to link scATAC-seq and scRNA-seq data that overcomes these challenges and can generate draft regulatory networks. Our SOMatic package generates chromatin and gene expression SOMs separately and combines them using a linking function. We applied SOMatic to a mouse pre-B cell differentiation time-course using controlled Ikaros over-expression to recover gene ontology enrichments, identify motifs in genomic regions showing similar single-cell profiles, and generate a gene regulatory network that both recovers known interactions and predicts new Ikaros targets during the differentiation process. The ability of linked SOMs to detect emergent properties from multiple types of high-dimensional genomic data with very different signal properties opens new avenues for integrative analysis of single cells.
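For readers unfamiliar with self-organizing maps, the following is a minimal sketch of how a single SOM is trained. It is not SOMatic and does not reproduce the linking function, which the abstract does not specify. The one-dimensional map layout, the linear decay schedule, the deterministic initialization, and the names `best_matching_unit` and `train_som` are assumptions chosen to keep the example short.

```python
import math


def best_matching_unit(weights, sample):
    """Index of the map unit whose weight vector is closest to the sample."""
    def sq_dist(w):
        return sum((wi - si) ** 2 for wi, si in zip(w, sample))
    return min(range(len(weights)), key=lambda u: sq_dist(weights[u]))


def train_som(samples, n_units, epochs=20, lr0=0.5, sigma0=1.0):
    """Train a 1-D SOM with at least two units.

    Units are initialised evenly on the line between the first and last
    sample (an assumption made here so the sketch is deterministic).
    """
    first, last = samples[0], samples[-1]
    weights = [
        [f + (l - f) * u / (n_units - 1) for f, l in zip(first, last)]
        for u in range(n_units)
    ]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                   # learning rate shrinks over time
        sigma = max(sigma0 * decay, 0.1)   # neighbourhood radius shrinks too
        for sample in samples:
            bmu = best_matching_unit(weights, sample)
            for u, w in enumerate(weights):
                # Gaussian neighbourhood: units near the BMU on the map move most
                influence = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * influence * (sample[k] - w[k])
    return weights


if __name__ == "__main__":
    # Two toy clusters of 2-D profiles; not data from the study.
    data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
            [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
    som = train_som(data, n_units=4)
    print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [1.0, 1.0]))
```

After training, profiles from the two toy clusters map to different units, which is the clustering behaviour a linking step between separately trained maps would build on.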
biorxiv genomics 0-100-users 2018
A mouse tissue atlas of small non-coding RNA, bioRxiv, 2018-09-29
SUMMARY: Small non-coding RNAs (ncRNAs) play a vital role in a broad range of biological processes both in health and disease. A comprehensive quantitative reference of small ncRNA expression would significantly advance our understanding of ncRNA roles in shaping tissue functions. Here, we systematically profiled the levels of five ncRNA classes (miRNA, snoRNA, snRNA, scaRNA and tRNA fragments) across eleven mouse tissues by deep sequencing. Using fourteen biological replicates spanning both sexes, we identified that ~30% of small ncRNAs are distributed across the body in a tissue-specific manner, with some also being sexually dimorphic. We found that miRNAs are subject to “arm switching” between healthy tissues and that tRNA fragments are retained within tissues in both a gene- and a tissue-specific manner. Out of eleven profiled tissues we confirmed that brain contains the largest number of unique small ncRNA transcripts, some of which were previously annotated while others are identified for the first time in this study. Furthermore, by combining these findings with single-cell ATAC-seq data, we were able to connect identified brain-specific ncRNA with their cell types of origin. These results yield the most comprehensive characterization of specific and ubiquitous small RNAs in individual murine tissues to date, and we expect that these data will be a resource for the further identification of ncRNAs involved in tissue function in health and dysfunction in disease.
HIGHLIGHTS
- An atlas of tissue levels of multiple small ncRNA classes generated from 14 biological replicates of both sexes across 11 tissues
- Distinct distribution patterns of miRNA arms and tRNA fragments across tissues suggest the existence of tissue-specific mechanisms of ncRNA cleavage and retention
- miRNA expression is sex specific in healthy tissues
- Small RNA-seq and scATAC-seq data integration produce a detailed map of cell-type specific ncRNA profiles in the mouse brain
biorxiv genomics 0-100-users 2018