Whole genome sequencing enables definitive diagnosis of Cystic Fibrosis and Primary Ciliary Dyskinesia, bioRxiv, 2018-10-10
Abstract: Understanding the genomic basis of inherited respiratory disorders can assist in the clinical management of individuals with these rare disorders. We apply whole genome sequencing for the discovery of disease-causing variants in the non-coding regions of known disease genes for two individuals with inherited respiratory disorders. We describe analysis strategies to pinpoint candidate variants within the non-coding genome and demonstrate aberrant RNA splicing as a result of deep intronic variants in DNAH11 and CFTR. These findings confirm clinical diagnoses of primary ciliary dyskinesia and cystic fibrosis, respectively.
biorxiv genomics 0-100-users 2018
Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION, bioRxiv, 2018-10-09
Abstract: Tandem repeats (TRs) can cause disease through their length, sequence motif interruptions, and nucleotide modifications. For many TRs, however, these features are very difficult - if not impossible - to assess, requiring low-throughput and labor-intensive assays. One example is a VNTR in ABCA7 for which we recently discovered that expanded alleles strongly increase risk of Alzheimer’s disease. Here, we investigated the potential of long-read whole genome sequencing to surmount these challenges, using the high-throughput PromethION platform from Oxford Nanopore Technologies. To overcome the limitations of conventional base calling and alignment, we developed an algorithm to study the TR size and sequence directly on raw PromethION current data.
We report the long-read sequencing of multiple human genomes (n = 11) using only a single sequencing run and flow cell per individual. With the use of fresh DNA extractions, DNA shearing to approximately 20 kb and size selection, we obtained an average output of 70 gigabases (Gb) per flow cell, corresponding to 21x genome coverage, and a maximum yield of 98 Gb (30x genome coverage). All ABCA7 VNTR alleles, including expansions up to 10,000 bases, were spanned by long sequencing reads, validated by Southern blotting. Classical approaches of TR length estimation suffered from low accuracy, low precision, DNA strand effects and/or inability to call pathogenic repeat expansions. In contrast, our novel NanoSatellite algorithm, which circumvents base calling by using dynamic time warping on raw PromethION current data, achieved more than 90% accuracy and high precision (5.6% relative standard deviation) of TR length estimation, and detected all clinically relevant repeat expansions. In addition, we identified alternative TR sequence motifs with high consistency, allowing determination of TR sequence and distinction of VNTR alleles with homozygous length.
In conclusion, we validated the robustness of single-experiment whole genome long-read sequencing on PromethION, a prerequisite for application of long-read sequencing in the clinic. In addition, we outperformed Southern blotting, enabling improved characterization of the role of expanded ABCA7 VNTR alleles in Alzheimer’s disease, and opening new opportunities for TR research.
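To make the idea of dynamic time warping on a raw signal concrete, here is a toy sketch. It is not the NanoSatellite implementation: the function names, the absolute-difference cost, the "try k concatenated units and keep the best" strategy, and the made-up signal values are all assumptions chosen for illustration. The point it shows is that warping absorbs variable dwell times, so a repeat count can be estimated without first converting the signal to bases.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    using the absolute difference as the local cost."""
    inf = float("inf")
    m = len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, len(a) + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def estimate_repeat_count(signal, unit, max_units):
    """Return the number k (1..max_units) of concatenated reference
    units whose DTW distance to the observed signal is smallest."""
    return min(range(1, max_units + 1),
               key=lambda k: dtw_distance(signal, unit * k))


# Hypothetical current levels for one repeat unit (arbitrary values).
unit = [1.0, 5.0, 2.0]
# Four units observed with uneven dwell times (repeated samples).
signal = [1.0, 1.0, 5.0, 2.0, 2.0,
          1.0, 5.0, 5.0, 2.0,
          1.0, 5.0, 2.0,
          1.0, 1.0, 1.0, 5.0, 2.0]
print(estimate_repeat_count(signal, unit, max_units=6))
```

A real signal is noisy and the reference levels would come from a pore model, so a practical method needs normalization and a more careful scoring scheme than this brute-force scan.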
biorxiv genomics 0-100-users 2018
Altered chromatin localization of hybrid lethality proteins in Drosophila, bioRxiv, 2018-10-09
Abstract: Understanding hybrid incompatibilities is a fundamental pursuit in evolutionary genetics. In crosses between Drosophila melanogaster females and Drosophila simulans males, the interaction of at least three genes is necessary for hybrid male lethality: Hmr mel, Lhr sim, and gfzf sim. All three hybrid incompatibility genes are chromatin-associated factors. While HMR and LHR physically bind each other and function together in a single complex, the connection between either of these proteins and gfzf remains mysterious. Here, we investigate the allele-specific chromatin binding patterns of gfzf. First, our cytological analyses show that there is little difference in protein localization of GFZF between the two species except at telomeric sequences. In particular, GFZF binds the telomeric retrotransposon repeat arrays, and the differential binding of GFZF at telomeres reflects the rapid changes in sequence composition at telomeres between D. melanogaster and D. simulans. Second, we investigate the patterns of GFZF and HMR co-localization and find that the two proteins do not normally co-localize in D. melanogaster. However, in inter-species hybrids, HMR shows extensive mis-localization to GFZF sites, and this altered localization requires the presence of gfzf sim. Third, we find by ChIP-Seq that over-expression of HMR and LHR within species is sufficient to cause HMR to mis-localize to GFZF binding sites, indicating that HMR has a natural low affinity for GFZF sites. Together, these studies provide the first insights into the different properties of gfzf between D. melanogaster and D. simulans as well as a molecular interaction between gfzf and Hmr in the form of altered protein localization.
biorxiv molecular-biology 0-100-users 2018
Analyses of Neanderthal introgression suggest that Levantine and southern Arabian populations have a shared population history, bioRxiv, 2018-10-09
Abstract
Objectives: Modern humans are thought to have interbred with Neanderthals in the Near East soon after modern humans dispersed out of Africa. This introgression event likely took place in either the Levant or southern Arabia, depending on which dispersal route out of Africa was followed. In this study, we compare Neanderthal introgression in contemporary Levantine and southern Arabian populations to investigate Neanderthal introgression and to study Near Eastern population history.
Materials and Methods: We analyzed genotyping data on >400,000 autosomal SNPs from seven Levantine and five southern Arabian populations and compared those data to populations from around the world including Neanderthal and Denisovan genomes. We used f4 and D statistics to estimate and compare levels of Neanderthal introgression between Levantine, southern Arabian, and comparative global populations. We also identified 1,581 putative Neanderthal-introgressed SNPs within our dataset and analyzed their allele frequencies as a means to compare introgression patterns in Levantine and southern Arabian genomes.
Results: We find that Levantine and southern Arabian populations have similar levels of Neanderthal introgression to each other but lower levels than other non-Africans. Furthermore, we find that introgressed SNPs have very similar allele frequencies in the Levant and southern Arabia, which indicates that Neanderthal introgression is similarly distributed in Levantine and southern Arabian genomes.
Discussion: We infer that the ancestors of contemporary Levantine and southern Arabian populations received Neanderthal introgression prior to separating from each other and that there has been extensive gene flow between these populations.
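For readers unfamiliar with the f4 and D statistics named in the methods, the following is a minimal sketch of their standard allele-frequency forms. The abstract does not say which software or population ordering the authors used, so the function names, the argument order (P1, P2, P3, outgroup) and the toy frequencies are assumptions. Sign conventions differ between tools.

```python
def d_statistic(p1, p2, p3, p4):
    """Patterson's D (ABBA-BABA) from per-SNP derived-allele
    frequencies in populations P1, P2, P3 and an outgroup P4.
    Positive values indicate excess allele sharing between P2 and P3."""
    abba = 0.0
    baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, p4):
        abba += (1 - a) * b * c * (1 - o)
        baba += a * (1 - b) * c * (1 - o)
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


def f4_statistic(p1, p2, p3, p4):
    """f4(P1, P2; P3, P4): mean over SNPs of (p1 - p2) * (p3 - p4)."""
    products = [(a - b) * (c - o) for a, b, c, o in zip(p1, p2, p3, p4)]
    return sum(products) / len(products)


# Toy frequencies (made up): P2 shares derived alleles with P3 slightly
# more often than P1 does.
p1 = [0.10, 0.00, 0.50, 0.20]
p2 = [0.30, 0.20, 0.50, 0.10]
p3 = [1.00, 1.00, 0.00, 1.00]
p4 = [0.00, 0.00, 0.00, 0.00]
print(d_statistic(p1, p2, p3, p4), f4_statistic(p1, p2, p3, p4))
```

In practice significance is assessed with a block jackknife over the genome, which this sketch omits.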
biorxiv genomics 0-100-users 2018
Building gene regulatory networks from scATAC-seq and scRNA-seq using Linked Self-Organizing Maps, bioRxiv, 2018-10-09
Abstract: Rapid advances in single-cell assays have outpaced methods for analysis of those data types. Different single-cell assays show extensive variation in sensitivity and signal-to-noise levels. In particular, scATAC-seq generates extremely sparse and noisy datasets. Existing methods developed to analyze these data require cells amenable to pseudo-time analysis or require datasets with drastically different cell types. We describe a novel approach using self-organizing maps (SOM) to link scATAC-seq and scRNA-seq data that overcomes these challenges and can generate draft regulatory networks. Our SOMatic package generates chromatin and gene expression SOMs separately and combines them using a linking function. We applied SOMatic to a mouse pre-B cell differentiation time-course using controlled Ikaros over-expression to recover gene ontology enrichments, identify motifs in genomic regions showing similar single-cell profiles, and generate a gene regulatory network that both recovers known interactions and predicts new Ikaros targets during the differentiation process. The ability of linked SOMs to detect emergent properties from multiple types of high-dimensional genomic data with very different signal properties opens new avenues for integrative analysis of single cells.
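As background on the building block mentioned above, here is a minimal sketch of training a one-dimensional self-organizing map. It is not SOMatic and does not implement the linking function between chromatin and expression maps. The function names, the linear initialization, the decay schedule and the toy data are all assumptions made for illustration.

```python
import math


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, n_units=4, epochs=20, lr0=0.5, sigma0=1.0):
    """Train a 1D SOM (a chain of n_units >= 2 units) on a list of vectors."""
    dims = len(data[0])
    lo = [min(row[d] for row in data) for d in range(dims)]
    hi = [max(row[d] for row in data) for d in range(dims)]
    # Deterministic start: units spaced evenly between data min and max.
    weights = [[lo[d] + (hi[d] - lo[d]) * i / (n_units - 1) for d in range(dims)]
               for i in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 towards 0
        lr = lr0 * frac                      # learning rate
        sigma = max(sigma0 * frac, 0.1)      # neighbourhood width
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Units near the winner on the chain move more.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dims):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Two toy "profiles" (made up) standing in for distinct cell states.
data = [(0.0, 0.0), (10.0, 10.0)] * 3
som = train_som(data)
print(best_matching_unit(som, (0.0, 0.0)), best_matching_unit(som, (10.0, 10.0)))
```

Profiles that land on the same or neighbouring units are treated as similar, which is the property that makes SOMs useful for grouping sparse, noisy single-cell signals.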
biorxiv genomics 0-100-users 2018
Genetic variability in response to Aβ deposition influences Alzheimer’s risk, bioRxiv, 2018-10-09
Abstract: Genetic analysis of late-onset Alzheimer’s disease risk has previously identified a set of largely microglial genes that form a transcriptional network. In transgenic mouse models of amyloid deposition we have previously shown that the expression of many of the mouse orthologs of these genes is co-ordinately up-regulated by amyloid deposition. Here we investigate whether systematic analysis of other members of this mouse amyloid-responsive network predicts other Alzheimer’s risk loci. This statistical comparison of the mouse amyloid-response network with Alzheimer’s disease genome-wide association studies identifies 5 other genetic risk loci for the disease (OAS1, CXCL10, LAPTM5, ITGAM and LILRB4). This work suggests that genetic variability in the microglial response to amyloid deposition is a major determinant for Alzheimer’s risk.
One Sentence Summary: Identification of 5 new risk loci for Alzheimer’s by statistical comparison of mouse Aβ microglial response with gene-based SNPs from human GWAS.
biorxiv neuroscience 0-100-users 2018