Whole-genome deep learning analysis reveals causal role of noncoding mutations in autism, bioRxiv, 2018-05-11
Abstract: We address the challenge of detecting the contribution of noncoding mutations to disease with a deep-learning-based framework that predicts specific regulatory effects and deleterious disease impact of genetic variants. Applying this framework to 1,790 Autism Spectrum Disorder (ASD) simplex families reveals autism disease causality of noncoding mutations by demonstrating that ASD probands harbor transcriptional (TRDs) and post-transcriptional (RRDs) regulation-disrupting mutations of significantly higher functional impact than unaffected siblings. Importantly, we detect this significant noncoding contribution at each level, transcriptional and post-transcriptional, independently and after multiple hypothesis correction. Further analysis suggests involvement of noncoding mutations in synaptic transmission and neuronal development, and reveals a convergent genetic landscape of coding and noncoding (TRD and RRD) de novo mutations in ASD. We demonstrate that sequences carrying prioritized proband de novo mutations possess transcriptional regulatory activity and drive expression differentially, and highlight a link between noncoding mutations and IQ heterogeneity in ASD probands. Our predictive genomics framework illuminates the role of noncoding mutations in ASD, prioritizes high impact transcriptional and post-transcriptional regulatory mutations for further study, and is broadly applicable to complex human diseases.
biorxiv genomics 100-200-users 2018
Portraits of genetic intra-tumour heterogeneity and subclonal selection across cancer types, bioRxiv, 2018-05-05
Summary: Ongoing cancer evolution gives rise to intra-tumour heterogeneity (ITH), which is a major mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin and drivers of ITH across cancer types are poorly understood. Here, we extensively characterise ITH across 2,778 cancer whole genome sequences from 36 cancer types. We demonstrate that nearly all tumours (94.7%) with sufficient sequencing depth contain evidence of recent subclonal expansions, and that most cancer types show clear signs of positive selection in both clonal and subclonal protein coding variants. We find distinctive subclonal patterns of driver gene mutations, fusions, structural variation and copy-number alterations across cancer types. Dynamic, tumour type-specific changes of mutational processes between subclonal expansions shape differences between clonal and subclonal events. Our results underline the importance of ITH and its drivers in tumour evolution, and provide an unprecedented pan-cancer resource of extensively annotated subclonal events, laying a foundation for future cancer genomic studies.
biorxiv cancer-biology 100-200-users 2018
CRISPR-Cas9 interference in cassava linked to the evolution of editing-resistant geminiviruses, bioRxiv, 2018-05-04
Abstract: We used CRISPR-Cas9 in the staple food crop cassava with the aim of engineering resistance to African cassava mosaic virus, a member of a widespread and important family of plant-pathogenic DNA viruses. We found that between 33 and 48% of edited virus genomes evolved a conserved single-nucleotide mutation that confers resistance to CRISPR-Cas9 cleavage. Our study highlights the potential for virus escape from this technology. Care should be taken to design CRISPR-Cas9 experiments that minimize the risk of virus escape.
biorxiv plant-biology 100-200-users 2018
Human 5′ UTR design and variant effect prediction from a massively parallel translation assay, bioRxiv, 2018-04-29
Predicting the impact of cis-regulatory sequence on gene expression is a foundational challenge for biology. We combine polysome profiling of hundreds of thousands of randomized 5′ UTRs with deep learning to build a predictive model that relates human 5′ UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5′ UTRs that accurately target specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,000 truncated human 5′ UTRs and 3,577 naturally-occurring variants and show that the model accurately predicts ribosome loading of these sequences. Finally, we provide evidence of 47 SNVs associated with human diseases that cause a significant change in ribosome loading and thus a plausible molecular basis for disease.
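The design loop described in this abstract (a predictive model scoring candidates, a genetic algorithm proposing them) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_predict` is a hypothetical stand-in that simply returns the A/U fraction of a sequence and has nothing to do with the authors' trained model, and the search uses mutation and selection only, with no crossover.

```python
import random

ALPHABET = "ACGU"


def toy_predict(utr: str) -> float:
    """Hypothetical stand-in for a trained model: the A/U fraction of the sequence.

    This is NOT the paper's model; it only gives the search something to optimise.
    """
    return sum(base in "AU" for base in utr) / len(utr)


def mutate(utr: str, rng: random.Random, rate: float = 0.05) -> str:
    """Resample each position from the alphabet with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in utr)


def evolve(target: float, length: int = 50, pop_size: int = 40,
           generations: int = 60, seed: int = 0) -> str:
    """Search for a sequence whose predicted score is close to `target`."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)
    ]

    def fitness(utr: str) -> float:
        # Higher is better: negative distance between prediction and target.
        return -abs(toy_predict(utr) - target)

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 4]  # keep the top quarter unchanged
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)


best = evolve(target=0.8)
print(best, toy_predict(best))
```

Swapping `toy_predict` for a real sequence-to-expression model would leave the search loop itself unchanged, which is the main point of the sketch.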
biorxiv synthetic-biology 100-200-users 2018
Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing, bioRxiv, 2018-04-28
Abstract: The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5%-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57%, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than two hours on a standard server. Furthermore, we identified 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
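For readers unfamiliar with the metrics quoted in this abstract, the F1-score is the harmonic mean of precision and recall. The sketch below uses the standard definition; the input values are illustrative and are not taken from the paper, where precision and F1 are reported for different evaluation settings.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only (not results from the paper).
print(f1_score(0.9, 0.6))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a caller with high precision but modest recall still ends up with a modest F1-score.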
biorxiv bioinformatics 100-200-users 2018
Genomic SEM Provides Insights into the Multivariate Genetic Architecture of Complex Traits, bioRxiv, 2018-04-21
Abstract: Methods for using GWAS to estimate genetic correlations between pairwise combinations of traits have produced “atlases” of genetic architecture. Genetic atlases reveal pervasive pleiotropy, and genome-wide significant loci are often shared across different phenotypes. We introduce genomic structural equation modeling (Genomic SEM), a multivariate method for analyzing the joint genetic architectures of complex traits. Using formal methods for modeling covariance structure, Genomic SEM synthesizes genetic correlations and SNP-heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to identify variants with effects on general dimensions of cross-trait liability, boost power for discovery, and calculate more predictive polygenic scores. Finally, Genomic SEM can be used to identify loci that cause divergence between traits, aiding the search for what uniquely differentiates highly correlated phenotypes. We demonstrate several applications of Genomic SEM, including a joint analysis of GWAS summary statistics from five genetically correlated psychiatric traits. We identify 27 independent SNPs not previously identified in the univariate GWASs, 5 of which have been reported in other published GWASs of the included traits. Polygenic scores derived from Genomic SEM consistently outperform polygenic scores derived from GWASs of the individual traits. Genomic SEM is flexible, open ended, and allows for continuous innovations in how multivariate genetic architecture is modeled.
biorxiv genetics 100-200-users 2018