Substantial Batch Effects in TCGA Exome Sequences Undermine Pan-Cancer Analysis of Germline Variants, bioRxiv, 2018-10-16
ABSTRACT
Background: In recent years, research on cancer predisposition germline variants has emerged as a prominent field. The identification of somatic mutations relies on a reliable mapping of the patient's germline variants. In addition, statistics on germline variant frequencies in healthy individuals and cancer patients are the basis for seeking candidate cancer predisposition genes. The Cancer Genome Atlas (TCGA) is one of the main sources of such data, providing a diverse collection of molecular data, including deep sequencing, for more than 30 cancer types from more than 10,000 patients.
Methods: Our hypothesis in this study is that whole exome sequences from healthy blood samples of cancer patients should not show systematic differences among cancer types. To test this hypothesis, we analyzed common and rare germline variants across six cancer types, covering 2,241 samples from TCGA. In our analysis, we accounted for inherent variables in the data, including the different variant calling protocols, sequencing platforms, and ethnicity.
Results: We report substantial batch effects in germline variants associated with cancer types. We attribute the effect to the specific sequencing centers that produced the data. Specifically, we measured 30% variability in the number of reported germline variants per sample across sequencing centers. The batch effect is further expressed in nucleotide composition and variant frequencies. Importantly, the batch effect causes substantial differences in germline variant distribution patterns across numerous genes, including prominent cancer predisposition genes such as BRCA1, RET, MAX, and KRAS. For most known cancer predisposition genes, we found a distinct batch-dependent difference in germline variants.
Conclusion: TCGA germline data are subject to strong batch effects, with substantial variability among TCGA sequencing centers. We claim that these batch effects are consequential for numerous TCGA pan-cancer studies. In particular, these effects may compromise both the reliability and the power of efforts to detect new cancer predisposition genes. Furthermore, interpretation of pan-cancer analyses should be revisited in view of the source of the genomic data, after accounting for the reported batch effects.
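The abstract reports "30% variability" in per-sample germline variant counts across sequencing centers but does not define the metric. The sketch below assumes one plausible definition, the relative range of per-center mean counts, (largest mean − smallest mean) / smallest mean. The center labels and counts are hypothetical, and this is not necessarily how the study computed its figure.

```python
from statistics import mean


def center_variability(counts_by_center):
    """Relative spread of mean per-sample germline variant counts across centers.

    counts_by_center: dict mapping a (hypothetical) sequencing-center label to a
    list of per-sample germline variant counts.
    Returns (per-center means, (largest mean - smallest mean) / smallest mean).
    Assumption: "variability" is taken to mean this relative range.
    """
    center_means = {c: mean(v) for c, v in counts_by_center.items() if v}
    smallest = min(center_means.values())
    largest = max(center_means.values())
    return center_means, (largest - smallest) / smallest


# Hypothetical toy data: two centers with different per-sample call counts.
means, spread = center_variability({"center_A": [100, 100], "center_B": [130, 130]})
print(means, round(spread, 3))
```

A spread of this size between centers processing comparable blood-derived normals is the kind of signal the authors interpret as a batch effect rather than biology.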
biorxiv cancer-biology 0-100-users 2018
Harnessing the Anti-Cancer Natural Product Nimbolide for Targeted Protein Degradation, bioRxiv, 2018-10-15
Abstract
Nimbolide, a terpenoid natural product derived from the Neem tree, impairs cancer pathogenicity across many types of human cancers; however, the direct targets and mechanisms by which nimbolide exerts its effects are poorly understood. Here, we used activity-based protein profiling (ABPP) chemoproteomic platforms to discover that nimbolide reacts with a novel functional cysteine crucial for substrate recognition in the E3 ubiquitin ligase RNF114. Nimbolide impairs breast cancer cell proliferation in part by disrupting RNF114 substrate recognition, which inhibits the ubiquitination and degradation of tumor suppressors such as p21 and results in their rapid stabilization. We further demonstrate that nimbolide can be harnessed to recruit RNF114 as an E3 ligase in targeted protein degradation applications, and we show that synthetically simpler scaffolds can also access this unique reactive site. Our study highlights the utility of ABPP platforms in uncovering unique druggable modalities accessed by natural products for cancer therapy and targeted protein degradation applications.
biorxiv cancer-biology 0-100-users 2018
Pan-cancer whole genome analyses of metastatic solid tumors, bioRxiv, 2018-09-21
Abstract
Metastatic cancer is one of the major causes of death and is associated with poor treatment efficacy. A better understanding of the characteristics of late-stage cancer is required to help tailor personalised treatment, reduce overtreatment and improve outcomes. Here we describe the largest pan-cancer study of metastatic solid tumor genomes, including 2,520 whole genome-sequenced tumor-normal pairs, analyzed at a median depth of 106x and 38x respectively, and surveying over 70 million somatic variants. Metastatic lesions were found to be very diverse, with mutation characteristics reflecting those of the primary tumor types, although with high rates of whole genome duplication events (56%). Individual metastatic lesions are relatively homogeneous, with the vast majority (96%) of driver mutations being clonal and up to 80% of tumor suppressor genes bi-allelically inactivated through different mutational mechanisms. In 62% of all patients, we detected genetic variants that may be associated with the outcome of approved or experimental therapies. These actionable events were distributed across various mutation types, underlining the importance of comprehensive genomic tumor profiling for cancer precision medicine.
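Calling a driver mutation "clonal" generally means it is present in essentially all tumor cells. A common generic way to estimate this is the cancer cell fraction (CCF), which corrects the observed variant allele fraction (VAF) for tumor purity and local copy number. The sketch below uses the standard relation VAF = purity × multiplicity × CCF / (purity × tumor_cn + 2 × (1 − purity)); the 0.8 clonality threshold is a hypothetical choice, and this is not presented as the pipeline used in the study.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of tumor cells carrying a mutation (CCF).

    Assumes expected VAF = purity * multiplicity * CCF
                           / (purity * tumor_cn + normal_cn * (1 - purity)).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Total allele copies at the locus across tumor and contaminating normal cells.
    total_copies = purity * tumor_cn + normal_cn * (1 - purity)
    ccf = vaf * total_copies / (purity * multiplicity)
    # Values slightly above 1 are treated as sampling noise and capped.
    return min(ccf, 1.0)


def is_clonal(vaf, purity, tumor_cn, multiplicity=1, threshold=0.8):
    """Call a mutation clonal if its CCF reaches a (hypothetical) threshold."""
    return cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity) >= threshold


# Hypothetical example: 80% purity, diploid locus.
print(cancer_cell_fraction(0.4, 0.8, 2))  # close to 1.0, i.e. clonal
print(cancer_cell_fraction(0.1, 0.8, 2))  # about 0.25, i.e. subclonal
```

Counting the drivers for which `is_clonal` returns True, divided by all drivers, gives a clonal fraction of the kind reported above (96%), under these simplifying assumptions.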
biorxiv cancer-biology 100-200-users 2018
The landscape of somatic mutation in normal colorectal epithelial cells, bioRxiv, 2018-09-14
Abstract
The colorectal adenoma-carcinoma sequence has provided a paradigmatic framework for understanding the successive somatic genetic changes and consequent clonal expansions leading to cancer. As for most cancer types, however, understanding of the earliest phases of colorectal neoplastic change, which may occur in morphologically normal tissue, is comparatively limited because of the difficulty of detecting somatic mutations in normal cells. Each colorectal crypt is a small clone of cells derived from a single recently existing stem cell. Here, we whole genome sequenced hundreds of normal crypts from 42 individuals. Signatures of multiple mutational processes were revealed: some ubiquitous and continuous, others found only in some individuals, in some crypts, or during some phases of the cell lineage from zygote to adult cell. Likely driver mutations were present in ∼1% of normal colorectal crypts in middle-aged individuals, indicating that adenomas and carcinomas are rare outcomes of a pervasive process of neoplastic change across morphologically normal colorectal epithelium.
biorxiv cancer-biology 0-100-users 2018
Integrated computational and experimental identification of p53, KRAS and VHL mutant selection associated with CRISPR-Cas9 editing, bioRxiv, 2018-09-04
Abstract
Recent studies have reported that CRISPR-Cas9 gene editing induces a p53-dependent DNA damage response in primary cells, which may select for cells with oncogenic p53 mutations [11,12]. It is unclear whether these CRISPR-induced changes apply to other cell types, and whether CRISPR gene editing may select for other oncogenic mutations. To address these questions, we analyzed genome-wide CRISPR and RNAi screens to systematically chart the mutation selection potential of CRISPR knockouts across the whole exome. Our analysis suggests that CRISPR gene editing can select for mutants of KRAS and VHL, at a level comparable to that reported for p53. These predictions were further validated genome-wide by analyzing independent CRISPR screens and patients' tumor data. Finally, we performed a new set of pooled and arrayed CRISPR screens to evaluate the competition between CRISPR-edited isogenic p53 wild-type and mutant cell lines, which further validated our predictions. In summary, our study systematically charts, and points to, the potential selection of specific cancer driver mutations during CRISPR-Cas9 gene editing.
biorxiv cancer-biology 200-500-users 2018
The Repertoire of Mutational Signatures in Human Cancer, bioRxiv, 2018-05-15
ABSTRACT
Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature. Using 84,729,690 somatic mutations from 4,645 whole cancer genome and 19,184 exome sequences encompassing most cancer types, we characterised 49 single base substitution, 11 doublet base substitution, four clustered base substitution, and 17 small insertion and deletion mutational signatures. The substantially larger dataset compared with previous analyses enabled discovery of new signatures, separation of overlapping signatures, and decomposition of signatures into components that may represent associated, but distinct, DNA damage, repair and/or replication mechanisms. Estimation of the contribution of each signature to the mutational catalogues of individual cancer genomes revealed associations with exogenous and endogenous exposures and with defective DNA maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes contributing to the development of human cancer, including a comprehensive reference set of mutational signatures in human cancer.
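Estimating "the contribution of each signature to the mutational catalogue" of a single genome is typically posed as a non-negative fit: find exposures h ≥ 0 such that the catalogue v ≈ W h, where the columns of W are fixed reference signatures. The sketch below solves this with Lee–Seung multiplicative updates for the squared-error objective, holding W fixed. It is a generic illustration, not necessarily the method used in the study, and the two four-category signatures are hypothetical (real single base substitution signatures span 96 trinucleotide contexts).

```python
def refit_exposures(signatures, catalogue, n_iter=1000):
    """Estimate non-negative exposures h minimizing ||v - W h||^2.

    signatures: K reference signatures, each a list of M probabilities over
        mutation types (the columns of W).
    catalogue: M mutation counts for one tumor (the vector v).
    Multiplicative updates h <- h * (W^T v) / (W^T W h) keep h non-negative.
    """
    k_sigs = len(signatures)
    m_types = len(catalogue)
    # W^T v does not change across iterations, so compute it once.
    wt_v = [sum(signatures[k][m] * catalogue[m] for m in range(m_types))
            for k in range(k_sigs)]
    # Start from an even split of the total mutation count.
    h = [sum(catalogue) / k_sigs] * k_sigs
    for _ in range(n_iter):
        # Current reconstruction W h, one entry per mutation type.
        recon = [sum(signatures[k][m] * h[k] for k in range(k_sigs))
                 for m in range(m_types)]
        # W^T (W h), one entry per signature.
        wt_wh = [sum(signatures[k][m] * recon[m] for m in range(m_types))
                 for k in range(k_sigs)]
        h = [h[k] * wt_v[k] / wt_wh[k] if wt_wh[k] > 0 else 0.0
             for k in range(k_sigs)]
    return h


# Hypothetical signatures over four mutation types and one toy catalogue.
sig_a = [0.5, 0.5, 0.0, 0.0]
sig_b = [0.0, 0.0, 0.5, 0.5]
print(refit_exposures([sig_a, sig_b], [30, 30, 10, 10]))  # [60.0, 20.0]
```

The multiplicative form was chosen here because it needs no external solver; a non-negative least squares routine would solve the same problem.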
biorxiv cancer-biology 100-200-users 2018