Mutation detection in thousands of acute myeloid leukemia cells using single cell RNA-sequencing, bioRxiv, 2018-10-18

Abstract
Virtually all tumors are genetically heterogeneous, containing subclonal populations of cells that are defined by distinct mutations1. Subclones can have unique phenotypes that influence disease progression2, but these phenotypes are difficult to characterize because subclones usually cannot be physically purified and bulk gene expression measurements obscure interclonal differences. Single-cell RNA-sequencing has revealed transcriptional heterogeneity within a variety of tumor types, but it is unclear how this expression heterogeneity relates to subclonal genetic events; for example, it is unclear whether particular expression clusters correspond to mutationally defined subclones3-9. To address this question, we developed an approach that integrates enhanced whole genome sequencing (eWGS) with the 10x Genomics Chromium Single Cell 5′ Gene Expression workflow (scRNA-seq) to directly link expressed mutations with transcriptional profiles at single-cell resolution. Using bone marrow samples from five cases of primary human Acute Myeloid Leukemia (AML), we generated WGS and scRNA-seq data for each case. Duplicate single-cell libraries, representing a median of 20,474 cells per case, were generated from each patient's bone marrow. Although the libraries were 5′-biased, we detected expressed mutations in cDNAs at distances of up to 10 kbp from the 5′ ends of well-expressed genes, allowing us to identify hundreds to thousands of cells with AML-specific somatic mutations in every case. These data made it possible to distinguish AML cells (including normal-karyotype AML cells) from surrounding normal cells, to study tumor differentiation and intratumoral expression heterogeneity, to identify expression signatures associated with subclonal mutations, and to find cell surface markers that could be used to purify subclones for further study.
The data also revealed transcriptional heterogeneity that occurred independently of subclonal mutations, suggesting that additional factors drive epigenetic heterogeneity. This integrative approach for connecting genotype to phenotype in AML cells is broadly applicable for analysis of any sample that is phenotypically and genetically heterogeneous.

biorxiv cancer-biology 100-200-users 2018

A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data, bioRxiv, 2018-10-17

Abstract
Genome mining has become a key technology for exploring and exploiting natural product diversity through the identification and analysis of biosynthetic gene clusters (BGCs). Initially, this was performed on a single-genome basis; currently, the process is being scaled up to large-scale mining of pan-genomes of entire genera, complete strain collections, and metagenomic datasets from which thousands of bacterial genomes can be extracted at once. However, no bioinformatic framework is currently available for the effective analysis of datasets of this size and complexity. Here, we provide a streamlined computational workflow, tightly integrated with antiSMASH and MIBiG, that consists of two new software tools: BiG-SCAPE and CORASON. BiG-SCAPE facilitates rapid calculation and interactive visual exploration of BGC sequence similarity networks, groups gene clusters at multiple hierarchical levels, and includes a ‘glocal’ alignment mode that accurately groups both complete and fragmented BGCs. CORASON employs a phylogenomic approach to elucidate the detailed evolutionary relationships between gene clusters by computing high-resolution multi-locus phylogenies of all BGCs within and across gene cluster families (GCFs), and allows researchers to comprehensively identify all genomic contexts in which particular biosynthetic gene cassettes are found. We validate BiG-SCAPE by correlating its GCF output with metabolomic data across 403 actinobacterial strains. Furthermore, we demonstrate the discovery potential of the platform by using CORASON to comprehensively map the phylogenetic diversity of the large detoxin/rimosamide gene cluster clan, prioritizing three new detoxin families for subsequent characterization of six new analogs using isotopic labeling and analysis of tandem mass spectrometric data.

biorxiv bioinformatics 100-200-users 2018

High throughput droplet single-cell Genotyping of Transcriptomes (GoT) reveals the cell identity dependency of the impact of somatic mutations, bioRxiv, 2018-10-17

Abstract
Defining the transcriptomic identity of clonally related malignant cells is challenging in the absence of cell surface markers that distinguish cancer clones from one another or from admixed non-neoplastic cells. While single-cell methods have been devised to capture both the transcriptome and genotype, these methods are not compatible with droplet-based single-cell transcriptomics, which limits their throughput. To overcome this limitation, we present single-cell Genotyping of Transcriptomes (GoT), which integrates cDNA genotyping with high-throughput droplet-based single-cell RNA-seq. We further demonstrate that multiplexed GoT can interrogate multiple genotypes to distinguish subclonal transcriptomic identity. We apply GoT to 26,039 CD34+ cells across six patients with myeloid neoplasms, in which the complex process of hematopoiesis is corrupted by CALR-mutated stem and progenitor cells. We define high-resolution maps of malignant versus normal hematopoietic progenitors, and show that while mutant cells are commingled with wildtype cells throughout the hematopoietic progenitor landscape, their frequency increases with differentiation. We identify the unfolded protein response as a predominant outcome of CALR mutations, with significant cell identity dependency. Furthermore, we find that CALR mutations lead to NF-κB pathway upregulation specifically in uncommitted early stem cells. Collectively, GoT provides high-throughput linkage of single-cell genotypes with transcriptomes and reveals that the transcriptional output of somatic mutations is heavily dependent on the native cell identity.

biorxiv cancer-biology 0-100-users 2018

 

Created with the audiences framework by Jedidiah Carlson