Single-cell Map of Diverse Immune Phenotypes Driven by the Tumor Microenvironment, bioRxiv, 2017-11-26
SUMMARY: Knowledge of immune cell phenotypes in the tumor microenvironment is essential for understanding mechanisms of cancer progression and immunotherapy response. We created an immune map of breast cancer using single-cell RNA-seq data from 45,000 immune cells from eight breast carcinomas, as well as matched normal breast tissue, blood, and lymph node. We developed a preprocessing pipeline, SEQC, and a Bayesian clustering and normalization method, Biscuit, to address computational challenges inherent to single-cell data. Despite significant similarity between normal and tumor tissue-resident immune cells, we observed continuous tumor-specific phenotypic expansions driven by environmental cues. Analysis of paired single-cell RNA and T cell receptor (TCR) sequencing data from 27,000 additional T cells revealed the combinatorial impact of TCR utilization on phenotypic diversity. Our results support a model of continuous activation in T cells and do not comport with the macrophage polarization model in cancer, with important implications for characterizing tumor-infiltrating immune cells.
biorxiv immunology 100-200-users 2017
New synthetic-diploid benchmark for accurate variant calling evaluation, bioRxiv, 2017-11-23
Constructed from the consensus of multiple variant callers based on short-read data, existing benchmark datasets for evaluating variant calling accuracy are biased toward easy regions accessible by known algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two human cell lines that are homozygous across the whole genome. This benchmark provides a more accurate and less biased estimate of the error rate of small variant calls in a realistic context.
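As a rough illustration of how a call set might be scored against such a benchmark, the sketch below compares two sets of variants by exact match and reports error counts and rates. It is not the authors' evaluation tool: the `Variant` record, the `evaluate_calls` function, and the `confident_bp` parameter (the number of bases in the regions being evaluated) are hypothetical names introduced here, and real evaluations also have to normalize variant representations before comparing.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A small variant identified by position and alleles (hypothetical record)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def evaluate_calls(truth: Set[Variant], calls: Set[Variant], confident_bp: int) -> dict:
    """Score a call set against a truth set by exact matching of variants.

    confident_bp is the assumed size (in bases) of the evaluated regions.
    """
    tp = len(truth & calls)   # called and present in the benchmark
    fp = len(calls - truth)   # called but absent from the benchmark
    fn = len(truth - calls)   # present in the benchmark but not called
    mbp = confident_bp / 1_000_000
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "fp_per_mbp": fp / mbp if mbp else 0.0,
        "fn_rate": fn / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy example with made-up variants
truth = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 40, "G", "GA"),
}
calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 90, "T", "C"),
}
result = evaluate_calls(truth, calls, confident_bp=2_000_000)
print(result)
```

Reporting false positives per megabase rather than as a bare count makes results comparable across benchmarks that cover different amounts of the genome.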
biorxiv bioinformatics 100-200-users 2017
Recovery of gene haplotypes from a metagenome, bioRxiv, 2017-11-23
Abstract: Elucidation of population-level diversity of microbiomes is a significant step towards a complete understanding of the evolutionary, ecological and functional importance of microbial communities. Characterizing this diversity requires the recovery of the exact DNA sequence (haplotype) of each gene isoform from every individual present in the community. To address this, we present Hansel and Gretel, a freely available data structure and algorithm that together provide a software package for reconstructing the most likely haplotypes from metagenomes. We demonstrate recovery of haplotypes from short-read Illumina data for a bovine rumen microbiome, and verify that our predictions are 100% accurate with long-read PacBio CCS sequencing. We show that Gretel’s haplotypes can be analyzed to determine a significant difference in mutation rates between core and accessory gene families in an ovine rumen microbiome. All tools, documentation and data for evaluation are open source and available via our repository: https://github.com/samstudio8/gretel
biorxiv bioinformatics 100-200-users 2017
Current CRISPR gene drive systems are likely to be highly invasive in wild populations, bioRxiv, 2017-11-21
Abstract: Recent reports have suggested that CRISPR-based gene drives are unlikely to invade wild populations due to drive-resistant alleles that prevent cutting. Here we develop mathematical models based on existing empirical data to explicitly test this assumption. We show that although resistance prevents drive systems from spreading to fixation in large populations, even the least effective systems reported to date are highly invasive. Releasing a small number of organisms often causes invasion of the local population, followed by invasion of additional populations connected by very low gene flow rates. Examining the effects of mitigating factors including standing variation, inbreeding, and family size revealed that none of these prevent invasion in realistic scenarios. Highly effective drive systems are predicted to be even more invasive. Contrary to the National Academies report on gene drive, our results suggest that standard drive systems should not be developed or field-tested in regions harboring the host organism.
biorxiv synthetic-biology 100-200-users 2017
Genome-wide polygenic score to identify a monogenic risk-equivalent for coronary disease, bioRxiv, 2017-11-21
Abstract: Identification of individuals at increased genetic risk for a complex disorder such as coronary disease can facilitate treatments or enhanced screening strategies. A rare monogenic mutation associated with increased cholesterol is present in ~1:250 carriers and confers an up to 4-fold increase in coronary risk when compared with non-carriers. Although individual common polymorphisms have modest predictive capacity, their cumulative impact can be aggregated into a polygenic score. Here, we develop a new, genome-wide polygenic score that aggregates information from 6.6 million common polymorphisms and show that this score can similarly identify individuals with a 4-fold increased risk for coronary disease. In >400,000 participants from UK Biobank, the score conforms to a normal distribution and those in the top 2.5% of the distribution are at 4-fold increased risk compared to the remaining 97.5%. Similar patterns are observed with genome-wide polygenic scores for two additional diseases – breast cancer and severe obesity.
One Sentence Summary: A genome-wide polygenic score identifies 2.5% of the population born with a 4-fold increased risk for coronary artery disease.
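To make the idea of aggregating many small effects concrete, here is a minimal sketch of a polygenic score as a weighted sum of risk-allele counts, followed by a cutoff for the top 2.5% of a cohort. It assumes the common additive formulation; the function names, the toy weights, and the toy cohort are hypothetical and are not taken from the paper, which derives its weights from 6.6 million polymorphisms.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant).

    Assumes the additive model: score = sum_i weight_i * dosage_i.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def top_fraction_cutoff(scores: Sequence[float], fraction: float = 0.025) -> float:
    """Return the lowest score that still falls in the top `fraction` of the cohort."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of individuals to flag
    return ranked[k - 1]


# Toy example: three variants with made-up per-allele weights
weights = [0.1, 0.2, 0.3]
person = [0, 1, 2]  # copies of the risk allele at each variant
score = polygenic_score(person, weights)

# Toy cohort of 40 scores; the top 2.5% corresponds to a single individual
cohort_scores = [float(i) for i in range(40)]
cutoff = top_fraction_cutoff(cohort_scores)
flagged = [s for s in cohort_scores if s >= cutoff]
print(score, cutoff, len(flagged))
```

A percentile cutoff like this only ranks individuals within the cohort; turning the ranking into a fold-increase in risk requires comparing disease rates above and below the cutoff, as the abstract describes.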
biorxiv genomics 100-200-users 2017
Higher-order inter-chromosomal hubs shape 3-dimensional genome organization in the nucleus, bioRxiv, 2017-11-19
ABSTRACT: Eukaryotic genomes are packaged into a 3-dimensional structure in the nucleus of each cell. There are currently two distinct views of genome organization that are derived from different technologies. The first view, derived from genome-wide proximity ligation methods (e.g. Hi-C), suggests that the genome is largely organized around chromosomes. The second view, derived from in situ imaging, suggests a central role for nuclear bodies. Yet, because microscopy and proximity-ligation methods measure different aspects of genome organization, these two views remain poorly reconciled and our overall understanding of how genomic DNA is organized within the nucleus remains incomplete. Here, we develop Split-Pool Recognition of Interactions by Tag Extension (SPRITE), which moves away from proximity-ligation and enables genome-wide detection of higher-order DNA interactions within the nucleus. Using SPRITE, we recapitulate known genome structures identified by Hi-C and show that the contact frequencies measured by SPRITE strongly correlate with the 3-dimensional distances measured by microscopy. In addition to known structures, SPRITE identifies two major hubs of inter-chromosomal interactions that are spatially arranged around the nucleolus and nuclear speckles, respectively. We find that the majority of genomic regions exhibit preferential spatial association relative to one of these nuclear bodies, with regions that are highly transcribed by RNA Polymerase II organizing around nuclear speckles and transcriptionally inactive and centromere-proximal regions organizing around the nucleolus. Together, our results reconcile the two distinct pictures of nuclear structure and demonstrate that nuclear bodies act as inter-chromosomal hubs that shape the overall 3-dimensional packaging of genomic DNA in the nucleus.
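One way to see how higher-order interaction data can be reduced to Hi-C-like contact frequencies is sketched below: each cluster of genomic bins observed together contributes to every pair of bins it contains. This is an illustration under stated assumptions, not the authors' pipeline. The function name, the bin labels, the `max_size` filter, and the choice to down-weight each cluster of n bins by 2/n (so that large clusters do not dominate the pairwise counts) are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple


def pairwise_contacts(
    clusters: Iterable[Sequence[str]],
    max_size: int = 1000,
) -> Dict[Tuple[str, str], float]:
    """Convert multiway clusters of genomic bins into weighted pairwise contacts.

    Each cluster of n distinct bins adds 2/n to every pair it contains
    (an assumed down-weighting scheme). Clusters with fewer than 2 or more
    than max_size bins are skipped.
    """
    contacts: Dict[Tuple[str, str], float] = defaultdict(float)
    for cluster in clusters:
        bins = sorted(set(cluster))  # de-duplicate and fix pair ordering
        n = len(bins)
        if n < 2 or n > max_size:
            continue
        weight = 2.0 / n
        for a, b in combinations(bins, 2):
            contacts[(a, b)] += weight
    return dict(contacts)


# Toy example: two clusters with made-up bin labels
clusters = [
    ["chr1:0", "chr1:1"],
    ["chr1:0", "chr1:1", "chr2:5"],
]
contacts = pairwise_contacts(clusters)
for pair, value in sorted(contacts.items()):
    print(pair, round(value, 3))
```

Pairs that span different chromosomes, such as the `chr1`/`chr2` pairs in the toy data, are the kind of signal that would point to inter-chromosomal hubs when aggregated over many clusters.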
biorxiv genomics 100-200-users 2017