A deep learning system can accurately classify primary and metastatic cancers based on patterns of passenger mutations, bioRxiv, 2017-11-06
In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of the time a cancer patient presents with metastatic tumour and no obvious primary. Challenges also arise when distinguishing a metastatic recurrence of a previously treated cancer from the emergence of a new one. Here we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types. Our classifier achieves an accuracy of 91% on held-out tumor samples and 82% and 85% respectively on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced classifier accuracy. Our results have immediate clinical applicability, underscoring how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of cell-free circulating tumour DNA.
biorxiv cancer-biology 100-200-users 2017A theory of multineuronal dimensionality, dynamics and measurement, bioRxiv, 2017-11-06
AbstractIn many experiments, neuroscientists tightly control behavior, record many trials, and obtain trial-averaged firing rates from hundreds of neurons in circuits containing billions of behaviorally relevant neurons. Di-mensionality reduction methods reveal a striking simplicity underlying such multi-neuronal data they can be reduced to a low-dimensional space, and the resulting neural trajectories in this space yield a remarkably insightful dynamical portrait of circuit computation. This simplicity raises profound and timely conceptual questions. What are its origins and its implications for the complexity of neural dynamics? How would the situation change if we recorded more neurons? When, if at all, can we trust dynamical portraits obtained from measuring an infinitesimal fraction of task relevant neurons? We present a theory that answers these questions, and test it using physiological recordings from reaching monkeys. This theory reveals conceptual insights into how task complexity governs both neural dimensionality and accurate recovery of dynamic portraits, thereby providing quantitative guidelines for future large-scale experimental design.
biorxiv neuroscience 100-200-users 2017Whole-genome sequencing analysis of copy number variation (CNV) using low-coverage and paired-end strategies is efficient and outperforms array-based CNV analysis, bioRxiv, 2017-11-05
ABSTRACTBackgroundCNV analysis is an integral component to the study of human genomes in both research and clinical settings. Array-based CNV analysis is the current first-tier approach in clinical cytogenetics. Decreasing costs in high-throughput sequencing and cloud computing have opened doors for the development of sequencing-based CNV analysis pipelines with fast turnaround times. We carry out a systematic and quantitative comparative analysis for several low-coverage whole-genome sequencing (WGS) strategies to detect CNV in the human genome.MethodsWe compared the CNV detection capabilities of WGS strategies (short-insert, 3kb-, and 5kb-insert mate-pair) each at 1x, 3x, and 5x coverages relative to each other and to 17 currently used high-density oligonucleotide arrays. For benchmarking, we used a set of Gold Standard (GS) CNVs generated for the 1000-Genomes-Project CEU subject NA12878.ResultsOverall, low-coverage WGS strategies detect drastically more GS CNVs compared to arrays and are accompanied with smaller percentages of CNV calls without validation. Furthermore, we show that WGS (at ≥1x coverage) is able to detect all seven GS deletion-CNVs >100 kb in NA12878 whereas only one is detected by most arrays. Lastly, we show that the much larger 15 Mbp Cri-du-chat deletion can be readily detected with short-insert paired-end WGS at even just 1x coverage.ConclusionsCNV analysis using low-coverage WGS is efficient and outperforms the array-based analysis that is currently used for clinical cytogenetics.
biorxiv genomics 100-200-users 2017Evolutionary dynamics of bacteria in the gut microbiome within and across hosts, bioRxiv, 2017-10-31
AbstractGut microbiota are shaped by a combination of ecological and evolutionary forces. While the ecological dynamics have been extensively studied, much less is known about how species of gut bacteria evolve over time. Here we introduce a model-based framework for quantifying evolutionary dynamics within and across hosts using a panel of metagenomic samples. We use this approach to study evolution in ∼30 prevalent species in the human gut. Although the patterns of between-host diversity are consistent with quasi-sexual evolution and purifying selection on long timescales, we identify new genealogical signatures that challenge standard population genetic models of these processes. Within hosts, we find that genetic differences that accumulate over ∼6 month timescales are only rarely attributable to replacement by distantly related strains. Instead, the resident strains more commonly acquire a smaller number of putative evolutionary changes, in which nucleotide variants or gene gains or losses rapidly sweep to high frequency. By comparing these mutations with the typical between-host differences, we find evidence that some sweeps are seeded by recombination, in addition to new mutations. However, comparisons of adult twins suggest that replacement eventually overwhelms evolution over multi-decade timescales, hinting at fundamental limits to the extent of local adaptation. Together, our results suggest that gut bacteria can evolve on human-relevant timescales, and they highlight the connections between these short-term evolutionary dynamics and longer-term evolution across hosts.
biorxiv evolutionary-biology 100-200-users 2017Transcriptome-wide association studies opportunities and challenges, bioRxiv, 2017-10-23
Transcriptome-wide association studies (TWAS) integrate GWAS and gene expression datasets to find gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes, using simulations and case studies of literature-curated candidate causal genes for schizophrenia, LDL cholesterol and Crohn’s disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene, as well as loci where TWAS prioritizes multiple genes, some of which are unlikely to be causal, because they share the same variants as eQTLs. We illustrate that TWAS is especially prone to spurious prioritization when using expression data from tissues or cell types that are less related to the trait, due to substantial variation in both expression levels and eQTL strengths across cell types. Nonetheless, TWAS prioritizes candidate causal genes at GWAS loci more accurately than simple baselines based on proximity to lead GWAS variant and expression in trait-related tissue. We discuss current strategies and future opportunities for improving the performance of TWAS for causal gene prioritization. Our results showcase the strengths and limitations of using expression variation across individuals to determine causal genes at GWAS loci and provide guidelines and best practices when using TWAS to prioritize candidate causal genes.
biorxiv genetics 100-200-users 2017RNA velocity in single cells, bioRxiv, 2017-10-20
AbstractRNA abundance is a powerful indicator of the state of individual cells, but does not directly reveal dynamic processes such as cellular differentiation. Here we show that RNA velocity—the time derivative of RNA abundance—can be estimated by distinguishing unspliced and spliced mRNAs in standard single-cell RNA sequencing protocols. We show that RNA velocity is a vector that predicts the future state of individual cells on a timescale of hours. We validate the accuracy of RNA velocity in the neural crest lineage, demonstrate its use on multiple technical platforms, reconstruct the branching lineage tree of the mouse hippocampus, and measure RNA kinetics in human embryonic brain. We expect RNA velocity to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.
biorxiv genomics 100-200-users 2017