Multi-Omics factor analysis - a framework for unsupervised integration of multi-omic data sets, bioRxiv, 2017-11-14
Abstract: Multi-omic studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous datasets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omic datasets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation, and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex-vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
biorxiv bioinformatics 100-200-users 2017
Resting-state functional brain connectivity best predicts the personality dimension of openness to experience, bioRxiv, 2017-11-14
Abstract: Personality neuroscience aims to find associations between brain measures and personality traits. Findings to date have been severely limited by a number of factors, including small sample size and omission of out-of-sample prediction. We capitalized on the recent availability of a large database, together with the emergence of specific criteria for best practices in neuroimaging studies of individual differences. We analyzed resting-state functional magnetic resonance imaging data from 884 young healthy adults in the Human Connectome Project (HCP) database. We attempted to predict personality traits from the “Big Five”, as assessed with the NEO-FFI test, using individual functional connectivity matrices. After regressing out potential confounds (such as age, sex, handedness and fluid intelligence), we used a cross-validated framework, together with test-retest replication (across two sessions of resting-state fMRI for each subject), to quantify how well the neuroimaging data could predict each of the five personality factors. We tested three different (published) denoising strategies for the fMRI data, two inter-subject alignment and brain parcellation schemes, and three different linear models for prediction. As measurement noise is known to moderate statistical relationships, we performed final prediction analyses using average connectivity across both imaging sessions (1 h of data), with the analysis pipeline that yielded the highest predictability overall. Across all results (test-retest; 3 denoising strategies; 2 alignment schemes; 3 models), Openness to experience emerged as the only reliably predicted personality factor. Using the full hour of resting-state data and the best pipeline, we could predict Openness to experience (NEOFAC_O: r=0.24, R²=0.024) almost as well as we could predict the score on a 24-item intelligence test (PMAT24_A_CR: r=0.26, R²=0.044).
Other factors (Extraversion, Neuroticism, Agreeableness and Conscientiousness) yielded weaker predictions across results that were not statistically significant under permutation testing. We also derived two superordinate personality factors (“α” and “β”) from a principal components analysis of the NEO-FFI factor scores, thereby reducing noise and enhancing the precision of these measures of personality. We could account for 5% of the variance in the β superordinate factor (r=0.27, R²=0.050), which loads highly on Openness to experience. We conclude with a discussion of the potential for predicting personality from neuroimaging data and make specific recommendations for the field.
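The abstract reports a correlation r alongside an R² that is smaller than r squared. One reading, which is an assumption here and not stated in the abstract, is that R² is the coefficient of determination of the predictions (1 minus residual over total sum of squares), which can never exceed the squared Pearson correlation and falls below it when predictions are poorly calibrated. A minimal sketch on synthetic data (not HCP data; all names are illustrative):

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted scores."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; sensitive to scale and offset errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic scores: predictions carry a weak signal plus noise that makes
# them more dispersed than a well-calibrated predictor would be.
rng = np.random.default_rng(0)
observed = rng.normal(size=2000)
predicted = 0.1 * observed + 0.4 * rng.normal(size=2000)

r = pearson_r(observed, predicted)
r2 = coefficient_of_determination(observed, predicted)
print(f"r = {r:.3f}, r squared = {r**2:.3f}, R^2 = {r2:.3f}")
```

Under this reading, r measures how well the predictions rank subjects, while R² also penalizes predictions whose scale or offset is wrong.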
biorxiv neuroscience 100-200-users 2017
Theta and alpha oscillations are traveling waves in the human neocortex, bioRxiv, 2017-11-14
Summary: Human cognition requires the coordination of neural activity across widespread brain networks. Here we describe a new mechanism for large-scale coordination in the human brain: traveling waves of theta and alpha oscillations. Examining direct brain recordings from neurosurgical patients performing a memory task, we found contiguous clusters of cortex in individual patients with oscillations at specific frequencies between 2 and 15 Hz. These clusters displayed spatial phase gradients, indicating that the oscillations were traveling waves that propagated across the cortex at ∼0.25-0.75 m/s. Traveling waves were relevant behaviorally because their propagation correlated with task events and was more consistent when subjects performed the task well. Our findings suggest that traveling waves can be modeled by a network of coupled oscillators because the direction of wave propagation correlated with the spatial orientation of local frequency gradients. These findings suggest a role for traveling waves in supporting brain connectivity by organizing neural processes across space and time.
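To illustrate how a spatial phase gradient maps to a propagation speed, the sketch below fits a straight line to phase versus position along one axis and converts the slope k (rad/mm) to a speed via v = 2πf / |k|. This is a simplified one-dimensional least-squares stand-in, not the authors' analysis; the electrode spacing, frequency, and function name are hypothetical.

```python
import numpy as np

def wave_speed_m_per_s(freq_hz, positions_mm, phases_rad):
    """Estimate propagation speed from a 1-D spatial phase gradient.

    Fits phase = k * x + c by least squares after unwrapping the phases,
    then returns 2*pi*f / |k| converted from mm/s to m/s.
    """
    positions_mm = np.asarray(positions_mm, dtype=float)
    phases = np.unwrap(np.asarray(phases_rad, dtype=float))
    k_rad_per_mm, _ = np.polyfit(positions_mm, phases, 1)
    speed_mm_per_s = 2.0 * np.pi * freq_hz / abs(k_rad_per_mm)
    return speed_mm_per_s / 1000.0

# Hypothetical strip of 8 contacts spaced 10 mm apart, 8 Hz oscillation.
freq_hz = 8.0
x_mm = np.arange(8) * 10.0
true_speed_m_per_s = 0.5
k_true = 2.0 * np.pi * freq_hz / (true_speed_m_per_s * 1000.0)  # rad/mm

# Phase lags grow with distance along the propagation direction;
# np.angle wraps them into (-pi, pi] as measured phases would be.
phases = np.angle(np.exp(-1j * k_true * x_mm))

estimated = wave_speed_m_per_s(freq_hz, x_mm, phases)
print(f"estimated speed: {estimated:.3f} m/s")
```

Unwrapping only works here because the phase step between neighbouring contacts stays below π; sparser sampling or noisier phases would call for a circular regression instead.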
biorxiv neuroscience 100-200-users 2017
A deep learning system can accurately classify primary and metastatic cancers based on patterns of passenger mutations, bioRxiv, 2017-11-06
In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a cancer patient presents with a metastatic tumour and no obvious primary. Challenges also arise when distinguishing a metastatic recurrence of a previously treated cancer from the emergence of a new one. Here we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types. Our classifier achieves an accuracy of 91% on held-out tumour samples and 82% and 85% respectively on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced classifier accuracy. Our results have immediate clinical applicability, underscoring how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of cell-free circulating tumour DNA.
biorxiv cancer-biology 100-200-users 2017
A theory of multineuronal dimensionality, dynamics and measurement, bioRxiv, 2017-11-06
Abstract: In many experiments, neuroscientists tightly control behavior, record many trials, and obtain trial-averaged firing rates from hundreds of neurons in circuits containing billions of behaviorally relevant neurons. Dimensionality reduction methods reveal a striking simplicity underlying such multi-neuronal data: they can be reduced to a low-dimensional space, and the resulting neural trajectories in this space yield a remarkably insightful dynamical portrait of circuit computation. This simplicity raises profound and timely conceptual questions. What are its origins and its implications for the complexity of neural dynamics? How would the situation change if we recorded more neurons? When, if at all, can we trust dynamical portraits obtained from measuring an infinitesimal fraction of task-relevant neurons? We present a theory that answers these questions, and test it using physiological recordings from reaching monkeys. This theory reveals conceptual insights into how task complexity governs both neural dimensionality and accurate recovery of dynamic portraits, thereby providing quantitative guidelines for future large-scale experimental design.
biorxiv neuroscience 100-200-users 2017
Whole-genome sequencing analysis of copy number variation (CNV) using low-coverage and paired-end strategies is efficient and outperforms array-based CNV analysis, bioRxiv, 2017-11-05
Abstract
Background: CNV analysis is an integral component to the study of human genomes in both research and clinical settings. Array-based CNV analysis is the current first-tier approach in clinical cytogenetics. Decreasing costs in high-throughput sequencing and cloud computing have opened doors for the development of sequencing-based CNV analysis pipelines with fast turnaround times. We carry out a systematic and quantitative comparative analysis for several low-coverage whole-genome sequencing (WGS) strategies to detect CNV in the human genome.
Methods: We compared the CNV detection capabilities of WGS strategies (short-insert, 3kb-, and 5kb-insert mate-pair) each at 1x, 3x, and 5x coverages relative to each other and to 17 currently used high-density oligonucleotide arrays. For benchmarking, we used a set of Gold Standard (GS) CNVs generated for the 1000-Genomes-Project CEU subject NA12878.
Results: Overall, low-coverage WGS strategies detect drastically more GS CNVs compared to arrays and are accompanied with smaller percentages of CNV calls without validation. Furthermore, we show that WGS (at ≥1x coverage) is able to detect all seven GS deletion-CNVs >100 kb in NA12878 whereas only one is detected by most arrays. Lastly, we show that the much larger 15 Mbp Cri-du-chat deletion can be readily detected with short-insert paired-end WGS at even just 1x coverage.
Conclusions: CNV analysis using low-coverage WGS is efficient and outperforms the array-based analysis that is currently used for clinical cytogenetics.
biorxiv genomics 100-200-users 2017