Revealing multi-scale population structure in large cohorts, bioRxiv, 2018-09-23
Genetic structure in large cohorts results from technical, sampling and demographic variation. Visualisation is therefore a first step in most genomic analyses. However, existing data exploration methods struggle with unbalanced sampling and the many scales of population structure. We investigate an approach to dimension reduction of genomic data that combines principal components analysis (PCA) with uniform manifold approximation and projection (UMAP) to succinctly illustrate population structure in large cohorts and capture their relationships on local and global scales. Using data from large-scale genomic datasets, we demonstrate that PCA-UMAP effectively clusters closely related individuals while placing them in a global continuum of genetic variation. This approach reveals previously overlooked subpopulations within the American Hispanic population and fine-scale relationships between geography, genotypes, and phenotypes in the UK population. This opens new lines of investigation for demographic research and statistical genetics. Given its small computational cost, PCA-UMAP also provides a general-purpose approach to exploratory analysis in population-scale datasets.
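As a rough illustration of the PCA-then-UMAP idea described in this abstract, the following sketch reduces a genotype matrix to its leading principal components and then embeds those components in two dimensions. It is a minimal sketch under stated assumptions, not the authors' pipeline: the simulated two-population genotypes, the function names (`simulate_genotypes`, `pca_scores`, `pca_umap`), and the choice of 10 components are all hypothetical, and the UMAP step assumes the third-party `umap-learn` package is installed.

```python
import numpy as np


def simulate_genotypes(n_per_pop=50, n_snps=200, seed=0):
    """Toy data (hypothetical): two populations with independent allele frequencies."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(2):
        freqs = rng.uniform(0.1, 0.9, size=n_snps)
        # Genotypes coded as 0/1/2 copies of the alternate allele.
        blocks.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
    return np.vstack(blocks).astype(float)


def pca_scores(genotypes, n_components=10):
    """Project samples onto the top principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def pca_umap(genotypes, n_components=10, random_state=0):
    """Run PCA first, then embed the PC scores in 2D with UMAP."""
    import umap  # deferred import: requires the umap-learn package

    pcs = pca_scores(genotypes, n_components)
    reducer = umap.UMAP(n_components=2, random_state=random_state)
    return reducer.fit_transform(pcs)


if __name__ == "__main__":
    genotypes = simulate_genotypes()
    embedding = pca_umap(genotypes)
    print(embedding.shape)
```

Running PCA first keeps the input to UMAP small, which is one plausible reason for the low computational cost the abstract mentions; the number of components retained is a tuning choice rather than a fixed rule.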
biorxiv genomics 100-200-users 2018
A Multi-Domain Task Battery Reveals Functional Boundaries in the Human Cerebellum, bioRxiv, 2018-09-21
There is compelling evidence that the human cerebellum is engaged in a wide array of motor and cognitive tasks. A fundamental question centers on whether the cerebellum is organized into distinct functional sub-regions. To address this question, we employed a rich task battery, designed to tap into a broad range of cognitive processes. During four functional magnetic resonance imaging (fMRI) sessions, participants performed a battery of 26 diverse tasks comprising 47 unique conditions. Using the data from this multi-domain task battery (MDTB), we derived a comprehensive functional parcellation of the cerebellar cortex and evaluated it by predicting functional boundaries in a novel set of tasks. The new parcellation successfully identified distinct functional sub-regions, providing significant improvements over existing parcellations derived from task-free data. Lobular boundaries, commonly used to summarize functional data, did not coincide with functional subdivisions. This multi-domain task approach offers novel insights into the functional heterogeneity of the cerebellar cortex.
biorxiv neuroscience 100-200-users 2018
Objective versus Self-Reported Energy Intake Changes During Low-Carbohydrate and Low-Fat Diets, bioRxiv, 2018-09-21
Objective: To examine objective versus self-reported energy intake changes (ΔEI) during a 12-month diet intervention.
Methods: We calculated ΔEI in subjects who participated in a 1-year randomized low-carbohydrate versus low-fat diet trial using repeated body weight measurements as inputs to an objective mathematical model (ΔEIModel) and compared these values with self-reported energy intake changes assessed by repeated 24-hr recalls (ΔEI24hrRecall).
Results: ΔEI24hrRecall indicated a relatively persistent state of calorie restriction ≥500 kcal/d throughout the year with no significant differences between diets. ΔEIModel demonstrated large early decreases in calorie intake >800 kcal/d followed by an exponential return to approximately 100 kcal/d below baseline at the end of the year. The low-carbohydrate diet resulted in ΔEIModel that was 162±53 kcal/d lower than the low-fat diet over the first 3 months (p=0.002), but no significant diet differences were found at later times. Weight loss at 12 months was significantly related to ΔEIModel at all time intervals for both diets (p<0.0001).
Conclusions: Self-reported measurements of ΔEI were inaccurate. Model-based calculations of ΔEI found that instructions to follow the low-carbohydrate diet resulted in greater calorie restriction than the low-fat diet in the early phases of the intervention, but these diet differences were not sustained.
What is already known about this subject?
- Diet assessments that rely on self-report, such as 24-hr dietary recall, are known to underestimate actual energy intake as measured by doubly labeled water. However, it is possible that repeated self-reported measurements could accurately detect changes in energy intake over time if the absolute bias of self-reported measurements is approximately constant for each subject.
What this study adds
- We compared energy intake changes measured using repeated 24-hr dietary recall measurements collected over the course of the 1-year Diet Intervention Examining The Factors Interacting with Treatment Success (DIETFITS) trial versus energy intake changes calculated using repeated body weight measurements as inputs to a validated mathematical model.
- Whereas self-reported measurements indicated a relatively persistent state of calorie restriction, objective model-based measurements demonstrated a large early calorie restriction followed by an exponential rise in energy intake towards the pre-intervention baseline.
- Model-based calculations, but not self-reported measurements, found that low-carbohydrate diets led to significantly greater early decreases in energy intake compared to low-fat diets, but long-term energy intake changes were not significantly different.
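The qualitative shape reported for the model-based estimate (a large early deficit that relaxes exponentially towards a small deficit by month 12) can be pictured with a simple curve. The sketch below is illustrative only and is not the validated model used in the study: the functional form, the function name `exponential_return`, the starting value of -800 kcal/d, the asymptote of -100 kcal/d, and the 3-month time constant are all assumptions chosen to loosely echo the figures quoted in the abstract.

```python
import math


def exponential_return(t_months, initial=-800.0, asymptote=-100.0, tau=3.0):
    """Hypothetical energy intake change (kcal/d) relative to baseline.

    Starts at `initial` when t = 0 and relaxes towards `asymptote`
    with time constant `tau` (months). All defaults are assumptions.
    """
    return asymptote + (initial - asymptote) * math.exp(-t_months / tau)


if __name__ == "__main__":
    # Print the assumed curve at a few time points across the year.
    for month in (0, 3, 6, 12):
        print(month, round(exponential_return(month), 1))
```

A persistent deficit of the kind indicated by the 24-hr recalls would correspond to a flat line instead, which is the contrast the abstract draws between the two measurement approaches.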
biorxiv physiology 100-200-users 2018
Paleolithic DNA from the Caucasus reveals core of West Eurasian ancestry, bioRxiv, 2018-09-21
The earliest ancient DNA data of modern humans from Europe dates to ∼40 thousand years ago [1-4], but that from the Caucasus and the Near East to only ∼14 thousand years ago [5,6], from populations who lived long after the Last Glacial Maximum (LGM) ∼26.5-19 thousand years ago [7]. To address this imbalance and to better understand the relationship of Europeans and Near Easterners, we report genome-wide data from two ∼26 thousand year old individuals from Dzudzuana Cave in Georgia in the Caucasus from around the beginning of the LGM. Surprisingly, the Dzudzuana population was more closely related to early agriculturalists from western Anatolia ∼8 thousand years ago [8] than to the hunter-gatherers of the Caucasus from the same region of western Georgia of ∼13-10 thousand years ago [5]. Most of the Dzudzuana population’s ancestry was deeply related to the post-glacial western European hunter-gatherers of the ‘Villabruna cluster’ [3], but it also had ancestry from a lineage that had separated from the great majority of non-African populations before they separated from each other, proving that such ‘Basal Eurasians’ [6,9] were present in West Eurasia twice as early as previously recorded [5,6]. We document major population turnover in the Near East after the time of Dzudzuana, showing that the highly differentiated Holocene populations of the region [6] were formed by ‘Ancient North Eurasian’ [3,9,10] admixture into the Caucasus and Iran and North African [11,12] admixture into the Natufians of the Levant. We finally show that the Dzudzuana population contributed the majority of the ancestry of post-Ice Age people in the Near East, North Africa, and even parts of Europe, thereby becoming the largest single contributor of ancestry of all present-day West Eurasians.
biorxiv genetics 100-200-users 2018
Pan-cancer whole genome analyses of metastatic solid tumors, bioRxiv, 2018-09-21
Metastatic cancer is one of the major causes of death and is associated with poor treatment efficiency. A better understanding of the characteristics of late stage cancer is required to help tailor personalised treatment, reduce overtreatment and improve outcomes. Here we describe the largest pan-cancer study of metastatic solid tumor genomes, including 2,520 whole genome-sequenced tumor-normal pairs, analyzed at a median depth of 106x and 38x respectively, and surveying over 70 million somatic variants. Metastatic lesions were found to be very diverse, with mutation characteristics reflecting those of the primary tumor types, although with high rates of whole genome duplication events (56%). Metastatic lesions are relatively homogeneous with the vast majority (96%) of driver mutations being clonal and up to 80% of tumor suppressor genes bi-allelically inactivated through different mutational mechanisms. For 62% of all patients, genetic variants that may be associated with outcome of approved or experimental therapies were detected. These actionable events were distributed across various mutation types, underlining the importance of comprehensive genomic tumor profiling for cancer precision medicine.
biorxiv cancer-biology 100-200-users 2018
BRICseq bridges brain-wide interregional connectivity to neural activity and gene expression in single animals, bioRxiv, 2018-09-20
Comprehensive analysis of neuronal networks requires brain-wide measurement of connectivity, activity, and gene expression. Although high-throughput methods are available for mapping brain-wide activity and transcriptomes, comparable methods for mapping region-to-region connectivity remain slow and expensive because they require averaging across hundreds of brains. Here we describe BRICseq, which leverages DNA barcoding and sequencing to map connectivity from single individuals in a few weeks and at low cost. Applying BRICseq to the mouse neocortex, we find that region-to-region connectivity provides a simple bridge relating transcriptome to activity. The spatial expression patterns of a few genes predict region-to-region connectivity, and connectivity predicts activity correlations. We also exploited BRICseq to map the mutant BTBR mouse brain, which lacks a corpus callosum, and recapitulated its known connectopathies. BRICseq allows individual laboratories to compare how age, sex, environment, genetics and species affect neuronal wiring, and to integrate these with functional activity and gene expression.
biorxiv neuroscience 100-200-users 2018