The genome of C57BL/6J Eve, the mother of the laboratory mouse genome reference strain, bioRxiv, 2019-01-12
Isogenic laboratory mouse strains are used to enhance reproducibility as individuals within a strain are essentially genetically identical. For the most widely used isogenic strain, C57BL/6, there is also a wealth of genetic, phenotypic, and genomic data, including one of the highest quality reference genomes (GRCm38.p6). However, laboratory mouse strains are living reagents and hence genetic drift occurs and is an unavoidable source of accumulating genetic variability that can have an impact on reproducibility over time. Nearly 20 years after the first release of the mouse reference genome, individuals from the strain it represents (C57BL/6J) are at least 26 inbreeding generations removed from the individuals used to generate the mouse reference genome. Moreover, C57BL/6J is now maintained through the periodic reintroduction of mice from cryopreserved embryo stocks that are derived from a single breeder pair, aptly named C57BL/6J Adam and Eve. To more accurately represent the genome of today's C57BL/6J mice, we have generated a de novo assembly of the C57BL/6J Eve genome (B6Eve) using high coverage, long-read sequencing, optical mapping, and short-read data. Using these data, we addressed recurring variants observed in previous mouse studies. We have also identified structural variations that impact coding sequences, closed gaps in the mouse reference assembly, some of which are in genes, and we have identified previously unannotated coding sequences through long-read sequencing of cDNAs. This B6Eve assembly explains discrepant observations that have been associated with GRCm38-based analyses, and has provided data towards a reference genome that is more representative of the C57BL/6J mice that are in use today.
biorxiv genomics 0-100-users 2019
Tumor mutational landscape is a record of the pre-malignant state, bioRxiv, 2019-01-12
Chromatin structure has a major influence on the cell-specific density of somatic mutations along the cancer genome. Here, we present a pan-cancer study in which we searched for the putative cancer cell-of-origin of 2,550 whole genomes, representing 32 cancer types, by matching their mutational landscape to the regional patterns of chromatin modifications ascertained in 104 normal tissue types. We found that, in almost all cancer types, the cell-of-origin can be predicted solely from their DNA sequences. Our analysis validated the hypothesis that high-grade serous ovarian cancer originates in the fallopian tube and identified distinct origins of breast cancer subtypes. We also demonstrated that the technique is equally capable of identifying the cell-of-origin for a series of 2,044 metastatic samples from 22 of the tumor types available as primaries. Moreover, cancer drivers, whether inherited or acquired, reside in active chromatin regions in the respective cell-of-origin. Taken together, our findings highlight that many somatic mutations accumulate while the chromatin structure of the cell-of-origin is maintained and that this historical record, captured in the DNA, can be used to identify the often elusive cancer cell-of-origin.
biorxiv genomics 100-200-users 2019
Whole-genome sequencing of rare disease patients in a national healthcare system, bioRxiv, 2019-01-02
Most patients with hereditary rare diseases do not receive a molecular diagnosis and the aetiological variants and mediating genes for half of such disorders remain to be discovered. We implemented whole-genome sequencing (WGS) in a national healthcare system to streamline diagnosis and to discover unknown aetiological variants in the coding and non-coding regions of the genome. In a pilot study for the 100,000 Genomes Project, we generated WGS data for 13,037 participants, of whom 9,802 had a rare disease, and provided a genetic diagnosis to 1,040 of the 7,065 patients with detailed phenotypic data. We identified 99 Mendelian associations between genes and rare diseases, of which at least 80 are confirmed aetiological. Using WGS of UK Biobank, we showed that rare alleles can explain the presence of some individuals in the tails of a quantitative red blood cell (RBC) trait. Finally, we reported novel non-coding variants which cause disease through the disruption of transcription of ARPC1B, GATA1, LRBA and MPL. Our study demonstrates a synergy by using WGS for diagnosis and aetiological discovery in routine healthcare.
biorxiv genomics 200-500-users 2019
Single-cell multi-omic profiling of chromatin conformation and DNA methylome, bioRxiv, 2018-12-27
Recent advances in the development of single-cell epigenomic assays have facilitated the analysis of gene regulatory landscapes in complex biological systems. Methods for detection of single-cell epigenomic variation such as DNA methylation sequencing and ATAC-seq hold tremendous promise for delineating distinct cell types and identifying their critical cis-regulatory sequences. Emerging evidence has shown that in addition to cis-regulatory sequences, dynamic regulation of 3D chromatin conformation is a critical mechanism for the modulation of gene expression during development and disease. It remains unclear whether single-cell Chromatin Conformation Capture (3C) or Hi-C profiles are suitable for cell type identification and allow the reconstruction of cell-type specific chromatin conformation maps. To address these challenges, we have developed a multi-omic method, single-nucleus methyl-3C sequencing (sn-m3C-seq), to profile chromatin conformation and DNA methylation from the same cell. We have shown that bulk m3C-seq and sn-m3C-seq accurately capture chromatin organization information and robustly separate mouse cell types. We have developed a fluorescence-activated nuclei sorting strategy based on DNA content that eliminates nuclei multiplets caused by crosslinking. The sn-m3C-seq method allows high-resolution cell-type classification using two orthogonal types of epigenomic information and the reconstruction of cell-type specific chromatin conformation maps.
biorxiv genomics 100-200-users 2018
Precision Medicine Advancements Using Whole Genome Sequencing, Noninvasive Whole Body Imaging, and Functional Diagnostics, bioRxiv, 2018-12-18
We report the results of a three-year precision medicine study that enrolled 1190 presumed healthy participants at a single research clinic. To enable a better assessment of disease risk and improve diagnosis, a precision health platform that integrates non-invasive functional measurements and clinical tests combined with whole genome sequencing (WGS) was developed. The platform included WGS, comprehensive quantitative non-contrast whole body (WB) and brain magnetic resonance imaging/angiography (MRI/MRA), computed tomography (CT) coronary artery calcium scoring, electrocardiogram, echocardiogram, continuous cardiac monitoring, clinical laboratory tests, and metabolomics. In our cohort, 24.3% had medically significant genetic findings (MSF) which may contribute to increased risk of disease. A total of 206 unique medically significant variants in 111 genes were identified, and forty individuals (3.4%) had more than one MSF. Phenotypic testing revealed 34.2% of our cohort had a metabolomics profile suggestive of insulin resistance, 29.2% had elevated liver fat identified by MRI, 16.4% had clinically important cardiac structure or cardiac function abnormalities on cardiac MRI or ECHO, 8.8% had a high cardiovascular risk on CT coronary artery calcium scoring (Agatston calcium score > 400, Relative Risk of 7.2), 8.0% had arrhythmia found on continuous rhythm monitoring, 6.5% had cardiac conduction disorders found on EKG, 2% had previously undetected tumors detected by WB MRI, and 2.5% had previously undetected aneurysms detected by non-contrast MRI/MRA. Using family histories, personal histories, and test results, clinical and phenotypic findings were correlated with genomic findings in 130 study participants (63.1%) with high to moderate penetrance variants, suggesting the precision health platform improves the diagnostic process in asymptomatic individuals who were at risk.
Cardiovascular and endocrine diseases achieved considerable clinical associations between MSFs and clinical phenotypes (89% and 72%, respectively). These findings demonstrate the value of integrating WGS and noninvasive clinical assessments for a rapid and integrated point-of-care clinical diagnosis of age-related diseases that contribute to premature mortality.
biorxiv genomics 0-100-users 2018
Population structure of modern-day Italians reveals patterns of ancient and archaic ancestries in Southern Europe, bioRxiv, 2018-12-14
European populations display low genetic diversity as the result of long-term blending of the small number of ancient founding ancestries. However, it is still unclear how the combination of ancient ancestries related to early European foragers, Neolithic farmers and Bronze Age nomadic pastoralists can fully explain genetic variation across Europe. Populations in natural crossroads like the Italian peninsula are expected to recapitulate the overall continental diversity, but to date have been systematically understudied. Here we characterised the ancestry profiles of modern-day Italian populations using a genome-wide dataset representative of modern and ancient samples from across Italy, Europe and the rest of the world. Italian genomes captured several ancient signatures, including a substantial non-steppe-related ancestry contribution ultimately from the Caucasus. Differences in ancestry composition resulting from migration and admixture generated in Italy the largest degree of population structure detected so far in the continent and shaped the amount of Neanderthal DNA present in modern-day populations.
biorxiv genomics 0-100-users 2018