Dating genomic variants and shared ancestry in population-scale sequencing data, bioRxiv, 2018-09-14
The origin and fate of new mutations within species is the fundamental process underlying evolution. However, while previous efforts have been focused on characterizing the presence, frequency, and phenotypic impact of genetic variation, the evolutionary histories of most variants are largely unexplored. We have developed a non-parametric approach for estimating the date of origin of genetic variants that can be applied to large-scale genomic variation data sets. We demonstrate the accuracy and robustness of the approach through simulation and apply it to over 16 million single nucleotide polymorphisms (SNPs) from two publicly available human genomic diversity resources. We characterize the differential relationship between variant frequency and age in different geographical regions and demonstrate the value of allele age in interpreting variants of known functional and selective importance. Finally, we use allele age estimates to power a rapid approach for inferring the genealogical history of a single genome or a group of individuals.
biorxiv genomics 100-200-users 2018
Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks, bioRxiv, 2018-09-14
Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here we sought to apply deep convolutional neural networks towards this goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, which we call Xpresso, more than doubles the accuracy of alternative sequence-based models, and isolates rules as predictive as models relying on ChIP-seq data. Xpresso recapitulates genome-wide patterns of transcriptional activity and predicts the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose the accurate prediction of cell type-specific gene expression based solely on primary sequence as a grand challenge for the field.
biorxiv genomics 200-500-users 2018
The landscape of somatic mutation in normal colorectal epithelial cells, bioRxiv, 2018-09-14
The colorectal adenoma-carcinoma sequence has provided a paradigmatic framework for understanding the successive somatic genetic changes and consequent clonal expansions leading to cancer. As for most cancer types, however, understanding of the earliest phases of colorectal neoplastic change, which may occur in morphologically normal tissue, is comparatively limited because of the difficulty of detecting somatic mutations in normal cells. Each colorectal crypt is a small clone of cells derived from a single recently-existing stem cell. Here, we whole-genome sequenced hundreds of normal crypts from 42 individuals. Signatures of multiple mutational processes were revealed, some ubiquitous and continuous, others only found in some individuals, in some crypts or during some phases of the cell lineage from zygote to adult cell. Likely driver mutations were present in ∼1% of normal colorectal crypts in middle-aged individuals, indicating that adenomas and carcinomas are rare outcomes of a pervasive process of neoplastic change across morphologically normal colorectal epithelium.
biorxiv cancer-biology 0-100-users 2018
Bio-On-Magnetic-Beads (BOMB): Open platform for high-throughput nucleic acid extraction and manipulation, bioRxiv, 2018-09-13
Current molecular biology laboratories rely heavily on the purification and manipulation of nucleic acids. Yet, commonly used centrifuge- and column-based protocols require specialised equipment, often use toxic reagents and are not economically scalable or practical to use in a high-throughput manner. Although it has been known for some time that magnetic beads can provide an elegant answer to these issues, the development of open-source protocols based on beads has been limited. In this article, we provide step-by-step instructions for an easy synthesis of functionalised magnetic beads, and detailed protocols for their use in the high-throughput purification of plasmids, genomic DNA and total RNA from different sources, as well as environmental TNA and PCR amplicons. We also provide a bead-based protocol for bisulfite conversion, and size selection of DNA and RNA fragments. Comparison to other methods highlights the capability, versatility and extreme cost-effectiveness of using magnetic beads. These open-source protocols and the associated webpage (https://bomb.bio) can serve as a platform for further protocol customisation and community engagement.
biorxiv molecular-biology 200-500-users 2018
Novel childhood experience suggests eccentricity drives organization of human visual cortex, bioRxiv, 2018-09-13
The functional organization of human high-level visual cortex, such as face and place-selective regions, is strikingly consistent across individuals. A fundamental, unanswered question in neuroscience is what dimensions of visual information constrain the development and topography of this shared brain organization? To answer this question, we scanned with fMRI a unique group of adults who, as children, engaged in extensive experience with a novel stimulus, Pokemon, that varied along critical dimensions (foveal bias, rectilinearity, size, animacy) from other ecological categories such as faces and places. We find that experienced adults not only demonstrate distinct and consistent distributed cortical responses to Pokemon, but their activations suggest that it is the experienced retinal eccentricity during childhood that predicts the locus of distributed responses to Pokemon in adulthood. These data advance our understanding about how childhood experience and functional constraints shape the functional organization of the human brain.
biorxiv neuroscience 100-200-users 2018
Cardelino: Integrating whole exomes and single-cell transcriptomes to reveal phenotypic impact of somatic variants, bioRxiv, 2018-09-12
Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA-sequencing, either using bulk or using single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution. Key findings: (1) A novel approach for integrating DNA-seq and single-cell RNA-seq data to reconstruct clonal substructure for single-cell transcriptomes. (2) Evidence for non-neutral evolution of clonal populations in human fibroblasts. (3) Proliferation and cell cycle pathways are commonly distorted in mutated clonal populations.
biorxiv genomics 100-200-users 2018