Frequent lack of repressive capacity of promoter DNA methylation identified through genome-wide epigenomic manipulation, bioRxiv, 2017-08-17
Abstract: It is widely assumed that the addition of DNA methylation at CpG rich gene promoters silences gene transcription. However, this conclusion is largely drawn from the observation that promoter DNA methylation inversely correlates with gene expression in natural conditions. The effect of induced DNA methylation on endogenous promoters has yet to be comprehensively assessed. Here, we induced the simultaneous methylation of thousands of promoters in the genome of human cells using an engineered zinc finger-DNMT3A fusion protein, enabling assessment of the effect of forced DNA methylation upon transcription, histone modifications, and DNA methylation persistence after the removal of the fusion protein. We find that DNA methylation is frequently insufficient to transcriptionally repress promoters. Furthermore, DNA methylation deposited at promoter regions associated with H3K4me3 is rapidly erased after removal of the zinc finger-DNMT3A fusion protein. Finally, we demonstrate that induced DNA methylation can exist simultaneously on promoter nucleosomes that possess the active histone modification H3K4me3, or DNA bound by the initiated form of RNA polymerase II. These findings suggest that promoter DNA methylation is not generally sufficient for transcriptional inactivation, with implications for the emerging field of epigenome engineering.
One Sentence Summary: Genome-wide epigenomic manipulation of thousands of human promoters reveals that induced promoter DNA methylation is unstable and frequently does not function as a primary instructive biochemical signal for gene silencing and chromatin reconfiguration.
biorxiv genomics 500+-users 2017
Opportunities and obstacles for deep learning in biology and medicine, bioRxiv, 2017-05-29
Abstract: Deep learning, which describes a class of machine learning algorithms, has recently shown impressive results across a variety of domains. Biology and medicine are data rich, but the data are complex and often ill-understood. Problems of this nature may be particularly well-suited to deep learning techniques. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes, and treatment of patients) and discuss whether deep learning will transform these tasks or if the biomedical sphere poses unique challenges. We find that deep learning has yet to revolutionize or definitively resolve any of these problems, but promising advances have been made on the prior state of the art. Even when improvement over a previous baseline has been modest, we have seen signs that deep learning methods may speed or aid human investigation. More work is needed to address concerns related to interpretability and how to best model each problem. Furthermore, the limited amount of labeled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning powering changes at both bench and bedside with the potential to transform several areas of biology and medicine.
biorxiv bioinformatics 500+-users 2017
No compelling evidence that preferences for facial masculinity track changes in women’s hormonal status, bioRxiv, 2017-05-13
Abstract: Although widely cited as strong evidence that sexual selection has shaped human facial attractiveness judgments, evidence that preferences for masculine characteristics in men’s faces are related to women’s hormonal status is equivocal and controversial. Consequently, we conducted the largest ever longitudinal study of the hormonal correlates of women’s preferences for facial masculinity (N=584). Analyses showed no compelling evidence that preferences for facial masculinity were related to changes in women’s salivary steroid hormone levels. Furthermore, both within-subject and between-subject comparisons showed no evidence that oral contraceptive use decreased masculinity preferences. However, women generally preferred masculinized over feminized versions of men’s faces, particularly when assessing men’s attractiveness for short-term, rather than long-term, relationships. Our results do not support the hypothesized link between women’s preferences for facial masculinity and their hormonal status.
biorxiv animal-behavior-and-cognition 500+-users 2017
The Human Cell Atlas, bioRxiv, 2017-05-09
Abstract: The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body, by undertaking a Human Cell Atlas Project as an international collaborative effort. The aim would be to define all human cell types in terms of distinctive molecular profiles (e.g., gene expression) and connect this information with classical cellular descriptions (e.g., location and morphology). A comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, as well as provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas.
biorxiv cell-biology 500+-users 2017
Nanopore sequencing and assembly of a human genome with ultra-long reads, bioRxiv, 2017-04-21
Abstract: Nanopore sequencing is a promising technique for genome sequencing due to its portability, its ability to sequence long reads from single molecules, and its ability to simultaneously assay DNA methylation. However, until recently nanopore sequencing has been mainly applied to small genomes, due to the limited output attainable. We present nanopore sequencing and assembly of the GM12878 Utah/Ceph human reference genome generated using the Oxford Nanopore MinION and R9.4 version chemistry. We generated 91.2 Gb of sequence data (∼30× theoretical coverage) from 39 flowcells. De novo assembly yielded a highly complete and contiguous assembly (NG50 ∼3Mb). We observed considerable variability in homopolymeric tract resolution between different basecallers. The data permitted sensitive detection of both large structural variants and epigenetic modifications. Further, we developed a new approach exploiting the long-read capability of this system and found that adding an additional 5×-coverage of ‘ultra-long’ reads (read N50 of 99.7kb) more than doubled the assembly contiguity. Modelling the repeat structure of the human genome predicts extraordinarily contiguous assemblies may be possible using nanopore reads alone. Portable de novo sequencing of human genomes may be important for rapid point-of-care diagnosis of rare genetic diseases and cancer, and monitoring of cancer progression. The complete dataset including raw signal is available as an Amazon Web Services Open Dataset at https://github.com/nanopore-wgs-consortium/NA12878.
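The coverage and contiguity figures quoted in this abstract (theoretical coverage, read N50, assembly NG50) follow standard definitions, which can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the 3.1 Gb genome size is an assumed round figure that does not appear in the abstract.

```python
def theoretical_coverage(total_bases, genome_size):
    """Average depth if the sequenced bases were spread evenly over the genome."""
    return total_bases / genome_size


def n50(lengths, target_total=None):
    """Length L such that sequences of length >= L sum to at least half the target.

    With target_total=None the target is the sum of the lengths (N50).
    Passing an estimated genome size as target_total gives NG50 instead.
    Returns None when the sequences cover less than half of the target.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths) if target_total is None else target_total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return None


# Assumed genome size of 3.1 Gb (not stated in the abstract).
ASSUMED_GENOME_SIZE = 3.1e9
depth = theoretical_coverage(91.2e9, ASSUMED_GENOME_SIZE)
print(f"theoretical coverage: {depth:.1f}x")

# Toy read lengths, purely illustrative.
toy_reads = [2, 2, 2, 3, 3, 4, 8, 8]
print("toy N50:", n50(toy_reads))
```

Under the assumed genome size, 91.2 Gb works out to roughly 29×, consistent with the "∼30×" stated in the abstract.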
biorxiv genomics 500+-users 2017
Index switching causes “spreading-of-signal” among multiplexed samples in Illumina HiSeq 4000 DNA sequencing, bioRxiv, 2017-04-10
Abstract: Illumina-based next generation sequencing (NGS) has accelerated biomedical discovery through its ability to generate thousands of gigabases of sequencing output per run at a fraction of the time and cost of conventional technologies. The process typically involves four basic steps: library preparation, cluster generation, sequencing, and data analysis. In 2015, a new chemistry of cluster generation was introduced in the newer Illumina machines (HiSeq 3000/4000/X Ten) called exclusion amplification (ExAmp), which was a fundamental shift from the earlier method of random cluster generation by bridge amplification on a non-patterned flow cell. The ExAmp chemistry, in conjunction with patterned flow cells containing nanowells at fixed locations, increases cluster density on the flow cell, thereby reducing the cost per run. It also increases sequence read quality, especially for longer read lengths (up to 150 base pairs). This advance has been widely adopted for genome sequencing because greater sequencing depth can be achieved for lower cost without compromising the quality of longer reads. We show that this promising chemistry is problematic, however, when multiplexing samples. We discovered that up to 5-10% of sequencing reads (or signals) are incorrectly assigned from a given sample to other samples in a multiplexed pool. We provide evidence that this “spreading-of-signals” arises from low levels of free index primers present in the pool. These index primers can prime pooled library fragments at random via complementary 3’ ends, and get extended by DNA polymerase, creating a new library molecule with a new index before binding to the patterned flow cell to generate a cluster for sequencing. This causes the resulting read from that cluster to be assigned to a different sample, causing the spread of signals within multiplexed samples.
We show that low levels of free index primers persist after the most common library purification procedure recommended by Illumina, and that the amount of signal spreading among samples is proportional to the level of free index primer present in the library pool. This artifact causes homogenization and misclassification of cells in single cell RNA-seq experiments. Therefore, all data generated in this way must now be carefully re-examined to ensure that “spreading-of-signals” has not compromised data analysis and conclusions. Re-sequencing samples using an older technology that uses conventional bridge amplification for cluster generation, or improved library cleanup strategies to remove free index primers, can minimize or eliminate this signal spreading artifact.
biorxiv molecular-biology 500+-users 2017