Highly parallel genome variant engineering with CRISPR/Cas9 in eukaryotic cells, bioRxiv, 2017-06-09
Abstract: Direct measurement of functional effects of DNA sequence variants throughout a genome is a major challenge. We developed a method that uses CRISPR/Cas9 to engineer many specific variants of interest in parallel in the budding yeast Saccharomyces cerevisiae, and to screen them for functional effects. We used the method to examine the functional consequences of premature termination codons (PTCs) at different locations within all annotated essential genes in yeast. We found that most PTCs were highly deleterious unless they occurred close to the C-terminal end and did not interrupt an annotated protein domain. Surprisingly, we discovered that some putatively essential genes are dispensable, while others have large dispensable regions. This approach can be used to profile the effects of large classes of variants in a high-throughput manner.
biorxiv genomics 0-100-users 2017
Integrating long-range connectivity information into de Bruijn graphs, bioRxiv, 2017-06-09
Abstract
Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input.
Results: We present a novel assembly graph data structure, the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both the de Bruijn graph and the String Graph Assembler (SGA). Finally, we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterise the genomic context of drug-resistance genes.
Availability: Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, available under the MIT license at https://github.com/mcvean/mccortex.
Contact: turner.isaac@gmail.com
biorxiv bioinformatics 0-100-users 2017
Genome-wide quantification of the effects of DNA methylation on human gene regulation, bioRxiv, 2017-06-08
Abstract: Changes in DNA methylation are important in development and disease, but not all regulatory elements act in a methylation-dependent (MD) manner. Here, we developed mSTARR-seq, a high-throughput approach to quantify the effects of DNA methylation on regulatory element function. We assay MD activity in 14% of the euchromatic human genome, identify 2,143 MD regulatory elements, and predict MD activity using sequence and chromatin state information. We identify transcription factors associated with higher activity in unmethylated or methylated states, including an association between pioneer transcription factors and methylated DNA. Finally, we use mSTARR-seq to predict DNA methylation-gene expression correlations in primary cells. Our findings provide a map of MD regulatory activity across the human genome, facilitating interpretation of the many emerging associations between methylation and trait variation.
biorxiv genomics 0-100-users 2017
Non-invasive laminar inference with MEG: Comparison of methods and source inversion algorithms, bioRxiv, 2017-06-08
Abstract: Magnetoencephalography (MEG) is a direct measure of neuronal current flow; its anatomical resolution is therefore not constrained by physiology but rather by data quality and the models used to explain these data. Recent simulation work has shown that it is possible to distinguish between signals arising in the deep and superficial cortical laminae given accurate knowledge of these surfaces with respect to the MEG sensors. This previous work has focused on a single inversion scheme (multiple sparse priors) and a single global parametric fit metric (free energy). In this paper we use several different source inversion algorithms and both local and global, as well as parametric and non-parametric, fit metrics in order to demonstrate the robustness of the discrimination between layers. We find that only algorithms with some sparsity constraint can successfully be used to make laminar discrimination. Importantly, local t-statistics, global cross-validation and free energy all provide robust and mutually corroborating metrics of fit. We show that discrimination accuracy is affected by patch size estimates, cortical surface features, and lead field strength, which suggests several possible future improvements to this technique. This study demonstrates the possibility of determining the laminar origin of MEG sensor activity, and thus directly testing theories of human cognition that involve laminar- and frequency-specific mechanisms. This possibility can now be achieved using recent developments in high precision MEG, most notably the use of subject-specific head-casts, which allow for significant increases in data quality and therefore anatomically precise MEG recordings.
biorxiv neuroscience 0-100-users 2017
Resetting the yeast epigenome with human nucleosomes, bioRxiv, 2017-06-08
Summary: Humans and yeast are separated by a billion years of evolution, yet their conserved core histones retain central roles in gene regulation. Here, we “reset” yeast to use core human nucleosomes in lieu of their own, an exceedingly rare event which initially took twenty days. The cells adapt, however, by acquiring suppressor mutations in cell-division genes, or by acquiring certain aneuploidy states. Robust growth was also restored by converting five histone residues back to their yeast counterparts. We reveal that humanized nucleosomes in yeast are positioned according to endogenous yeast DNA sequence and chromatin-remodeling network, as judged by a yeast-like nucleosome repeat length. However, human nucleosomes have higher DNA occupancy and reduce RNA content. Adaptation to new biological conditions presented a special challenge for these cells due to slower chromatin remodeling. This humanized yeast poses many fundamental new questions about the nature of chromatin and how it is linked to many cell processes, and provides a platform to study histone variants via yeast epigenome reprogramming.
Highlights
- Only 1 in 10^7 yeast survive with fully human nucleosomes, but they rapidly evolve
- Nucleosome positioning and nucleosome repeat length are not influenced by histone type
- Human nucleosomes remodel slowly and delay yeast environmental adaptation
- Human core nucleosomes are more repressive and globally reduce transcription in yeast
biorxiv synthetic-biology 0-100-users 2017
Ancient genomes from southern Africa pushes modern human divergence beyond 260,000 years ago, bioRxiv, 2017-06-06
Southern Africa is consistently placed as one of the potential regions for the evolution of Homo sapiens. To examine the region's human prehistory prior to the arrival of migrants from East and West Africa or Eurasia in the last 1,700 years, we generated and analyzed genome sequence data from seven ancient individuals from KwaZulu-Natal, South Africa. Three Stone Age hunter-gatherers date to ~2,000 years ago, and we show that they were related to current-day southern San groups such as the Karretjie People. Four Iron Age farmers (300-500 years old) have genetic signatures similar to present day Bantu-speakers. The genome sequence (13x coverage) of a juvenile boy from Ballito Bay, who lived ~2,000 years ago, demonstrates that southern African Stone Age hunter-gatherers were not impacted by recent admixture; however, we estimate that all modern-day Khoekhoe and San groups have been influenced by 9-22% genetic admixture from East African/Eurasian pastoralist groups arriving >1,000 years ago, including the Ju|'hoansi San, previously thought to have very low levels of admixture. Using traditional and new approaches, we estimate the population divergence time between the Ballito Bay boy and other groups to be beyond 260,000 years ago. These estimates dramatically increase the deepest divergence amongst modern humans, coincide with the onset of the Middle Stone Age in sub-Saharan Africa, and coincide with anatomical developments of archaic humans into modern humans as represented in the local fossil record. Cumulatively, cross-disciplinary records increasingly point to southern Africa as a potential (not necessarily exclusive) 'hot spot' for the evolution of our species.
biorxiv evolutionary-biology 200-500-users 2017