Structure of transcribing RNA polymerase II-nucleosome complex, bioRxiv, 2018-10-07

Transcription of eukaryotic protein-coding genes requires passage of RNA polymerase II (Pol II) through chromatin. Pol II passage is impaired by nucleosomes and requires elongation factors that help Pol II to efficiently overcome the nucleosomal barrier [1-4]. How the Pol II machinery transcribes through a nucleosome remains unclear because structural studies have been limited to Pol II elongation complexes formed on DNA templates lacking nucleosomes [5]. Here we report the cryo-electron microscopy (cryo-EM) structure of transcribing Pol II from the yeast Saccharomyces cerevisiae engaged with a downstream nucleosome core particle (NCP) at an overall resolution of 4.4 Å with resolutions ranging from 4-6 Å in Pol II and 6-8 Å in the NCP. Pol II and the NCP adopt a defined orientation that could not be predicted from modelling. Pol II contacts DNA of the incoming NCP on both sides of the nucleosomal dyad with its domains ‘clamp head’ and ‘lobe’. Comparison of the Pol II-NCP structure to known structures of Pol II complexes reveals that the elongation factors TFIIS, DSIF, NELF, PAF1 complex, and SPT6 can be accommodated on the Pol II surface in the presence of the oriented nucleosome. Further structural comparisons show that the chromatin remodelling enzyme Chd1, which is also required for efficient Pol II passage [6,7], could bind the oriented nucleosome with its motor domain. The DNA-binding region of Chd1 must however be released from DNA when Pol II approaches the nucleosome, and based on published data [8,9] this is predicted to stimulate Chd1 activity and to facilitate Pol II passage. Our results provide a starting point for a mechanistic analysis of chromatin transcription.

biorxiv biochemistry 100-200-users 2018

An introduction to MPEG-G, the new ISO standard for genomic information representation, bioRxiv, 2018-09-27

The MPEG-G standardization initiative is a coordinated international effort to specify a compressed data format that enables large scale genomic data to be processed, transported and shared. The standard consists of a set of specifications (i.e., a book) describing i) a normative format syntax, and ii) a normative decoding process to retrieve the information coded in a compliant file or bitstream. Such decoding process enables the use of leading-edge compression technologies that have exhibited significant compression gains over currently used formats for storage of unaligned and aligned sequencing reads. Additionally, the standard provides a wealth of much needed functionality, such as selective access, data aggregation, application programming interfaces to the compressed data, standard interfaces to support data protection mechanisms, support for streaming and a procedure to assess the conformance of implementations. ISO/IEC is engaged in supporting the maintenance and availability of the standard specification, which guarantees the perenniality of applications using MPEG-G. Finally, the standard ensures interoperability and integration with existing genomic information processing pipelines by providing support for conversion from the FASTQ/SAM/BAM file formats. In this paper we provide an overview of the MPEG-G specification, with particular focus on the main advantages and novel functionality it offers. As the standard only specifies the decoding process, encoding performance, both in terms of speed and compression ratio, can vary depending on specific encoder implementations, and will likely improve during the lifetime of MPEG-G. Hence, the performance statistics provided here are only indicative baseline examples of the technologies included in the standard.

biorxiv bioinformatics 100-200-users 2018

 

