The UCSC Repeat Browser allows discovery and visualization of evolutionary conflict across repeat families, bioRxiv, 2018-09-28

Background: Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of these sequences play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, as short repeat sequences can map to multiple genomic loci, resulting in their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. Therefore, there is a continued need for standardized tools and techniques to interpret genomic data of repeats. Results: We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from the gold standard repeat database RepeatMasker. The UCSC Repeat Browser contains mapped annotations from the human genome to these references, and presents all of them as a comprehensive interface to facilitate work with repetitive elements. Furthermore, it provides processed tracks of multiple publicly available datasets of biological interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs), a family of proteins known to bind and repress certain classes of repeats. Here we show how the UCSC Repeat Browser, in combination with these datasets as well as RepeatMasker annotations in several non-human primates, can be used to trace the independent trajectories of species-specific evolutionary conflicts. Conclusions: The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome. By developing a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate-specific LINE-1 families across apes, and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.

biorxiv genomics 0-100-users 2018

A high-resolution map of non-crossover events reveals impacts of genetic diversity on mammalian meiotic recombination, bioRxiv, 2018-09-27

During meiotic recombination in most mammals, hundreds of programmed DNA Double-Strand Breaks (DSBs) occur across all chromosomes in each cell at sites bound by the protein PRDM9. Faithful DSB repair using the homologous chromosome is essential for fertility, yielding either non-crossovers, which are frequent but difficult to detect, or crossovers. In certain hybrid mice, high sequence divergence causes PRDM9 to bind each homologue at different sites, 'asymmetrically', and these mice exhibit meiotic failure and infertility, by unknown mechanisms. To investigate the impact of local sequence divergence on recombination, we intercrossed two mouse subspecies over five generations and deep-sequenced 119 offspring, whose high heterozygosity allowed detection of thousands of crossover and non-crossover events with unprecedented power and spatial resolution. Both crossovers and non-crossovers are strongly depleted at individual asymmetric sites, revealing that PRDM9 not only positions DSBs but also promotes their homologous repair by binding to the unbroken homologue at each site. Unexpectedly, we found that non-crossovers containing multiple mismatches repair by a different mechanism than single-mismatch sites, which undergo GC-biased gene conversion. These results demonstrate that local genetic diversity profoundly alters meiotic repair pathway decisions via at least two distinct mechanisms, impacting genome evolution and Prdm9-related hybrid infertility.

biorxiv genetics 0-100-users 2018

An introduction to MPEG-G, the new ISO standard for genomic information representation, bioRxiv, 2018-09-27

The MPEG-G standardization initiative is a coordinated international effort to specify a compressed data format that enables large-scale genomic data to be processed, transported and shared. The standard consists of a set of specifications (i.e., a book) describing i) a normative format syntax, and ii) a normative decoding process to retrieve the information coded in a compliant file or bitstream. Such decoding process enables the use of leading-edge compression technologies that have exhibited significant compression gains over currently used formats for storage of unaligned and aligned sequencing reads. Additionally, the standard provides a wealth of much-needed functionality, such as selective access, data aggregation, application programming interfaces to the compressed data, standard interfaces to support data protection mechanisms, support for streaming and a procedure to assess the conformance of implementations. ISO/IEC is engaged in supporting the maintenance and availability of the standard specification, which guarantees the perenniality of applications using MPEG-G. Finally, the standard ensures interoperability and integration with existing genomic information processing pipelines by providing support for conversion from the FASTQ/SAM/BAM file formats. In this paper we provide an overview of the MPEG-G specification, with particular focus on the main advantages and novel functionality it offers. As the standard only specifies the decoding process, encoding performance, both in terms of speed and compression ratio, can vary depending on specific encoder implementations, and will likely improve during the lifetime of MPEG-G. Hence, the performance statistics provided here are only indicative baseline examples of the technologies included in the standard.

biorxiv bioinformatics 100-200-users 2018

Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study of British-Bangladeshi and British-Pakistani people, bioRxiv, 2018-09-27

Cohort profile in a nutshell: East London Genes & Health (ELGH) is a large-scale, community genomics and health study (to date >34,000 volunteers; target 100,000 volunteers). ELGH was set up in 2015 to gain deeper understanding of health and disease, and underlying genetic influences, in British-Bangladeshi and British-Pakistani people living in east London. ELGH prioritises studies in areas important to, and identified by, the community it represents. Current priorities include cardiometabolic diseases and mental illness, these being of notably high prevalence and severity. However, studies in any scientific area are possible, subject to community advisory group and ethical approval. ELGH combines health data science (using linked UK National Health Service (NHS) electronic health record data) with exome sequencing and SNP array genotyping to elucidate the genetic influence on health and disease, including the contribution from high rates of parental relatedness on rare genetic variation and homozygosity (autozygosity), in two understudied ethnic groups. Linkage to longitudinal health record data enables both retrospective and prospective analyses. Through Stage 2 studies, ELGH offers researchers the opportunity to undertake recall-by-genotype and/or recall-by-phenotype studies on volunteers. Sub-cohort, trial-within-cohort, and other study designs are possible. ELGH is a fully collaborative, open access resource, open to academic and life sciences industry scientific research partners.

biorxiv genomics 0-100-users 2018

Collective intercellular communication through ultra-fast hydrodynamic trigger waves, bioRxiv, 2018-09-27

The biophysical relationships between sensors and actuators have been fundamental to the development of complex life forms; abundant flows are generated and persist in aquatic environments by swimming organisms, while responding promptly to external stimuli is key to survival. Here, akin to a chain reaction, we present the discovery of hydrodynamic trigger waves in cellular communities of the protist Spirostomum ambiguum, propagating hundreds of times faster than the swimming speed. Coiling its cytoskeleton, Spirostomum can contract its long body by 50% within milliseconds, with accelerations reaching 14g-forces. Surprisingly, a single cellular contraction (transmitter) is shown to generate long-ranged vortex flows at intermediate Reynolds numbers, which can trigger neighbouring cells, in turn. To measure the sensitivity to hydrodynamic signals (receiver), we further present a high-throughput suction-flow device to probe mechanosensitive ion channel gating by back-calculating the microscopic forces on the cell membrane. These ultra-fast hydrodynamic trigger waves are analysed and modelled quantitatively in a universal framework of antenna and percolation theory. A phase transition is revealed, requiring a critical colony density to sustain collective communication. Our results suggest that this signalling could help organise cohabiting communities over large distances, influencing long-term behaviour through gene expression, comparable to quorum sensing. More immediately, as contractions release toxins, synchronised discharges could also facilitate the repulsion of large predators, or conversely immobilise large prey. We postulate that beyond protists numerous other freshwater and marine organisms could coordinate with variations of hydrodynamic trigger waves.

biorxiv biophysics 200-500-users 2018

 

Created with the audiences framework by Jedidiah Carlson
