The UCSC Repeat Browser allows discovery and visualization of evolutionary conflict across repeat families, bioRxiv, 2018-09-28
Abstract
Background: Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of these sequences play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, as short repeat sequences can map to multiple genomic loci, resulting in their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. Therefore, there is a continued need for standardized tools and techniques to interpret genomic data of repeats.
Results: We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from the gold standard repeat database RepeatMasker. The UCSC Repeat Browser contains mapped annotations from the human genome to these references, and presents all of them in a comprehensive interface to facilitate work with repetitive elements. Furthermore, it provides processed tracks of multiple publicly available datasets of biological interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs), a family of proteins known to bind and repress certain classes of repeats. Here we show how the UCSC Repeat Browser, in combination with these datasets as well as RepeatMasker annotations in several non-human primates, can be used to trace the independent trajectories of species-specific evolutionary conflicts.
Conclusions: The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome. By developing a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate-specific LINE-1 families across apes, and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.
biorxiv genomics 0-100-users 2018
Cohort Profile: East London Genes & Health (ELGH), a community based population genomics and health study of British-Bangladeshi and British-Pakistani people, bioRxiv, 2018-09-27
Cohort profile in a nutshell: East London Genes & Health (ELGH) is a large-scale community genomics and health study (to date >34,000 volunteers; target 100,000 volunteers). ELGH was set up in 2015 to gain deeper understanding of health and disease, and underlying genetic influences, in British-Bangladeshi and British-Pakistani people living in east London. ELGH prioritises studies in areas important to, and identified by, the community it represents. Current priorities include cardiometabolic diseases and mental illness, these being of notably high prevalence and severity. However, studies in any scientific area are possible, subject to community advisory group and ethical approval. ELGH combines health data science (using linked UK National Health Service (NHS) electronic health record data) with exome sequencing and SNP array genotyping to elucidate the genetic influence on health and disease, including the contribution from high rates of parental relatedness on rare genetic variation and homozygosity (autozygosity), in two understudied ethnic groups. Linkage to longitudinal health record data enables both retrospective and prospective analyses. Through Stage 2 studies, ELGH offers researchers the opportunity to undertake recall-by-genotype and/or recall-by-phenotype studies on volunteers. Sub-cohort, trial-within-cohort, and other study designs are possible. ELGH is a fully collaborative, open access resource, open to academic and life sciences industry scientific research partners.
biorxiv genomics 0-100-users 2018
Local epigenomic state cannot discriminate interacting and non-interacting enhancer–promoter pairs with high accuracy, bioRxiv, 2018-09-27
Abstract: We report an overfitting issue in recent machine learning formulations of the enhancer-promoter interaction problem, arising from the fact that many enhancer-promoter pairs share features. Cross-fold validation schemes which do not correctly separate these feature-sharing enhancer-promoter pairs into one test set report high accuracy, which is actually due to overfitting. Cross-fold validation schemes which properly segregate pairs with shared features show markedly reduced ability to predict enhancer-promoter interactions from epigenomic state. Parameter scans with multiple models indicate that local epigenomic features of individual pairs of enhancers and promoters cannot distinguish those pairs that interact from those which do not with high accuracy, suggesting that additional information is required to predict enhancer-promoter interactions.
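The distinction between the two kinds of cross-fold validation can be illustrated with a minimal sketch. It is not the authors' code: the function name `grouped_k_fold`, the group labels, and the choice to group pairs by a shared promoter are all assumptions made for illustration, since the abstract does not specify how feature-sharing pairs were identified. The idea is only that every pair carrying the same group label is kept in the same fold, so no group is split between training and test data.

```python
from collections import defaultdict


def grouped_k_fold(groups, n_folds):
    """Yield (train_idx, test_idx) so that no group label appears on both sides.

    `groups` holds one label per sample; samples with the same label
    (here, hypothetically, pairs that share features) stay together.
    """
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    if n_folds > len(members):
        raise ValueError("n_folds cannot exceed the number of distinct groups")

    # Greedy balancing: place the largest groups first into the emptiest fold.
    folds = [[] for _ in range(n_folds)]
    for label in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[label])

    for f in range(n_folds):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical data: six enhancer-promoter pairs, labelled by the promoter they share.
pair_groups = ["P1", "P1", "P2", "P2", "P3", "P3"]
splits = list(grouped_k_fold(pair_groups, n_folds=3))
```

A plain random split of the same six pairs could place one "P1" pair in training and the other in the test set, which is the kind of leakage the abstract attributes the inflated accuracy to.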
biorxiv genomics 0-100-users 2018
Towards inferring causal gene regulatory networks from single cell expression measurements, bioRxiv, 2018-09-27
Abstract: Single-cell transcriptome sequencing now routinely samples thousands of cells, potentially providing enough data to reconstruct causal gene regulatory networks from observational data. Here, we present Scribe, a toolkit for detecting and visualizing causal regulatory interactions between genes, and explore the potential for single-cell experiments to power network reconstruction. Scribe employs Restricted Directed Information to determine causality by estimating the strength of information transferred from a potential regulator to its downstream target. We apply Scribe and other leading approaches for causal network reconstruction to several types of single-cell measurements and show that there is a dramatic drop in performance for “pseudotime” ordered single-cell data compared to true time series data. We demonstrate that performing causal inference requires temporal coupling between measurements. We show that methods such as “RNA velocity” restore some degree of coupling through an analysis of chromaffin cell fate commitment. These analyses therefore highlight an important shortcoming in experimental and computational methods for analyzing gene regulation at single-cell resolution and point the way towards overcoming it.
biorxiv genomics 100-200-users 2018
High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes, bioRxiv, 2018-09-25
Abstract: High-throughput single-cell RNA-Sequencing is a powerful technique for gene expression profiling of complex and heterogeneous cellular populations such as the immune system. However, these methods only provide short-read sequence from one end of a cDNA template, making them poorly suited to the investigation of gene-regulatory events such as mRNA splicing, adaptive immune responses or somatic genome evolution. To address this challenge, we have developed a method that combines targeted long-read sequencing with short-read based transcriptome profiling of barcoded single cell libraries generated by droplet-based partitioning. We use Repertoire And Gene Expression sequencing (RAGE-seq) to accurately characterize full-length T cell (TCR) and B cell (BCR) receptor sequences and transcriptional profiles of more than 7,138 lymphocytes sampled from the primary tumour and draining lymph node of a breast cancer patient. With this method we show that somatic mutation, alternate splicing and clonal evolution of T and B lymphocytes can be tracked across these tissue compartments. Our results demonstrate that RAGE-seq is an accessible and cost-effective method for high-throughput deep single cell profiling, applicable to a wide range of biological challenges.
biorxiv genomics 100-200-users 2018
A transposable element insertion is the switch between alternative life history strategies, bioRxiv, 2018-09-24
Tradeoffs affect resource allocation during development and result in fitness consequences that drive the evolution of life history strategies. Yet despite their importance, we know little about the mechanisms underlying life history tradeoffs in wild populations. Many species of Colias butterflies exhibit an alternative life history strategy (ALHS) where females divert resources from wing pigment synthesis to reproductive and somatic development. Due to this reallocation, a wing color polymorphism is associated with the ALHS: individuals have either yellow/orange or white wings. Here we map the genetic basis of the ALHS switch in Colias crocea to a transposable element insertion downstream of the Colias homolog of BarH-1, a homeobox transcription factor. Using CRISPR/Cas9 gene editing, antibody staining, and electron microscopy, we find that morph-specific expression of BarH-1 suppresses the formation of pigment granules in wing scales. Lipid and transcriptome analyses reveal physiological differences associated with the ALHS. These findings characterize a novel mechanism for a female-limited ALHS and show that the switch arises via recruitment of a transcription factor previously known for its function in cell fate determination in pigment cells of the retina.
biorxiv genomics 100-200-users 2018