The user’s guide to comparative genomics with EnteroBase. Three case studies: micro-clades within Salmonella enterica serovar Agama, ancient and modern populations of Yersinia pestis, and core genomic diversity of all Escherichia, bioRxiv, 2019-04-19
Abstract: EnteroBase is an integrated software environment which supports the identification of global population structures within several bacterial genera including pathogens. It currently contains more than 300,000 genomes that have been assembled from Illumina short reads from the genera Salmonella, Escherichia, Yersinia, Clostridioides, Helicobacter, Vibrio, and Moraxella. With the recent introduction of hierarchical clustering of core genome MLST sequence types, EnteroBase now facilitates the identification of close relatives of bacteria within those genera within a few hours of uploading their short reads. It also supports private collaborations between groups of users, and the comparison of genomic data that were assembled from short reads with SNP calls that were extracted from metagenomic sequences. Here we provide an overview for its users on how EnteroBase works, what it can do, and its future prospects. This user’s guide is illustrated by three case studies ranging in scale from the minuscule (local transmission of Salmonella between neighboring social groups of badgers) through pandemic transmission of plague and microevolution of Yersinia pestis over the last 5,000 years to a novel, global overview of the population structure of all of Escherichia.
biorxiv microbiology 100-200-users 2019
A revised model for promoter competition based on multi-way chromatin interactions, bioRxiv, 2019-04-18
Abstract: Specific communication between gene promoters and enhancers is critical for accurate regulation of gene expression. However, it remains unclear how specific interactions between multiple regulatory elements and genes contained within a single chromatin domain are coordinated. Recent technological advances allow for the investigation of multi-way chromatin interactions at single alleles in individual nuclei. This can provide insights into how multiple regulatory elements cooperate or compete for transcriptional activation. We have used these techniques in a mouse model in which the α-globin domain is extended to include several additional genes. This allows us to determine how the interactions of the α-globin super-enhancer are distributed between multiple promoters in a single domain. Our data show that gene promoters do not form mutually exclusive interactions with the super-enhancer, but all interact simultaneously in a single complex. These findings show that promoters within the same domain do not structurally compete for interactions with enhancers, but form a regulatory hub structure, consistent with the recent model of transcriptional activation in phase-separated nuclear condensates.
biorxiv genomics 100-200-users 2019
deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index, bioRxiv, 2019-04-18
Abstract: Long-read RNA sequencing (RNA-seq) is a promising approach in transcriptomics studies; however, the alignment of the long reads is a fundamental but still non-trivial task due to sequencing errors and complicated gene structures. We propose de Bruijn graph-based Spliced Aligner for Long Transcriptome read (deSALT), a tailored two-pass long RNA-seq read alignment approach, which constructs graph-based alignment skeletons to sensitively infer exons and uses them to generate high-quality spliced reference sequences to produce refined alignments. deSALT addresses several difficult technical issues, such as small exons and serious sequencing errors, thereby breaking through the bottlenecks of long RNA-seq read alignment. Benchmarks demonstrate that this approach has a greater ability to produce accurate and homogeneous full-length alignments and thus has enormous potential in transcriptomics studies.
biorxiv bioinformatics 100-200-users 2019
Droplet-based combinatorial indexing for massive scale single-cell epigenomics, bioRxiv, 2019-04-18
Abstract: While recent technical advancements have facilitated the mapping of epigenomes at single-cell resolution, the throughput and quality of these methods have limited the widespread adoption of these technologies. Here, we describe a droplet microfluidics platform for single-cell assay for transposase accessible chromatin (scATAC-seq) for high-throughput single-cell profiling of chromatin accessibility. We use this approach for the unbiased discovery of cell types and regulatory elements within the mouse brain. Further, we extend the throughput of this approach by pairing combinatorial indexing with droplet microfluidics, enabling single-cell studies at a massive scale. With this approach, we measure chromatin accessibility across resting and stimulated human bone marrow-derived cells to reveal changes in the cis- and trans-regulatory landscape across cell types and upon stimulation conditions at single-cell resolution. Altogether, we describe a total of 502,207 single-cell profiles, demonstrating the scalability and flexibility of this droplet-based platform.
biorxiv genomics 200-500-users 2019
Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion, bioRxiv, 2019-04-18
Abstract: Understanding complex tissues requires single-cell deconstruction of gene regulation with precision and scale. Here we present a massively parallel droplet-based platform for mapping transposase-accessible chromatin in tens of thousands of single cells per sample (scATAC-seq). We obtain and analyze chromatin profiles of over 200,000 single cells in two primary human systems. In blood, scATAC-seq allows marker-free identification of cell type-specific cis- and trans-regulatory elements, mapping of disease-associated enhancer activity, and reconstruction of trajectories of differentiation from progenitors to diverse and rare immune cell types. In basal cell carcinoma, scATAC-seq reveals regulatory landscapes of malignant, stromal, and immune cell types in the tumor microenvironment. Moreover, scATAC-seq of serial tumor biopsies before and after PD-1 blockade allows identification of chromatin regulators and differentiation trajectories of therapy-responsive intratumoral T cell subsets, revealing a shared regulatory program driving CD8+ T cell exhaustion and CD4+ T follicular helper cell development. We anticipate that droplet-based single-cell chromatin accessibility will provide a broadly applicable means of identifying regulatory factors and elements that underlie cell type and function.
biorxiv genomics 200-500-users 2019
Chitin perception in plasmodesmata identifies subcellular, context-specific immune signalling in plants, bioRxiv, 2019-04-17
Abstract: The plasma membrane (PM) that lines plasmodesmata has a distinct protein and lipid composition, underpinning specific regulation of these connections between cells. The plasmodesmal PM can integrate extracellular signals differently from the cellular PM, but it is not known how this specificity is established or how a single stimulus can trigger independent signalling cascades in neighbouring membrane domains. Here we have used the fungal elicitor chitin to investigate signal integration and responses at the plasmodesmal PM. We found that the plasmodesmal PM employs a receptor complex composed of the LysM receptors LYM2 and LYK4, which respectively change their location and interactions in response to chitin. Downstream, signalling is transmitted via a specific phosphorylation signature of an NADPH oxidase and localised callose synthesis that causes plasmodesmata closure. This demonstrates that the plasmodesmal PM deploys both plasmodesmata-specific components and differential activation of PM-common components to independently integrate an immune signal.
biorxiv plant-biology 0-100-users 2019