Chromosome-level assemblies of multiple Arabidopsis genomes reveal hotspots of rearrangements with altered evolutionary dynamics, bioRxiv, 2019-08-23
Abstract: We report chromosome-level, reference-quality assemblies of seven Arabidopsis thaliana accessions selected across the global range of this predominantly ruderal plant. Each genome revealed 13–17 Mb of rearranged and 5–6 Mb of novel sequence, introducing copy-number changes in ∼5,000 genes, including ∼1,900 genes that are not part of the current reference annotation. Analyzing the collinearity between the genomes revealed ∼350 regions (4.1% of the euchromatin) where accession-specific tandem duplications destroyed the syntenic gene order between the genomes. These hotspots of rearrangements were characterized by the loss of meiotic recombination in hybrids within these regions and by the enrichment of genes implicated in biotic stress response. Together, this suggests that hotspots of rearrangements are governed by evolutionary dynamics that differ from those of the rest of the genome: they are based on new mutations rather than on the recombination of existing variation, and thereby enable a quick response to the ever-evolving challenges of biotic stress.
biorxiv genetics 100-200-users 2019
Deep learning at base resolution reveals motif syntax of the cis-regulatory code, bioRxiv, 2019-08-22
Abstract: Genes are regulated through enhancer sequences, in which transcription factor binding motifs and their specific arrangements (syntax) form a cis-regulatory code. To understand the relationship between motif syntax and transcription factor binding, we train a deep learning model that uses DNA sequence to predict base-resolution binding profiles of four pluripotency transcription factors: Oct4, Sox2, Nanog, and Klf4. We interpret the model to accurately map hundreds of thousands of motifs in the genome, learn novel motif representations, and identify rules by which motifs and syntax influence transcription factor binding. We find that instances of strict motif spacing are largely due to retrotransposons, but that soft motif syntax influences motif interactions at protein and nucleosome ranges. Most strikingly, Nanog binding is driven by motifs with a strong preference for ∼10.5 bp spacings, corresponding to the helical periodicity of DNA. Interpreting deep learning models applied to high-resolution binding data is a powerful and versatile approach to uncover the motifs and syntax of cis-regulatory sequences.
biorxiv genomics 100-200-users 2019
ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data, bioRxiv, 2019-08-20
Abstract:
Background: Metagenomics and metatranscriptomics studies provide valuable insight into the composition and function of microbial populations from diverse environments; however, data processing pipelines that rely on mapping reads to gene catalogs or to genome databases of cultured strains yield results that underrepresent the genes and functional potential of uncultured microbes. Recent improvements in sequence assembly methods have eased the reliance on genome databases, thereby allowing the recovery of genomes from uncultured microbes. However, configuring these tools, linking them with advanced binning and annotation tools, and maintaining provenance of the processing continue to be challenging for researchers.
Results: Here we present ATLAS, a software package for customizable data processing from raw sequence reads to functional and taxonomic annotations, using state-of-the-art tools to assemble, annotate, quantify, and bin metagenome and metatranscriptome data. Genome-centric resolution and abundance estimates are provided for each sample in a dataset. ATLAS is written in Python and the workflow is implemented in Snakemake; it operates in a Linux environment and is compatible with Python 3.5+ and Anaconda 3+. The source code for ATLAS is freely available, distributed under a BSD-3 license.
Conclusion: ATLAS provides a user-friendly, modular, and customizable Snakemake workflow for metagenome and metatranscriptome data processing; it is easily installable with conda and maintained as open source on GitHub at https://github.com/metagenome-atlas/atlas.
biorxiv bioinformatics 100-200-users 2019
A Single-Cell Transcriptome Atlas for Zebrafish Development, bioRxiv, 2019-08-19
Abstract: The ability to define cell types and how they change during organogenesis is central to our understanding of animal development and human disease. Despite the crucial nature of this knowledge, we have yet to fully characterize all distinct cell types and the gene expression differences that generate cell types during development. To address this knowledge gap, we produced an Atlas using single-cell RNA-sequencing methods to investigate gene expression from the pharyngula to early larval stages in developing zebrafish. Our single-cell transcriptome Atlas encompasses transcriptional profiles from 44,102 cells across four days of development, using duplicate experiments that confirmed high reproducibility. We annotated 220 identified clusters and highlighted several strategies for interrogating changes in gene expression associated with the development of zebrafish embryos at single-cell resolution. Furthermore, we highlight the power of this analysis to assign new cell-type or developmental-stage-specific expression information to many genes, including those that are currently known only by sequence and/or that lack expression information altogether. The resulting Atlas is a resource for biologists to generate hypotheses for genetic (mutant) or functional analysis, to launch an effort to define the diversity of cell types during zebrafish organogenesis, and to examine the transcriptional profiles that produce each cell type over developmental time.
biorxiv developmental-biology 100-200-users 2019
Expression profiling of the mature C. elegans nervous system by single-cell RNA-Sequencing, bioRxiv, 2019-08-17
Abstract: A single neuron and its synapses define the fundamental structural motif of the brain, but the underlying gene expression programs that specify individual neuron types are poorly understood. To address this question in a model organism, we have produced a gene expression profile of >90% of the individual neuron classes in the C. elegans nervous system, an ensemble of neurons for which both the anatomy and connectivity are uniquely defined at single-cell resolution. We generated single-cell transcriptomes for 52,412 neurons that resolve as clusters corresponding to 109 of the canonical 118 neuron classes in the mature hermaphrodite nervous system. Detailed analysis revealed molecular signatures that further subdivide identified classes into specific neuronal subtypes. Notably, neuropeptide-related genes are often differentially expressed between subtypes of a given neuron class, which points to distinct functional characteristics. All of these data are publicly available at our website (http://www.cengen.org) and can be interrogated at the web application SCeNGEA (https://cengen.shinyapps.io/SCeNGEA). We expect that this gene expression catalog will advance the goal of delineating the underlying mechanisms that define the developmental lineage, detailed anatomy, synaptic connectivity, and function of each type of C. elegans neuron.
biorxiv neuroscience 100-200-users 2019
MINFLUX nanoscopy delivers multicolor nanometer 3D-resolution in (living) cells, bioRxiv, 2019-08-14
The ultimate goal of biological superresolution fluorescence microscopy is to provide three-dimensional resolution at the size scale of a fluorescent marker. Here, we show that, by localizing individual switchable fluorophores with a probing doughnut-shaped excitation beam, MINFLUX nanoscopy provides 1–3 nanometer resolution in fixed and living cells. This progress has been facilitated by approaching each fluorophore iteratively with the minimum of the probing doughnut, making the resolution essentially uniform and isotropic over scalable fields of view. MINFLUX imaging of nuclear pore complexes of a mammalian cell shows that this true nanometer-scale resolution is obtained in three dimensions and in two color channels. Relying on fewer detected photons than popular camera-based localization, MINFLUX nanoscopy is poised to open a new chapter in the imaging of protein complexes and distributions in fixed and living cells.
biorxiv biophysics 100-200-users 2019