Genomic rearrangements generate hypervariable mini-chromosomes in host-specific lineages of the blast fungus, bioRxiv, 2020-01-12
Abstract
Supernumerary mini-chromosomes, a unique type of genomic structural variation, have been implicated in the emergence of virulence traits in plant pathogenic fungi. However, the mechanisms that facilitate the emergence and maintenance of mini-chromosomes across fungi remain poorly understood. In the blast fungus Magnaporthe oryzae, mini-chromosomes were first described in the early 1990s but until very recently were overlooked in genomic studies. Here we investigated structural variation in four isolates of the blast fungus M. oryzae from different grass hosts and analyzed the sequences of mini-chromosomes in the rice, foxtail millet and goosegrass isolates. The mini-chromosomes of these isolates turned out to be highly diverse, with distinct sequence composition. They are enriched in repetitive elements and have lower gene density than core-chromosomes. We identified several virulence-related genes in the mini-chromosome of the rice isolate, including the polyketide synthase Ace1 and the effector gene AVR-Pik. Macrosynteny analyses around these loci revealed structural rearrangements, including inter-chromosomal translocations between core- and mini-chromosomes. Our findings provide evidence that mini-chromosomes independently emerge from structural rearrangements of core-chromosomes and might contribute to adaptive evolution of the blast fungus.
Author summary
The genomes of plant pathogens often exhibit an architecture that facilitates high rates of dynamic rearrangements and genetic diversification in virulence-associated regions. These regions, which tend to be gene-sparse and repeat-rich, are thought to serve as a cradle for adaptive evolution. Supernumerary chromosomes, i.e. chromosomes that are present in some but not all individuals of a species, are a special type of structural variation that has been observed in plants, animals, and fungi.
Here we identified and studied supernumerary mini-chromosomes in the blast fungus Magnaporthe oryzae, a pathogen that causes some of the most destructive plant diseases. We found that rice, foxtail millet and goosegrass isolates of this pathogen contain mini-chromosomes with distinct sequence composition. All mini-chromosomes are rich in repetitive genetic elements and have lower gene densities than core-chromosomes. Further, we identified virulence-related genes on the mini-chromosome of the rice isolate. We observed large-scale genomic rearrangements around these loci, indicative of a role for mini-chromosomes in facilitating genome dynamics. Taken together, our results indicate that mini-chromosomes facilitate genome rearrangements and possibly adaptive evolution of the blast fungus.
biorxiv genomics 100-200-users 2020
Highly Multiplexed Single-Cell Full-Length cDNA Sequencing of human immune cells with 10X Genomics and R2C2, bioRxiv, 2020-01-12
Abstract
Single cell transcriptome analysis elucidates facets of cell biology that were previously out of reach. However, the high-throughput analysis of thousands of single cell transcriptomes has been limited by sample preparation and sequencing technology. High-throughput single cell analysis today is facilitated by protocols like the 10X Genomics platform or Drop-Seq, which generate cDNA pools in which the origin of a transcript is encoded at its 5’ or 3’ end. These cDNA pools are currently analyzed by short-read Illumina sequencing, which can identify the cellular origin of a transcript and the gene it was transcribed from. However, these methods fail to retrieve isoform information. In principle, cDNA pools prepared using these approaches can be analyzed with Pacific Biosciences and Oxford Nanopore long-read sequencers to retrieve isoform information, but all current implementations rely heavily on Illumina short reads in addition to long reads. Here, we used R2C2 to sequence and demultiplex 9 million full-length cDNA molecules generated by the 10X Chromium platform from ∼3000 peripheral blood mononuclear cells (PBMCs). We used these reads, independently of Illumina data, to cluster cells into B cells, T cells, and monocytes and to generate isoform-level transcriptomes for these cell types. We also generated isoform-level transcriptomes for all single cells and used this information to identify a wide range of isoform diversity between genes. Finally, we designed a computational workflow to extract the paired adaptive immune receptor sequences (T cell receptor and B cell receptor, TCR and BCR) unique to each T and B cell. This work represents a new, simple, and powerful approach that, using a single sequencing method, can extract an unprecedented amount of information from thousands of single cells.
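Demultiplexing, as described above, means assigning each read to a cell using the cell barcode encoded at one end of the cDNA. The sketch below shows one common strategy: matching an observed barcode to a whitelist of known barcodes within a small Hamming distance and rejecting ambiguous matches. It is an illustrative assumption, not the R2C2 workflow itself; the function names, the toy 8-nt barcodes and the `max_dist=1` threshold are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, whitelist, max_dist=1):
    """Return the unique closest whitelist barcode within max_dist, else None.

    Hypothetical helper: reads whose barcode is too distant from every
    whitelist entry, or equally close to several, are left unassigned.
    """
    scored = [(hamming(observed, bc), bc) for bc in whitelist
              if len(bc) == len(observed)]
    if not scored:
        return None
    best = min(d for d, _ in scored)
    matches = [bc for d, bc in scored if d == best]
    if best > max_dist or len(matches) != 1:
        return None  # no match, or ambiguous between several barcodes
    return matches[0]


# Toy whitelist of hypothetical 8-nt barcodes.
whitelist = ["AAAACCCC", "GGGGTTTT", "AAAACCCG"]
exact = assign_barcode("AAAACCCC", whitelist)      # exact hit
corrected = assign_barcode("GGGGTTTA", whitelist)  # one mismatch, corrected
ambiguous = assign_barcode("AAAACCCT", whitelist)  # one mismatch from two entries
unmatched = assign_barcode("TTTTTTTT", whitelist)  # too far from all entries
```

Rejecting ties trades a small loss of reads for fewer misassigned molecules, which matters when cell clusters and per-cell isoforms are derived from long reads alone.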
biorxiv genomics 100-200-users 2020
Single-cell epigenomic identification of inherited risk loci in Alzheimer’s and Parkinson’s disease, bioRxiv, 2020-01-07
Abstract
Genome-wide association studies (GWAS) have identified thousands of variants associated with disease phenotypes. However, the majority of these variants do not alter coding sequences, making it difficult to assign their function. To this end, we present a multi-omic epigenetic atlas of the adult human brain built by profiling the chromatin accessibility landscapes and three-dimensional chromatin interactions of seven brain regions across a cohort of 39 cognitively healthy individuals. Single-cell chromatin accessibility profiling of 70,631 cells from six of these brain regions identifies 24 distinct cell clusters and 359,022 cell type-specific regulatory elements, capturing the regulatory diversity of the adult brain. We develop a machine learning classifier to integrate this multi-omic framework and predict dozens of functional single nucleotide polymorphisms (SNPs), nominating gene and cellular targets for previously orphaned GWAS loci. These predictions both inform well-studied disease-relevant genes, such as BIN1 in microglia for Alzheimer’s disease (AD), and reveal novel gene-disease associations, such as STAB1 in microglia and MAL in oligodendrocytes for Parkinson’s disease (PD). Moreover, we dissect the complex inverted haplotype of the MAPT (encoding tau) PD risk locus, identifying ectopic enhancer-gene contacts in neurons that increase MAPT expression and may mediate this disease association. This work greatly expands our understanding of inherited variation in AD and PD and provides a roadmap for the epigenomic dissection of noncoding regulatory variation in disease.
biorxiv genomics 100-200-users 2020
Probabilistic gene expression signatures identify cell-types from single cell RNA-seq data, bioRxiv, 2020-01-06
Abstract
Single-cell RNA sequencing (scRNA-seq) quantifies the gene expression of individual cells in a sample, which allows distinct cell-type populations to be identified and characterized. An important step in many scRNA-seq analysis pipelines is the classification of cells into known cell-types. While this can be achieved using experimental techniques, such as fluorescence-activated cell sorting, these approaches are impractical for large numbers of cells. This motivates the development of data-driven cell-type identification methods. We find that current approaches are limited by their reliance on known marker genes and their sensitivity to the quality of reference samples. Here we present a computationally light statistical approach, based on Naive Bayes, that leverages public datasets to combine information across thousands of genes and probabilistically assign cell-type identity. Using datasets ranging across species and tissue types, we demonstrate that our approach is robust to low-quality reference data and produces more accurate cell-type identification than current methods.
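To make the Naive Bayes idea above concrete, here is a minimal sketch of a multinomial Naive Bayes classifier that pools reference counts per cell type into gene probability profiles and then combines evidence across all genes into a posterior over cell types. It is an illustration of the general technique under stated assumptions (pseudobulk profiles, a pseudocount of 1, optional log prior), not the authors' published model; the function names and toy data are hypothetical.

```python
import numpy as np


def fit_profiles(ref_counts, ref_labels, pseudocount=1.0):
    """Estimate a log gene-probability profile per cell type.

    Assumption of this sketch: reference cells of each type are summed
    (pseudobulk) and smoothed with a pseudocount before normalising.
    """
    types = np.unique(ref_labels)
    profiles = []
    for t in types:
        summed = ref_counts[ref_labels == t].sum(axis=0) + pseudocount
        profiles.append(summed / summed.sum())
    return types, np.log(np.vstack(profiles))


def posterior(query_counts, log_profiles, log_prior=None):
    """Multinomial Naive Bayes posterior over cell types for each query cell."""
    # log P(counts | type), up to a constant: sum over genes of count * log p
    loglik = query_counts @ log_profiles.T  # shape (n_query, n_types)
    if log_prior is not None:
        loglik = loglik + log_prior
    # Subtract the row maximum before exponentiating for numerical stability.
    loglik = loglik - loglik.max(axis=1, keepdims=True)
    probs = np.exp(loglik)
    return probs / probs.sum(axis=1, keepdims=True)


# Hypothetical toy reference: 4 cells x 3 genes, two labelled types.
ref = np.array([[10, 0, 1], [8, 1, 0], [0, 9, 2], [1, 12, 0]])
labels = np.array(["B", "B", "T", "T"])
types, log_profiles = fit_profiles(ref, labels)

query = np.array([[5, 0, 0], [0, 6, 1]])
post = posterior(query, log_profiles)
predicted = types[post.argmax(axis=1)]
```

Because every gene contributes a term to the log-likelihood, no marker-gene list is required, and the pseudocount keeps genes unseen in a sparse reference from forcing a probability of zero.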
biorxiv genomics 0-100-users 2020
Twelve Platinum-Standard Reference Genomes Sequences (PSRefSeq) that complete the full range of genetic diversity of Asian rice, bioRxiv, 2020-01-01
Abstract
As the human population grows from 7.8 billion to 10 billion over the next 30 years, breeders must do everything possible to create crops that are highly productive and nutritious while also having a smaller environmental footprint. Rice will play a critical role in meeting this demand, and thus knowledge of the full repertoire of genetic diversity that exists in germplasm banks across the globe is required. To this end, we describe the generation and validation of 12 near-gap-free reference genome sequences (RefSeqs) from representatives of 12 of 15 subpopulations of cultivated rice, together with preliminary analyses of their transposable element and long-range structural variation content. When combined with 4 existing RefSeqs that represent the 3 remaining rice subpopulations and the largest admixed population, this collection of 16 Platinum Standard RefSeqs (PSRefSeq) can be used as a pan-genome template to map resequencing data and detect virtually all standing natural variation in the pan-cultivated rice genome.
biorxiv genomics 0-100-users 2020
Ultra-high throughput single-cell RNA sequencing by combinatorial fluidic indexing, bioRxiv, 2019-12-19
Abstract
Cell atlas projects and single-cell CRISPR screens hit the limits of current technology, as they require cost-effective profiling of millions of individual cells. To satisfy these enormous throughput requirements, we developed “single-cell combinatorial fluidic indexing” (scifi) and applied it to single-cell RNA sequencing. The resulting scifi-RNA-seq assay combines one-step combinatorial pre-indexing of single-cell transcriptomes with subsequent single-cell RNA-seq using widely available droplet microfluidics. Pre-indexing allows us to load multiple cells per droplet, which increases the throughput of droplet-based single-cell RNA-seq up to 15-fold, and it provides a straightforward way of multiplexing hundreds of samples in a single scifi-RNA-seq experiment. Compared to multi-round combinatorial indexing, scifi-RNA-seq provides an easier, faster, and more efficient workflow, thereby enabling massive-scale scRNA-seq experiments for a broad range of applications, from population genomics to drug screens with scRNA-seq readout. We benchmarked scifi-RNA-seq on various human and mouse cell lines, and we demonstrated its feasibility for human primary material by profiling TCR activation in T cells.
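Loading several cells per droplet works because, after pre-indexing, a cell is identified by the combination of its pre-index and its droplet barcode rather than by the droplet barcode alone. The sketch below illustrates this with a composite grouping key, plus the probability that all cells sharing one droplet carry distinct pre-indices, assuming each cell's pre-index is drawn uniformly and independently. The read tuples, barcode strings and the 384-index figure are hypothetical illustrations, not values from the scifi-RNA-seq protocol.

```python
from collections import defaultdict
from math import prod


def group_reads(reads):
    """Group reads by (pre-index, droplet barcode); each key is one putative cell.

    reads: iterable of (pre_index, droplet_barcode, umi) tuples (hypothetical format).
    """
    cells = defaultdict(list)
    for pre_index, droplet_bc, umi in reads:
        cells[(pre_index, droplet_bc)].append(umi)
    return dict(cells)


def prob_no_collision(cells_per_droplet, n_pre_indices):
    """Probability that k cells in one droplet all have distinct pre-indices,
    assuming uniform, independent pre-index assignment (birthday-problem form)."""
    k, n = cells_per_droplet, n_pre_indices
    return prod(1 - i / n for i in range(k))


# Two cells (pre-indices A01 and A02) share droplet AAAC; a third is alone in GGTT.
reads = [
    ("A01", "AAAC", "u1"),
    ("A02", "AAAC", "u2"),
    ("A01", "AAAC", "u3"),
    ("A01", "GGTT", "u4"),
]
cells = group_reads(reads)
p_two = prob_no_collision(2, 384)  # chance two co-loaded cells stay distinguishable
```

Under this model, more distinct pre-indices make it less likely that two co-encapsulated cells collapse into one barcode combination, which is what allows droplets to be overloaded.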
biorxiv genomics 200-500-users 2019