Systematic comparative analysis of single cell RNA-sequencing methods, bioRxiv, 2019-05-10
Abstract: A multitude of single-cell RNA sequencing methods have been developed in recent years, with dramatic advances in scale and power, enabling major discoveries and large-scale cell mapping efforts. However, these methods have not been systematically and comprehensively benchmarked. Here, we directly compare seven methods for single-cell and/or single-nucleus profiling from three types of samples – cell lines, peripheral blood mononuclear cells and brain tissue – generating 36 libraries in six separate experiments in a single center. To analyze these datasets, we developed and applied scumi, a flexible computational pipeline that can be used for any scRNA-seq method. We evaluated the methods for both basic performance and for their ability to recover known biological information in the samples. Our study will help guide experiments with the methods in this study as well as serve as a benchmark for future studies and for computational algorithm development.
biorxiv genomics 100-200-users 2019
Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression, bioRxiv, 2019-05-08
Abstract: Recent developments in stem cell biology have enabled the study of cell fate decisions in early human development that are impossible to study in vivo. However, how development varies across individuals and, in particular, the influence of common genetic variants during this process have not been characterised. Here, we exploit human iPS cell lines from 125 donors, a pooled experimental design, and single-cell RNA-sequencing to study population variation of endoderm differentiation. We identify molecular markers that are predictive of differentiation efficiency, and utilise heterogeneity in the genetic background across individuals to map hundreds of expression quantitative trait loci that influence expression dynamically during differentiation and across cellular contexts.
biorxiv genomics 100-200-users 2019
Insights about variation in meiosis from 31,228 human sperm genomes, bioRxiv, 2019-05-02
Abstract: Meiosis, while critical for reproduction, is also highly variable and error-prone: crossover rates vary among humans and individual gametes, and chromosome nondisjunction leads to aneuploidy, a leading cause of miscarriage. To study variation in meiotic outcomes within and across individuals, we developed a way to sequence many individual sperm genomes at once. We used this method to sequence the genomes of 31,228 gametes from 20 sperm donors, identifying 813,122 crossovers, 787 aneuploid chromosomes, and unexpected genomic anomalies. Different sperm donors varied four-fold in the frequency of aneuploid sperm, and aneuploid chromosomes gained in meiosis I had 36% fewer crossovers than corresponding non-aneuploid chromosomes. Diverse recombination phenotypes were surprisingly coordinated: donors with high average crossover rates also made a larger fraction of their crossovers in centromere-proximal regions and placed their crossovers closer together. These same relationships were also evident in the variation among individual gametes from the same donor: sperm with more crossovers tended to have made crossovers closer together and in centromere-proximal regions. Variation in the physical compaction of chromosomes could help explain this coordination of meiotic variation across chromosomes, gametes, and individuals.
biorxiv genomics 100-200-users 2019
CTCF-dependent chromatin boundaries formed by asymmetric nucleosome arrays with decreased linker length, bioRxiv, 2019-04-26
Abstract: The CCCTC-binding factor (CTCF) organises the genome in 3D through DNA loops and in 1D by setting boundaries isolating different chromatin states, but these processes are not well understood. Here we focus on the relationship between CTCF binding and the decrease of the Nucleosome Repeat Length (NRL) for ∼20 adjacent nucleosomes, affecting up to 10% of the mouse genome. We found that the chromatin boundary near CTCF is created by the nucleosome-depleted region (NDR) asymmetrically located >40 nucleotides 5’-upstream from the centre of the CTCF motif. The strength of CTCF binding to DNA is correlated with the decrease of NRL near CTCF and anti-correlated with the level of asymmetry of the nucleosome array. Individual chromatin remodellers have different contributions, with Snf2h having the strongest effect on the NRL decrease near CTCF and Chd4 playing a major role in the symmetry breaking. Upon differentiation of embryonic stem cells to neural progenitor cells and embryonic fibroblasts, a subset of common CTCF sites preserved in all three cell types maintains a relatively small local NRL despite a genome-wide NRL increase. The sites that lost CTCF upon differentiation are characterised by nucleosome rearrangement 3’-downstream, but the boundary defined by the NDR 5’-upstream of the CTCF motif remains.
biorxiv genomics 0-100-users 2019
Animal, fungi, and plant genome sequences harbour different non-canonical splice sites, bioRxiv, 2019-04-23
Abstract: Most protein-encoding genes in eukaryotes contain introns which are interwoven with exons. After transcription, introns need to be removed in order to generate the final mRNA which can be translated into an amino acid sequence by the ribosome. Precise excision of introns by the spliceosome requires conserved dinucleotides which mark the splice sites. However, there are variations of the highly conserved combination of GT at the 5’ end and AG at the 3’ end of an intron in the genome. GC-AG and AT-AC are two major non-canonical splice site combinations that have been known for many years. During the last few years, various minor non-canonical splice site combinations were detected with all possible dinucleotide permutations. Here we expand systematic investigations of non-canonical splice site combinations in plant genomes to all eukaryotes by analysing fungal and animal genome sequences. Comparisons of splice site combinations between these three kingdoms revealed several differences such as a substantially increased CT-AC frequency in fungal genomes. In addition, high numbers of GA-AG splice site combinations were observed in two animal species. In-depth investigation of splice site usage based on RNA-Seq read mappings indicates a generally higher flexibility of the 3’ splice site compared to the 5’ splice site.
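The splice site categories named in this abstract can be illustrated with a minimal sketch. It assumes the intron is supplied as a DNA string in 5'-to-3' orientation; the function name `classify_intron` and the label strings are hypothetical and do not come from the preprint or its software.

```python
def classify_intron(intron: str) -> str:
    """Label an intron by its terminal dinucleotides (DNA, 5'->3').

    Hypothetical helper for illustration only; categories follow the
    abstract: GT-AG is canonical, GC-AG and AT-AC are the major
    non-canonical combinations, everything else is minor non-canonical.
    """
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry two splice-site dinucleotides")
    # First two bases mark the 5' splice site, last two the 3' splice site.
    combo = f"{seq[:2]}-{seq[-2:]}"
    if combo == "GT-AG":
        return "canonical GT-AG"
    if combo in ("GC-AG", "AT-AC"):
        return f"major non-canonical {combo}"
    return f"minor non-canonical {combo}"
```

For example, `classify_intron("CTAAAAAC")` would be labelled as the minor non-canonical CT-AC combination that the abstract reports as more frequent in fungal genomes.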
biorxiv genomics 0-100-users 2019
Stacks 2: Analytical Methods for Paired-end Sequencing Improve RADseq-based Population Genomics, bioRxiv, 2019-04-23
Abstract: For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively-parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here we describe the first software capable of using paired-end sequencing to derive short contigs from de novo RAD data natively. Stacks version 2 employs a de Bruijn graph assembler to build contigs from paired-end reads and overlap those contigs with the corresponding single-end loci. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes – generating RAD loci that are 400-800 bp in length. To prove its recall and precision, we test the software with simulated data and compare reference-aligned and de novo analyses of three empirical datasets. We show that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo datasets.
biorxiv genomics 100-200-users 2019