A comparison of gene expression and DNA methylation patterns across tissues and species, bioRxiv, 2018-12-05
Abstract: Previously published comparative functional genomic data sets from primates, generated from frozen tissue samples and including many data sets from our own group, were collected and analyzed using non-optimal study designs and analysis approaches. In addition, when samples from multiple tissues were studied in a comparative framework, individual and tissue were confounded. We designed a multi-tissue comparative study of gene expression and DNA methylation in primates that minimizes confounding effects by using a balanced design with respect to species, tissues, and individuals. We also developed a comparative analysis pipeline that minimizes biases due to sequence divergence. We thus present the most comprehensive catalog of similarities and differences in gene expression and methylation levels between livers, kidneys, hearts, and lungs in humans, chimpanzees, and rhesus macaques. We estimate that, overall, only 7% to 11% (depending on the tissue) of inter-species differences in gene expression levels can be accounted for by corresponding differences in promoter DNA methylation. However, expression divergence in conserved tissue-specific genes can more often be explained by corresponding inter-species methylation changes. We end the paper by providing recommendations for effective study design and best practices for metadata recording in comparative functional genomic studies of primates.
biorxiv genomics 0-100-users 2018
Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis, bioRxiv, 2018-12-01
Abstract: Sequence analyses of RNA virus genomes remain challenging due to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses that are generally referred to as ‘quasispecies’. Standard (short-read) sequencing technologies allow consensus sequences for these ‘quasispecies’ to be determined readily. However, it is far more difficult to reconstruct large numbers of full-length haplotypes of (i) RNA virus genomes and (ii) subgenome-length (sg) RNAs composed of noncontiguous genome regions that may be present in these virus populations. Here, we used a full-length, direct RNA sequencing (DRS) approach without any amplification step to characterize viral RNAs produced in cells infected with a human coronavirus, whose genome is among the largest RNA virus genomes known to date. Using DRS, we mapped contiguous reads as long as ~26 kb to the viral reference genome. By combining Illumina and nanopore sequencing, we reconstructed a highly accurate consensus sequence of the human coronavirus (HCoV) 229E genome (27.3 kb). Furthermore, using long reads that did not require an assembly step, we identified, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which does not require reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs. Our work paves the way for haplotype-based analyses of viral quasispecies by demonstrating the feasibility of intra-sample haplotype separation.
We also show how supplementary short-read (Illumina) sequencing can be used to reduce the error rate of nanopore sequencing. Even though a number of technical challenges must still be addressed before the potential of nanopore technology can be fully exploited, our work illustrates that direct RNA sequencing may significantly advance genomic studies of complex virus populations, including predictions of long-range interactions in individual full-length viral RNA haplotypes.
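Both versions of this abstract state that Illumina reads were combined with nanopore reads to obtain an accurate consensus sequence, but neither describes the algorithm. The sketch below illustrates only the general idea and is not the authors' pipeline. It assumes the reads are already aligned to a shared coordinate system and computes a column-wise weighted majority vote. The function name `weighted_consensus`, the per-read weights and the `min_support` cutoff are all hypothetical.

```python
from collections import defaultdict
from typing import List, Optional


def weighted_consensus(reads: List[str],
                       weights: Optional[List[float]] = None,
                       min_support: float = 1.0) -> str:
    """Column-wise weighted majority vote over pre-aligned reads (toy sketch).

    reads       -- equal-length strings in a shared coordinate system;
                   '-' (no coverage) and 'N' (ambiguous) are not counted.
    weights     -- optional per-read weights; a hypothetical choice is to give
                   accurate short reads more weight than noisy long reads.
    min_support -- columns with less total informative weight become 'N'.
    """
    if not reads:
        return ""
    if weights is None:
        weights = [1.0] * len(reads)
    if len(weights) != len(reads):
        raise ValueError("need exactly one weight per read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must be aligned to the same length")

    consensus = []
    for i in range(length):
        support = defaultdict(float)  # base -> summed weight at column i
        for read, w in zip(reads, weights):
            if read[i] not in "-N":
                support[read[i]] += w
        if sum(support.values()) < min_support:
            consensus.append("N")
        else:
            # Highest summed weight wins; ties go to the alphabetically first base.
            consensus.append(max(sorted(support), key=support.get))
    return "".join(consensus)


# Three noisy "long reads" plus two short reads covering the two halves.
nanopore = ["ACGTTAGA", "ACCTTAGA", "ACGTAAGA"]
illumina = ["ACGT----", "----TAGA"]
print(weighted_consensus(nanopore + illumina, weights=[1, 1, 1, 2, 2]))  # ACGTTAGA
```

A real polishing workflow would first align reads to a reference and would have to handle insertions and deletions. This sketch skips both and only shows how extra evidence from accurate reads can outvote isolated long-read errors.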
biorxiv genomics 100-200-users 2018
Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis, bioRxiv, 2018-12-01
Sequence analyses of RNA virus genomes remain challenging due to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses, so-called 'quasispecies'. Standard (short-read) sequencing technologies are ill-suited to reconstruct large numbers of full-length haplotypes of (i) RNA virus genomes and (ii) subgenome-length (sg) RNAs composed of noncontiguous genome regions. Here, we used a full-length, direct RNA sequencing (DRS) approach based on nanopores to characterize viral RNAs produced in cells infected with a human coronavirus. Using DRS, we mapped contiguous reads as long as ~26 kb to the viral reference genome. By combining Illumina and nanopore sequencing, we reconstructed a highly accurate consensus sequence of the human coronavirus (HCoV) 229E genome (27.3 kb). Furthermore, using long reads that did not require an assembly step, we identified, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which circumvents reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs. Our work paves the way for haplotype-based analyses of viral quasispecies by demonstrating the feasibility of intra-sample haplotype separation. Even though several technical challenges must still be addressed before the potential of nanopore technology can be fully exploited, our work illustrates that direct RNA sequencing may significantly advance genomic studies of complex virus populations, including predictions of long-range interactions in individual full-length viral RNA haplotypes.
biorxiv genomics 100-200-users 2018
Sex differences in gene expression in the human fetal brain, bioRxiv, 2018-11-30
Abstract: Widespread structural, chemical and molecular differences have been reported between the male and female human brain. Although several neurodevelopmental disorders are more commonly diagnosed in males, little is known about sex differences in early human brain development. Here, we used RNA sequencing data from a large collection of human brain samples from the second trimester of gestation (N = 120) to assess sex biases in gene expression within the human fetal brain. In addition to 43 genes (102 Ensembl transcripts) transcribed from the Y chromosome in males, we detected sex differences in the expression of 2558 autosomal genes (2723 Ensembl transcripts) and 155 genes on the X chromosome (207 Ensembl transcripts) at a false discovery rate (FDR) < 0.1. Genes exhibiting sex-biased expression in the human fetal brain are enriched for high-confidence risk genes for autism and other developmental disorders. Male-biased genes are enriched for expression in neural progenitor cells, whereas female-biased genes are enriched for expression in Cajal-Retzius cells and glia. All gene- and transcript-level data are provided as an online resource (available at http://fgen.psycm.cf.ac.uk/FBSeq1) through which researchers can search, download and visualize data on sex biases in gene expression during early human brain development.
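The abstract reports sex-biased genes at FDR < 0.1 but does not name the multiple-testing correction it used. The sketch below assumes the common Benjamini-Hochberg step-up procedure, which the abstract does not confirm. It shows how per-gene p-values become adjusted values that are then thresholded at 0.1. The p-values are made up, and the function names `benjamini_hochberg` and `significant_at_fdr` are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases (step-up monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(pvalues: Sequence[float], fdr: float = 0.1) -> List[int]:
    """Indices of tests whose BH-adjusted p-value is below the FDR cutoff."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < fdr]


# Hypothetical per-gene p-values for a sex-bias test (not data from the study).
pvals = [0.01, 0.04, 0.03, 0.5]
print(significant_at_fdr(pvals, fdr=0.1))  # [0, 1, 2]
```

Controlling the FDR at 0.1 means that, in expectation, about 10% of the genes called sex-biased may be false positives. This is a looser threshold than the 0.05 often used in such analyses.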
biorxiv genomics 200-500-users 2018
Base editing generates substantial off-target single nucleotide variants, bioRxiv, 2018-11-27
Abstract: Genome editing tools, including CRISPR/Cas9 and base editors, hold great promise for correcting pathogenic mutations. Unbiased, genome-wide assessment of off-target effects in mammalian cells is required before clinical application. However, determining the extent of off-target effects has been difficult because of the single nucleotide polymorphisms (SNPs) present in individuals. Here, we developed a method named GOTI (Genome-wide Off-target analysis by Two-cell embryo Injection) to detect off-target mutations without interference from SNPs. We applied GOTI to both the CRISPR/Cas9 and base editing (BE3) systems by editing one blastomere of the two-cell mouse embryo and then comparing whole-genome sequences of the progeny-cell populations at embryonic day 14.5 (E14.5). Sequence analysis of edited and non-edited cell progenies showed that undesired off-target single nucleotide variants (SNVs) are rare in CRISPR-edited mouse embryos (10.5 on average), with a frequency close to the spontaneous mutation rate. By contrast, BE3 editing induced over 20-fold more SNVs (283 on average), raising concerns about using base-editing approaches for biomedical applications.
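As the abstract describes it, GOTI relies on the edited and non-edited cells coming from the same embryo. SNPs shared by both lineages can therefore be discounted, and variants found only in the edited lineage are candidate off-target mutations. The sketch below shows that comparison as a plain set difference. It assumes variant calls are already available as hypothetical (chromosome, position, ref, alt) tuples; variant calling and filtering are not shown. The function names are hypothetical and the example calls are invented. The last lines check the abstract's arithmetic: 283 / 10.5 is about 27, consistent with "over 20-fold".

```python
from typing import Iterable, List, Tuple

# A variant call as (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def candidate_off_target_snvs(edited: Iterable[Variant],
                              non_edited: Iterable[Variant],
                              on_target: Iterable[Variant] = ()) -> List[Variant]:
    """Variants seen only in the edited lineage (toy version of the GOTI idea).

    Both lineages descend from the same zygote, so inherited SNPs appear in
    both call sets and cancel out; intended on-target edits are removed too.
    """
    background = set(non_edited) | set(on_target)
    return sorted(set(edited) - background)


def fold_increase(mean_a: float, mean_b: float) -> float:
    """Ratio of two average SNV counts."""
    if mean_b == 0:
        raise ValueError("reference mean must be non-zero")
    return mean_a / mean_b


edited = [("chr1", 100, "A", "G"), ("chr2", 500, "C", "T"), ("chr3", 42, "C", "T")]
non_edited = [("chr1", 100, "A", "G")]  # inherited SNP, present in both lineages
print(candidate_off_target_snvs(edited, non_edited, on_target=[("chr3", 42, "C", "T")]))
# [('chr2', 500, 'C', 'T')]

# Averages reported in the abstract: 283 SNVs (BE3) vs 10.5 SNVs (CRISPR/Cas9).
print(round(fold_increase(283, 10.5), 1))  # 27.0
```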
biorxiv genomics 100-200-users 2018
Nuclei multiplexing with barcoded antibodies for single-nucleus genomics, bioRxiv, 2018-11-23
Abstract: Single-nucleus RNA-seq (snRNA-seq) enables the interrogation of cellular states in complex tissues that are challenging to dissociate, including frozen clinical samples. In principle, this opens the way to large studies, such as those required for human genetics, clinical trials, or precise cell atlases of large organs. However, such applications are currently limited by batch effects, sequential processing, and costs. To address these challenges, we present an approach for multiplexing snRNA-seq that uses sample-barcoded antibodies against the nuclear pore complex to uniquely label nuclei from distinct samples. Comparing human brain cortex samples profiled in multiplex with or without hashing antibodies, we demonstrate that nucleus hashing does not significantly alter the recovered transcriptome profiles. We further developed demuxEM, a novel computational tool that robustly detects inter-sample nucleus multiplets and assigns singlets to their samples of origin by antibody barcodes. We validated its accuracy using sex-specific gene expression, species mixing and natural genetic variation. Nucleus hashing significantly reduces cost per nucleus, recovering up to about 5 times as many single nuclei per microfluidic channel. Our approach provides a robust technique for diverse studies, including tissue atlases of isogenic model organisms or of a single larger human organ, multiple biopsies or longitudinal samples from one donor, and large-scale perturbation screens.
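The abstract does not describe how demuxEM models antibody-barcode counts, so the following is not demuxEM's algorithm. It is a deliberately simplified threshold rule that only illustrates the demultiplexing task. A nucleus is assigned to the sample whose barcode dominates its hashtag counts. It is flagged as an inter-sample multiplet when no barcode dominates, and left unassigned when the signal is too weak. The function name `assign_nuclei`, the thresholds `min_fraction` and `min_total`, and the counts are hypothetical.

```python
from typing import Dict


def assign_nuclei(hashtag_counts: Dict[str, Dict[str, int]],
                  min_fraction: float = 0.8,
                  min_total: int = 10) -> Dict[str, str]:
    """Toy threshold-based demultiplexing of hashed nuclei.

    hashtag_counts maps nucleus ID -> {sample name: antibody-barcode count}.
    Returns nucleus ID -> sample name, 'multiplet' or 'unassigned'.
    """
    calls = {}
    for nucleus, counts in hashtag_counts.items():
        total = sum(counts.values())
        if total < min_total:
            calls[nucleus] = "unassigned"  # too little hashtag signal to decide
            continue
        top_sample, top_count = max(counts.items(), key=lambda kv: kv[1])
        if top_count / total >= min_fraction:
            calls[nucleus] = top_sample  # one barcode dominates: singlet
        else:
            calls[nucleus] = "multiplet"  # no single barcode dominates
    return calls


counts = {
    "n1": {"S1": 95, "S2": 3, "S3": 2},
    "n2": {"S1": 48, "S2": 50, "S3": 2},
    "n3": {"S1": 2, "S2": 1, "S3": 0},
    "n4": {"S1": 1, "S2": 4, "S3": 85},
}
print(assign_nuclei(counts))
# {'n1': 'S1', 'n2': 'multiplet', 'n3': 'unassigned', 'n4': 'S3'}
```

A fixed fraction cutoff like this does not model background antibody signal, so it is only a teaching aid for the singlet/multiplet distinction described above.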
biorxiv genomics 0-100-users 2018