Engineering Brain Parasites for Intracellular Delivery of Therapeutic Proteins, bioRxiv, 2018-12-03
Protein therapy has the potential to alleviate many neurological diseases; however, delivery mechanisms for the central nervous system (CNS) are limited, and intracellular delivery poses additional hurdles. To address these challenges, we harnessed the protist parasite Toxoplasma gondii, which can migrate into the CNS and secrete proteins into cells. Using a fusion protein approach, we engineered T. gondii to secrete therapeutic proteins for human neurological disorders. We tested two secretion systems, generated fusion proteins that localized to the secretory organelles of T. gondii and assessed their intracellular targeting in various mammalian cells including neurons. We show that T. gondii expressing GRA16 fused to the Rett syndrome protein MeCP2 deliver a fusion protein that mimics the endogenous MeCP2, binding heterochromatic DNA in neurons. This demonstrates the potential of T. gondii as a therapeutic protein vector, which could provide either transient or chronic, in situ synthesis and delivery of intracellular proteins to the CNS.
biorxiv synthetic-biology 500+-users 2018
Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis, bioRxiv, 2018-12-01
Sequence analyses of RNA virus genomes remain challenging due to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses that are generally referred to as ‘quasispecies’. Although standard (short-read) sequencing technologies readily yield consensus sequences for these ‘quasispecies’, it is far more difficult to reconstruct large numbers of full-length haplotypes of (i) RNA virus genomes and (ii) subgenome-length (sg) RNAs comprised of noncontiguous genome regions that may be present in these virus populations. Here, we used a full-length, direct RNA sequencing (DRS) approach without any amplification step to characterize viral RNAs produced in cells infected with a human coronavirus representing one of the largest RNA virus genomes known to date. Using DRS, we were able to map the longest (~26 kb) contiguous read to the viral reference genome. By combining Illumina and nanopore sequencing, a highly accurate consensus sequence of the human coronavirus (HCoV) 229E genome (27.3 kb) was reconstructed. Furthermore, using long reads that did not require an assembly step, we were able to identify, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which does not require reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs. Our work paves the way for haplotype-based analyses of viral quasispecies by demonstrating the feasibility of intra-sample haplotype separation.
We also show how supplementary short-read sequencing (Illumina) can be used to reduce the error rate of nanopore sequencing. Even though a number of technical challenges remain to be addressed to fully exploit the potential of the nanopore technology, our work illustrates that direct RNA sequencing may significantly advance genomic studies of complex virus populations, including predictions on long-range interactions in individual full-length viral RNA haplotypes.
biorxiv genomics 100-200-users 2018
Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis, bioRxiv, 2018-12-01
Sequence analyses of RNA virus genomes remain challenging due to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses, so-called 'quasispecies'. Standard (short-read) sequencing technologies are ill-suited to reconstruct large numbers of full-length haplotypes of (i) RNA virus genomes and (ii) subgenome-length (sg) RNAs comprised of noncontiguous genome regions. Here, we used a full-length, direct RNA sequencing (DRS) approach based on nanopores to characterize viral RNAs produced in cells infected with a human coronavirus. Using DRS, we were able to map the longest (~26 kb) contiguous read to the viral reference genome. By combining Illumina and nanopore sequencing, we reconstructed a highly accurate consensus sequence of the human coronavirus (HCoV) 229E genome (27.3 kb). Furthermore, using long reads that did not require an assembly step, we were able to identify, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which circumvents reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs. Our work paves the way for haplotype-based analyses of viral quasispecies by demonstrating the feasibility of intra-sample haplotype separation. Even though several technical challenges remain to be addressed to exploit the potential of the nanopore technology fully, our work illustrates that direct RNA sequencing may significantly advance genomic studies of complex virus populations, including predictions on long-range interactions in individual full-length viral RNA haplotypes.
biorxiv genomics 100-200-users 2018
Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism, bioRxiv, 2018-12-01
We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n=35,584 total samples, 11,986 with ASD). Using an enhanced Bayesian framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate ≤ 0.1. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained for severe neurodevelopmental delay, while 53 show higher frequencies in individuals ascertained for ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most of the risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In human cortex single-cell gene expression data, expression of risk genes is enriched in both excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.
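The false discovery rate threshold mentioned above can be illustrated with a minimal sketch. It assumes that each gene comes with a posterior probability of being a non-risk (null) gene and applies a common Bayesian FDR construction: rank genes by that probability and take the running mean as the q-value. The abstract does not describe the study's actual procedure, so this is only an illustration; the function names, gene labels, and probabilities are hypothetical.

```python
def bayesian_fdr_qvalues(posterior_null):
    """Return one q-value per gene.

    posterior_null[i] is the (assumed) posterior probability that gene i
    is NOT a risk gene. The q-value of a gene is the mean posterior null
    probability over all genes ranked at least as confidently as it.
    """
    # Rank genes from most to least confident (smallest null probability first).
    order = sorted(range(len(posterior_null)), key=lambda i: posterior_null[i])
    q = [0.0] * len(posterior_null)
    running_sum = 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += posterior_null[i]
        q[i] = running_sum / rank
    return q


def select_genes(gene_names, posterior_null, threshold=0.1):
    """Keep genes whose q-value is at or below the FDR threshold."""
    q = bayesian_fdr_qvalues(posterior_null)
    return [g for g, qv in zip(gene_names, q) if qv <= threshold]


# Hypothetical toy input: five placeholder genes with made-up probabilities.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
p_null = [0.01, 0.5, 0.05, 0.9, 0.12]
selected = select_genes(genes, p_null, threshold=0.1)
```

Because the q-value is an average over the selected set, a gene with an individual null probability slightly above the threshold (here `geneE`) can still be included when the genes ranked ahead of it are very confident calls.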
biorxiv genetics 200-500-users 2018
Linked-read sequencing of gametes allows efficient genome-wide analysis of meiotic recombination, bioRxiv, 2018-12-01
Meiotic crossovers (COs) ensure proper chromosome segregation and redistribute the genetic variation that is transmitted to the next generation. Existing methods for CO identification are challenged by large populations and the demand for genome-wide and fine-scale resolution. Taking advantage of linked-read sequencing, we developed a highly efficient method for genome-wide identification of COs at kilobase resolution in pooled recombinants. We first tested this method using a pool of Arabidopsis F2 recombinants, and obtained results that recapitulated those identified from the same plants using individual whole-genome sequencing. By applying this method to a pool of pollen DNA from a single F1 plant, we established a highly accurate CO landscape without generating or sequencing a single recombinant plant. The simplicity of this approach now enables the simultaneous generation and analysis of multiple CO landscapes and thereby allows for efficient comparison of genotypic and environmental effects on recombination, accelerating the pace at which the mechanisms for the regulation of recombination can be elucidated.
biorxiv bioinformatics 100-200-users 2018
Generative modeling and latent space arithmetics predict single-cell perturbation response across cell types, studies and species, bioRxiv, 2018-11-30
Accurately modeling cellular response to perturbations is a central goal of computational biology. While such modeling has been proposed based on statistical, mechanistic and machine learning models in specific settings, no generalization of predictions to phenomena absent from training data (‘out-of-sample’) has yet been demonstrated. Here, we present scGen, a model combining variational autoencoders and latent space vector arithmetics for high-dimensional single-cell gene expression data. In benchmarks across a broad range of examples, we show that scGen accurately models dose and infection response of cells across cell types, studies and species. In particular, we demonstrate that scGen learns cell-type- and species-specific responses, implying that it captures features that distinguish responding from non-responding genes and cells. With the upcoming availability of large-scale atlases of organs in healthy state, we envision scGen becoming a tool for experimental design through in silico screening of perturbation response in the context of disease and drug treatment.
biorxiv bioinformatics 0-100-users 2018
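The latent space vector arithmetic described in the scGen abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the scGen implementation or its API: a fixed orthogonal linear map stands in for a trained variational autoencoder, the data are synthetic, and every name (`encode`, `decode`, `predict_perturbed`, `ctrl_a`, and so on) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 5

# ASSUMPTION: an orthogonal matrix stands in for a trained autoencoder,
# so that decode(encode(x)) == x. A real VAE is nonlinear and learned.
Q, _ = np.linalg.qr(rng.normal(size=(n_genes, n_genes)))


def encode(x):
    """Map expression profiles (cells x genes) into the latent space."""
    return x @ Q


def decode(z):
    """Map latent vectors back to expression space."""
    return z @ Q.T


def predict_perturbed(ctrl_train, stim_train, ctrl_target):
    """Predict the perturbed state of unseen cells by latent arithmetic.

    delta is the difference between the mean latent vectors of perturbed
    and control cells in the training cell type; it is added to the
    latent representation of control cells from the target cell type.
    """
    delta = encode(stim_train).mean(axis=0) - encode(ctrl_train).mean(axis=0)
    return decode(encode(ctrl_target) + delta)


# Synthetic data: cell type A is observed in both conditions, while
# cell type B is observed only in the control condition.
shift = np.array([2.0, 0.0, -1.0, 0.5, 0.0])  # made-up perturbation effect
ctrl_a = rng.normal(size=(50, n_genes))
stim_a = ctrl_a + shift
ctrl_b = rng.normal(loc=1.0, size=(30, n_genes))

pred_b = predict_perturbed(ctrl_a, stim_a, ctrl_b)
```

In this linear toy setting the prediction for cell type B equals its control profile plus the shift seen in cell type A. Using a nonlinear autoencoder is what would let the same latent offset decode into different expression changes for different cell types.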