Bayesian Inference for a Generative Model of Transcriptome Profiles from Single-cell RNA Sequencing, bioRxiv, 2018-03-30

Abstract: Transcriptome profiles of individual cells reflect true and often unexplored biological diversity, but are also affected by noise of biological and technical nature. This raises the need to explicitly model the resulting uncertainty and take it into account in any downstream analysis, such as dimensionality reduction, clustering, and differential expression. Here, we introduce Single-cell Variational Inference (scVI), a scalable framework for probabilistic representation and analysis of gene expression in single cells. Our model uses variational inference and stochastic optimization of deep neural networks to approximate the parameters that govern the distribution of expression values of each gene in every cell, using a non-linear mapping between the observations and a low-dimensional latent space. By doing so, scVI pools information between similar cells or genes while taking nuisance factors of variation such as batch effects and limited sensitivity into account. To evaluate scVI, we conducted a comprehensive comparison against existing methods for distributional modeling and dimensionality reduction, all of which rely on generalized linear models. We first show that scVI scales to over one million cells, whereas competing algorithms can process at most tens of thousands of cells. Next, we show that scVI fits unseen data more closely and can impute missing data more accurately, both indicative of a better generalization capacity. We then utilize scVI to conduct a set of fundamental analysis tasks (including batch correction, visualization, clustering, and differential expression) and demonstrate its accuracy in comparison to the state-of-the-art tools in each task. scVI is publicly available, and can be readily used as a principled and inclusive solution for multiple tasks of single-cell RNA sequencing data analysis.

biorxiv bioinformatics 0-100-users 2018

A comprehensive toolkit to enable MinION sequencing in any laboratory, bioRxiv, 2018-03-27

Abstract: Long-read sequencing technologies are transforming our ability to assemble highly complex genomes. Realising their full potential relies crucially on extracting high quality, high molecular weight (HMW) DNA from the organisms of interest. This is especially the case for the portable MinION sequencer, which enables any laboratory to undertake its own genome sequencing projects, due to its low entry cost and minimal spatial footprint. One challenge of the MinION is that each group has to independently establish effective protocols for using the instrument, which can be time consuming and costly. Here we present a workflow and protocols that enabled us to establish MinION sequencing in our own laboratories, based on optimising DNA extractions from a challenging plant tissue as a case study. Following the workflow illustrated, we were able to reliably and repeatedly obtain > 8.5 Gb of long read sequencing data with a mean read length of 13 kb and an N50 of 26 kb. Our protocols are open-source and can be performed in any laboratory without special equipment. We also illustrate some more elaborate workflows which can increase mean and average read lengths if this is desired. We envision that our workflow for establishing MinION sequencing, including the illustration of potential pitfalls, will be useful to others who plan to establish long-read sequencing in their own laboratories.
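
The mean read length and N50 quoted above summarise a read-length distribution in different ways: N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases, which is why it can sit well above the mean. The following is a minimal sketch of both statistics under the standard definition; the function names and the toy read lengths are illustrative assumptions and are not taken from the preprint.

```python
def mean_read_length(read_lengths):
    """Arithmetic mean of the read lengths (in bases)."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    return sum(read_lengths) / len(read_lengths)


def n50(read_lengths):
    """Length L such that reads >= L hold at least half of all bases."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    # Walk from the longest read downwards, accumulating bases.
    ordered = sorted(read_lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length


# Toy data (hypothetical read lengths in kb, not from the preprint).
toy_reads = [2, 3, 4, 5, 6]
print(mean_read_length(toy_reads))  # 4.0
print(n50(toy_reads))               # 5
```

In the toy data the two longest reads (6 and 5) already account for 11 of the 20 bases, so the N50 is 5 even though the mean is 4.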

biorxiv genomics 500+-users 2018

Large-scale neuroimaging and genetic study reveals genetic architecture of brain white matter microstructure, bioRxiv, 2018-03-26

Abstract: Microstructural changes of white matter (WM) tracts are known to be associated with various neuropsychiatric disorders and diseases. Heritability of structural changes of WM tracts has been examined using diffusion tensor imaging (DTI) in family-based studies for different age groups. The availability of genetic and DTI data from recent large population-based studies offers an opportunity to further improve our understanding of genetic contributions. Here, we analyzed the genetic architecture of WM tracts using DTI and single-nucleotide polymorphism (SNP) data of unrelated individuals in the UK Biobank (n ∼ 8000). The DTI parameters were generated using the ENIGMA-DTI pipeline. We found that DTI parameters are substantially heritable on most WM tracts. We observed a highly polygenic or omnigenic architecture of genetic influence across the genome as well as the enrichment of SNPs in active chromatin regions. Our bivariate analyses showed strong genetic correlations for several pairs of WM tracts as well as pairs of DTI parameters. We performed voxel-based analysis to illustrate the pattern of genetic effects on selected parts of the tract-based spatial statistics skeleton. Comparing the estimates from the UK Biobank to those from small population-based studies, we illustrated that a sufficiently large sample size is essential for genetic architecture discovery in imaging genetics. We confirmed this finding with a simulation study.

biorxiv genetics 100-200-users 2018

Marionette E. coli containing 12 highly-optimized small molecule sensors, bioRxiv, 2018-03-21

Cellular processes are carried out by many interacting genes, and their study and optimization require multiple levers by which they can be independently controlled. The most common method is via a genetically-encoded sensor that responds to a small molecule (an “inducible system”). However, these sensors are often suboptimal, exhibiting high background expression and low dynamic range. Further, using multiple sensors in one cell is limited by crosstalk and the taxing of cellular resources. Here, we have developed a directed evolution strategy to simultaneously select for low background, high dynamic range, increased sensitivity, and low crosstalk. Libraries of the regulatory protein and output promoter are built based on random and rationally-guided mutations. This is applied to generate a set of 12 high-performance sensors, which exhibit >100-fold induction with low background and cross-reactivity. These are combined to build a single “sensor array” and inserted into the genomes of E. coli MG1655 (wild-type), DH10B (cloning), and BL21 (protein expression). These “Marionette” strains allow for the independent control of gene expression using 2,4-diacetylphloroglucinol (DAPG), cuminic acid (Cuma), 3-oxohexanoyl-homoserine lactone (OC6), vanillic acid (Van), isopropyl β-D-1-thiogalactopyranoside (IPTG), anhydrotetracycline (aTc), L-arabinose (Ara), choline chloride (Cho), naringenin (Nar), 3,4-dihydroxybenzoic acid (DHBA), sodium salicylate (Sal), and 3-hydroxytetradecanoyl-homoserine lactone (OHC14).

biorxiv synthetic-biology 0-100-users 2018

 

