biorxiv | audiences

Machine learning-guided channelrhodopsin engineering enables minimally-invasive optogenetics, bioRxiv, 2019-03-04

We have engineered light-gated channelrhodopsins (ChRs) whose current strength and light sensitivity enable minimally-invasive neuronal circuit interrogation. Current ChR tools applied to the mammalian brain require intracranial surgery for transgene delivery and implantation of invasive fiber-optic cables to produce light-dependent activation of a small volume of brain tissue [~1 mm³]. To enable optogenetics for large brain volumes and without the need for invasive implants, our ChR engineering approach leverages the significant literature of ChR variants to train statistical models for the design of new, high-performance ChRs. With Gaussian Process models trained on a limited experimental set of 102 functionally characterized ChR variants, we designed high-photocurrent ChRs with unprecedented light sensitivity; three of these, ChRger1, ChRger2, and ChRger3, enable optogenetic activation of the nervous system via minimally-invasive systemic transgene delivery with rAAV-PHP.eB, which was not possible previously due to low per-cell transgene copy produced by systemic delivery. These engineered ChRs enable light-induced neuronal excitation without invasive intracranial surgery for virus delivery or fiber-optic implantation, i.e. they enable minimally-invasive optogenetics.

biorxiv bioengineering 200-500-users 2019

A lineage-resolved molecular atlas of C. elegans embryogenesis at single cell resolution, bioRxiv, 2019-03-02

C. elegans is an animal with few cells, but a striking diversity of cell types. Here, we characterize the molecular basis for their specification by profiling the transcriptomes of 84,625 single embryonic cells. We identify 284 terminal and pre-terminal cell types, mapping most single cell transcriptomes to their exact position in C. elegans’ invariant lineage. We use these annotations to perform the first quantitative analysis of the relationship between lineage and the transcriptome for a whole organism. We find that a strong lineage-transcriptome correlation in the early embryo breaks down in the final two cell divisions as cells adopt their terminal fates and that most distinct lineages that produce the same anatomical cell type converge to a homogeneous transcriptomic state. Users can explore our data with a graphical application “VisCello”.

biorxiv genomics 100-200-users 2019

Compositional Data Analysis is necessary for simulating and analyzing RNA-Seq data, bioRxiv, 2019-03-02

Seq techniques (e.g. RNA-Seq) generate compositional datasets, i.e. the number of fragments sequenced is not proportional to the total RNA present. Thus, datasets carry only relative information, even though absolute RNA copy numbers are often of interest. Current normalization methods assume most features are not changing, which can lead to misleading conclusions when there are large shifts. However, there are few real datasets and no simulation protocols currently available that can directly benchmark methods when such large shifts occur. We present absSimSeq, an R package that simulates compositional data in the form of RNA-Seq reads. We tested several tools used for RNA-Seq differential analysis: sleuth, DESeq2, edgeR, limma, and ALDEx2 (which explicitly takes a compositional approach). For these tools, we compared their standard normalization to either “compositional normalization”, which uses log-ratios to anchor the data on a set of negative control features, or RUVSeq, another tool that directly uses negative control features. We show that common normalizations result in reduced performance with current methods when there is a large change in the total RNA per cell. Performance improves when spike-ins are included and used by a compositional approach, even if the spike-ins have substantial variation. In contrast, RUVSeq, which normalizes count data rather than compositional data, has poor performance. Further, we show that previous criticisms of spike-ins did not take into account the compositional nature of the data. We conclude that absSimSeq can generate more representative datasets for testing performance, and that spike-ins should be more broadly used in a compositional manner to minimize misleading conclusions from differential analyses.
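As an illustration of the "compositional normalization" idea in the abstract above, the following minimal Python sketch expresses each feature as a log-ratio against the geometric mean of a set of negative control features (for example spike-ins). It is not the absSimSeq API (absSimSeq is an R package); the function name `log_ratio_normalize`, the `pseudocount` parameter, and the toy counts are assumptions made for illustration only.

```python
import math


def log_ratio_normalize(counts, control_features, pseudocount=0.5):
    """Return log-ratios of every feature against the control features.

    counts: dict mapping feature name -> count (or abundance) in one sample.
    control_features: names assumed NOT to change between conditions.
    pseudocount: hypothetical guard against log(0); set to 0.0 if no zeros.
    """
    if not control_features:
        raise ValueError("at least one control feature is required")
    # Mean of the logs == log of the geometric mean of the controls.
    log_ref = sum(
        math.log(counts[f] + pseudocount) for f in control_features
    ) / len(control_features)
    return {f: math.log(c + pseudocount) - log_ref for f, c in counts.items()}


# Toy data (made up): both genes double in absolute amount in sample_b,
# while the spike-in is added at the same absolute amount. Sequencing to a
# fixed depth only preserves proportions, so sample_b is shown rescaled.
sample_a = {"geneX": 100.0, "geneY": 200.0, "spike": 50.0}
sample_b = {"geneX": 100.0, "geneY": 200.0, "spike": 25.0}

norm_a = log_ratio_normalize(sample_a, ["spike"], pseudocount=0.0)
norm_b = log_ratio_normalize(sample_b, ["spike"], pseudocount=0.0)
# Difference of log-ratios: the doubling of geneX is recovered.
log_fold_change_geneX = norm_b["geneX"] - norm_a["geneX"]
```

In this toy case the raw counts for geneX are identical in both samples, so a comparison of raw counts would report no change, whereas the log-ratio anchored on the spike-in recovers the doubling.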

biorxiv bioinformatics 0-100-users 2019

Genetic inhibition of PCSK9, atherogenic lipoprotein concentrations, and calcific aortic valve stenosis, bioRxiv, 2019-03-02

Background: Proprotein convertase subtilisin/kexin type 9 (PCSK9) inhibition reduces plasma concentrations of low-density lipoprotein cholesterol (LDL-C), apolipoprotein B (apoB) and lipoprotein(a) [Lp(a)]. Atherogenic lipoprotein levels have been linked with calcific aortic valve stenosis (CAVS). Our objectives were to determine the association between variants in PCSK9 and lipoprotein-lipid levels, coronary artery disease (CAD) and CAVS, and to evaluate if PCSK9 could be implicated in aortic valve interstitial cells (VICs) calcification. Methods: We built a genetic risk score weighted for LDL-C levels (wGRS) using 10 independent PCSK9 single nucleotide polymorphisms and determined its association with lipoprotein-lipid levels in 9692 participants of the EPIC-Norfolk study. We investigated the association between the wGRS and CAD and CAVS in the UK Biobank, as well as the association between the PCSK9 R46L variant and CAVS in a meta-analysis of published prospective, population-based studies (Copenhagen studies, 1463 cases/101,620 controls) and unpublished studies (UK Biobank, 1350 cases/349,043 controls; Malmö Diet and Cancer study, 682 cases/5963 controls; and EPIC-Norfolk study, 508 cases/20,421 controls). We evaluated PCSK9 expression and localization in explanted aortic valves by capillary Western blot and immunohistochemistry in patients with and without CAVS. Von Kossa staining was used to visualize aortic leaflet calcium deposits. PCSK9 expression under oxidative stress conditions in VICs was assessed. Results: The wGRS was significantly associated with lower LDL-C and apoB (p<0.001), but not with Lp(a). In the UK Biobank, the associations of PCSK9 variants with CAD were positively correlated with their effects on apoB levels. CAVS was less prevalent in carriers of the PCSK9 R46L variant [odds ratio=0.71 (95% confidence interval, 0.57-0.88), p<0.001]. PCSK9 expression was elevated in the aortic valves of patients with aortic sclerosis and CAVS compared to controls. In calcified leaflets, PCSK9 co-localized with calcium deposits. PCSK9 expression was induced by oxidative stress in VICs. Conclusion: Genetic inhibition of PCSK9 is associated with lifelong reductions in the levels of non-Lp(a) apoB-containing lipoproteins as well as lower odds of CAD and CAVS. PCSK9 is abundant in fibrotic and calcified aortic leaflets. Oxidative stress increases PCSK9 expression in VICs. These results provide a rationale for performing randomized clinical trials of PCSK9 inhibition in CAVS.
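A weighted genetic risk score of the kind described in the Methods is commonly computed as a weighted sum of allele dosages. The Python sketch below shows that arithmetic under stated assumptions: the SNP identifiers, the weights, and the function name `weighted_grs` are hypothetical placeholders, not the variants or effect sizes used in the study.

```python
def weighted_grs(dosages, weights):
    """Weighted genetic risk score for one participant.

    dosages: dict mapping SNP id -> number of effect alleles carried
             (0, 1 or 2, or a fractional imputed dosage).
    weights: dict mapping SNP id -> per-allele effect on the trait
             (here imagined as the per-allele change in LDL-C).
    Raises KeyError if a weighted SNP has no genotype for the participant.
    """
    return sum(weight * dosages[snp] for snp, weight in weights.items())


# Hypothetical weights and genotypes, for illustration only.
example_weights = {"snp_a": -0.5, "snp_b": -0.25}
example_dosages = {"snp_a": 1, "snp_b": 2}
example_score = weighted_grs(example_dosages, example_weights)
```

A more negative score in this toy setup corresponds to a genotype predicted to lower LDL-C more strongly, which is the sense in which such a score can stand in for "genetic inhibition" of the gene.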

biorxiv genetics 0-100-users 2019

Mash Screen: High-throughput sequence containment estimation for genome discovery, bioRxiv, 2019-03-02

The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome, and demonstrate the identification of a novel polyomavirus species from a public metagenome.
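The containment idea in the abstract above can be sketched in a few lines of Python: sketch the reference genome once by keeping its smallest k-mer hashes, then stream through the reads and record which sketch hashes are observed. This is a simplified illustration, not the Mash Screen implementation; the function names, the choice of BLAKE2b as the hash, and the omission of reverse-complement (canonical) k-mers are all assumptions made to keep the example short.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_hash(kmer):
    """64-bit hash of a k-mer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k, size):
    """Keep the `size` smallest distinct k-mer hashes of a reference."""
    hashes = sorted({kmer_hash(km) for km in kmers(seq, k)})
    return set(hashes[:size])


def containment(reference_sketch, reads, k):
    """Fraction of the reference sketch observed while streaming reads."""
    if not reference_sketch:
        raise ValueError("reference sketch is empty (sequence shorter than k?)")
    found = set()
    for read in reads:  # single pass; reads need not be assembled
        for km in kmers(read, k):
            h = kmer_hash(km)
            if h in reference_sketch:
                found.add(h)
    return len(found) / len(reference_sketch)


# Toy sequences (made up). Two overlapping reads cover the whole genome.
genome = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
sketch = bottom_sketch(genome, k=5, size=1000)
full = containment(sketch, [genome[:20], genome[10:]], k=5)
partial = containment(sketch, [genome[:15]], k=5)
absent = containment(sketch, ["TTTTTTTTTTTTTTTT"], k=5)
```

Only the reference is sketched, and the read set is visited once, so one pass over a metagenome can in principle score many reference sketches at the same time by looking hashes up in a shared table.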

biorxiv bioinformatics 100-200-users 2019

A mechanism to minimize errors during non-homologous end joining, bioRxiv, 2019-03-01

Enzymatic processing of DNA underlies all DNA repair, yet inappropriate DNA processing must be avoided. In vertebrates, double-strand breaks are repaired predominantly by non-homologous end-joining (NHEJ), which directly ligates DNA ends. NHEJ has the potential to be highly mutagenic because it employs DNA polymerases, nucleases, and other enzymes that modify incompatible DNA ends to allow their ligation. Using a biochemical system that recapitulates key features of cellular NHEJ, we show that end-processing requires formation of a “short-range synaptic complex” in which DNA ends are closely aligned in a ligation-competent state. Furthermore, single-molecule imaging directly demonstrates that processing occurs within the short-range complex. This confinement of end processing to a ligation-competent complex ensures that DNA ends undergo ligation as soon as they become compatible, thereby minimizing mutagenesis. Our results illustrate how the coordination of enzymatic catalysis with higher-order structural organization of substrate maximizes the fidelity of DNA repair.

biorxiv molecular-biology 0-100-users 2019