Predictive Coding of Novel versus Familiar Stimuli in the Primary Visual Cortex, bioRxiv, 2017-10-04
Abstract: To explore theories of predictive coding, we presented mice with repeated sequences of images with novel images sparsely substituted. Under these conditions, mice could be rapidly trained to lick in response to a novel image, demonstrating a high level of performance on the first day of testing. Using 2-photon calcium imaging to record from layer 2/3 neurons in the primary visual cortex, we found that novel images evoked excess activity in the majority of neurons. When a new stimulus sequence was repeatedly presented, a majority of neurons had similarly elevated activity for the first few presentations, which then decayed to almost zero activity. The decay time of these transient responses was not fixed, but instead scaled with the length of the stimulus sequence. However, at the same time, we also found a small fraction of the neurons within the population (∼2%) that continued to respond strongly and periodically to the repeated stimulus. Decoding analysis demonstrated that both the transient and sustained responses encoded information about stimulus identity. We conclude that the layer 2/3 population uses a two-channel predictive code: a dense transient code for novel stimuli and a sparse sustained code for familiar stimuli. These results extend and unify existing theories about the nature of predictive neural codes.
biorxiv neuroscience 0-100-users 2017
Directed evolution of TurboID for efficient proximity labeling in living cells and organisms, bioRxiv, 2017-10-03
Abstract: Protein interaction networks and protein compartmentation underlie every signaling process and regulatory mechanism in cells. Recently, proximity labeling (PL) has emerged as a new approach to study the spatial and interaction characteristics of proteins in living cells. However, the two enzymes commonly used for PL come with tradeoffs – BioID is slow, requiring tagging times of 18-24 hours, while APEX peroxidase uses substrates that have limited cell permeability and high toxicity. To address these problems, we used yeast display-based directed evolution to engineer two mutants of biotin ligase, TurboID and miniTurbo, with much greater catalytic efficiency than BioID, and the ability to carry out PL in cells in much shorter time windows (as little as 10 minutes) with non-toxic and easily deliverable biotin. In addition to shortening PL time by 100-fold and increasing PL yield in cell culture, TurboID enabled biotin-based PL in new settings, including yeast, Drosophila, and C. elegans.
biorxiv bioengineering 0-100-users 2017
Fast and Accurate Genomic Analyses using Genome Graphs, bioRxiv, 2017-09-28
Abstract: The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, which impairs read alignment and downstream analysis accuracy. Reference genome structures incorporating known genetic variation have been shown to improve the accuracy of genomic analyses, but have so far remained computationally prohibitive for routine large-scale use. Here we present a graph genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million indels. Our Graph Genome Pipeline requires 6.5 hours to process a 30x coverage WGS sample on a system with 36 CPU cores compared with 11 hours required by the GATK Best Practices pipeline. Using complementary benchmarking experiments based on real and simulated data, we show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, or about 20,000 additional variants being detected per sample, while variant calling specificity is unaffected. Structural variations (SVs) incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is a significant advance towards fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.
biorxiv bioinformatics 100-200-users 2017
High-resolution genome-wide functional dissection of transcriptional regulatory regions in human, bioRxiv, 2017-09-28
Abstract: Genome-wide epigenomic maps revealed millions of regions showing signatures of enhancers, promoters, and other gene-regulatory elements [1]. However, high-throughput experimental validation of their function and high-resolution dissection of their driver nucleotides remain limited in their scale and length of regions tested. Here, we present a new method, HiDRA (High-Definition Reporter Assay), that overcomes these limitations by combining components of Sharpr-MPRA [2] and STARR-Seq [3] with genome-wide selection of accessible regions from ATAC-Seq [4]. We used HiDRA to test ~7 million DNA fragments preferentially selected from accessible chromatin in the GM12878 lymphoblastoid cell line. By design, accessibility-selected fragments were highly overlapping (up to 370 per region), enabling us to pinpoint driver regulatory nucleotides by exploiting subtle differences in reporter activity between partially-overlapping fragments, using a new machine learning model SHARPR2. Our resulting maps include ~65,000 regions showing significant enhancer function and enriched for endogenous active histone marks (including H3K9ac, H3K27ac), regulatory sequence motifs, and regions bound by immune regulators. Within them, we discover ~13,000 high-resolution driver elements enriched for regulatory motifs and evolutionarily-conserved nucleotides, and help predict causal genetic variants underlying disease from genome-wide association studies. Overall, HiDRA provides a general, scalable, high-throughput, and high-resolution approach for experimental dissection of regulatory regions and driver nucleotides in the context of human biology and disease.
biorxiv genomics 200-500-users 2017
Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming, bioRxiv, 2017-09-28
Abstract: Understanding the molecular programs that guide cellular differentiation during development is a major goal of modern biology. Here, we introduce an approach, WADDINGTON-OT, based on the mathematics of optimal transport, for inferring developmental landscapes, probabilistic cellular fates and dynamic trajectories from large-scale single-cell RNA-seq (scRNA-seq) data collected along a time course. We demonstrate the power of WADDINGTON-OT by applying the approach to study 65,781 scRNA-seq profiles collected at 10 time points over 16 days during reprogramming of fibroblasts to iPSCs. We construct a high-resolution map of reprogramming that rediscovers known features; uncovers new alternative cell fates including neural- and placental-like cells; predicts the origin and fate of any cell class; highlights senescent-like cells that may support reprogramming through paracrine signaling; and implicates regulatory models in particular trajectories. Of these findings, we highlight Obox6, which we experimentally show enhances reprogramming efficiency. Our approach provides a general framework for investigating cellular differentiation.
biorxiv bioinformatics 200-500-users 2017
CUT&RUN: Targeted in situ genome-wide profiling with high efficiency for low cell numbers, bioRxiv, 2017-09-25
Summary: Cleavage Under Targets and Release Using Nuclease (CUT&RUN) is an epigenomic profiling strategy in which antibody-targeted controlled cleavage by micrococcal nuclease releases specific protein-DNA complexes into the supernatant for paired-end DNA sequencing. As only the targeted fragments enter into solution, and the vast majority of DNA is left behind, CUT&RUN has exceptionally low background levels. CUT&RUN outperforms the most widely-used Chromatin Immunoprecipitation (ChIP) protocols in resolution, signal-to-noise, and depth of sequencing required. In contrast to ChIP, CUT&RUN is free of solubility and DNA accessibility artifacts and can be used to profile insoluble chromatin and to detect long-range 3D contacts without cross-linking. Here we present an improved CUT&RUN protocol that does not require isolation of nuclei and provides high-quality data starting with only 100 cells for a histone modification and 1000 cells for a transcription factor. From cells to purified DNA, CUT&RUN requires less than a day at the lab bench.
biorxiv genomics 100-200-users 2017