Exploring dimension-reduced embeddings with Sleepwalk, bioRxiv, 2019-04-12
Abstract: Dimension-reduction methods, such as t-SNE or UMAP, are widely used when exploring high-dimensional data describing many entities, e.g., RNA-Seq data for many single cells. However, dimension reduction is unavoidably prone to introducing artefacts, and we hence need means to see where a dimension-reduced embedding is a faithful representation of the local neighbourhood and where it is not. We present Sleepwalk, a simple but powerful tool that allows the user to interactively explore an embedding, using colour to depict “true” similarities of all points to the cell under the mouse cursor. We show how this approach not only highlights distortions, but also reveals otherwise hidden characteristics of the data, and how Sleepwalk’s comparative modes help integrate multi-sample data and understand differences between embedding and preprocessing methods. Sleepwalk is a versatile and intuitive tool that unlocks the full power of dimension reduction and will be of value not only in single-cell RNA-Seq but also in any other area with matrix-shaped big data.
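The colouring idea described in the abstract can be illustrated with a minimal Python sketch. It is not Sleepwalk's own code or API: the function names, the random toy data, and the choice of Euclidean distance with a percentile cut-off are all assumptions made for illustration.

```python
import numpy as np


def distances_to_focus(features, focus_index):
    """Euclidean distance, in the original feature space, from one point to every point."""
    diffs = features - features[focus_index]
    return np.sqrt((diffs ** 2).sum(axis=1))


def colour_values(distances, max_distance):
    """Scale distances to [0, 1]: 0 = identical to the focus point, 1 = at or beyond max_distance."""
    return np.clip(distances / max_distance, 0.0, 1.0)


# Hypothetical toy data: 200 "cells" described by 50 features each.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))

# Stand-in for a 2D embedding (a real analysis would use t-SNE or UMAP coordinates).
embedding = features[:, :2]

# Pretend the mouse cursor is over point 0.
focus = 0
d = distances_to_focus(features, focus)

# Assumed cut-off: anything farther than the 25th percentile gets the "far" colour.
colours = colour_values(d, max_distance=np.percentile(d, 25))
```

Plotting `embedding` with `colours` as the colour channel would then show which points near the focus in 2D are also near it in feature space, and which are only neighbours because of distortion.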
biorxiv bioinformatics 100-200-users 2019
Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts, bioRxiv, 2019-04-12
Abstract: Motivation: Genome-wide profiles of chromatin accessibility and gene expression in diverse cellular contexts are critical to decipher the dynamics of transcriptional regulation. Recently, convolutional neural networks (CNNs) have been used to learn predictive cis-regulatory DNA sequence models of context-specific chromatin accessibility landscapes. However, these context-specific regulatory sequence models cannot generalize predictions across cell types. Results: We introduce multi-modal, residual neural network architectures that integrate cis-regulatory sequence and context-specific expression of trans-regulators to predict genome-wide chromatin accessibility profiles across cellular contexts. We show that the average accessibility of a genomic region across training contexts can be a surprisingly powerful predictor. We leverage this feature and employ novel strategies for training models to enhance genome-wide prediction of shared and context-specific chromatin accessible sites across cell types. We interpret the models to reveal insights into cis and trans regulation of chromatin dynamics across 123 diverse cellular contexts. Availability: The code is available at https://github.com/kundajelab/ChromDragoNN. Contact: akundaje@stanford.edu
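The "average accessibility" predictor mentioned in the abstract can be sketched in a few lines of Python. This is only an illustration under assumed conventions (a binary regions-by-contexts matrix and a hypothetical function name); it is not taken from the ChromDragoNN code.

```python
import numpy as np


def mean_accessibility_baseline(train_accessibility):
    """Per-region mean accessibility across training contexts.

    train_accessibility: array of shape (n_regions, n_training_contexts),
    assumed here to hold 1 for accessible and 0 for inaccessible.
    The returned score can be used as a prediction for a held-out context.
    """
    return train_accessibility.mean(axis=1)


# Hypothetical toy matrix: 3 genomic regions observed in 3 training contexts.
train = np.array(
    [
        [1, 1, 1],  # accessible everywhere
        [0, 0, 1],  # accessible in one context only
        [0, 0, 0],  # never accessible
    ],
    dtype=float,
)

baseline = mean_accessibility_baseline(train)
```

Regions that are open in every training context receive a score of 1 and regions that are never open receive 0, so this baseline captures shared sites well but by construction cannot predict context-specific differences.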
biorxiv genomics 100-200-users 2019
Light-sheet microscopy with isotropic, sub-micron resolution and solvent-independent large-scale imaging, bioRxiv, 2019-04-12
Abstract: We present cleared tissue Axially Swept Light-Sheet Microscopy (ctASLM), which achieves sub-micron isotropic resolution, high optical sectioning capability, and large field of view imaging (870×870 μm²) over a broad range of immersion media. ctASLM can image live, expanded, and both aqueous and organic chemically cleared tissue preparations and provides 2- to 5-fold better axial resolution than confocal or other reported cleared tissue light-sheet microscopes. We image millimeter-sized tissues with sub-micron 3D resolution, which enabled us to perform automated detection of cells and subcellular features such as dendritic spines.
biorxiv bioengineering 0-100-users 2019
Targeted Nanopore Sequencing with Cas9 for studies of methylation, structural variants, and mutations, bioRxiv, 2019-04-12
Abstract: Nanopore sequencing technology can rapidly and directly interrogate native DNA molecules. Often we are interested only in interrogating specific areas at high depth, but conventional enrichment methods have thus far proved unsuitable for long reads [1]. Existing strategies are currently limited by high input DNA requirements, low yield, short (<5kb) reads, time-intensive protocols, and/or amplification or cloning (losing base modification information). In this paper, we describe a technique utilizing the ability of Cas9 to introduce cuts at specific locations and ligating nanopore sequencing adaptors directly to those sites, a method we term ‘nanopore Cas9 Targeted-Sequencing’ (nCATS). We have demonstrated this using an Oxford Nanopore MinION flow cell (Capacity >10Gb+) to generate a median 165X coverage at 10 genomic loci with a median length of 18kb, representing a several hundred-fold improvement over the 2-3X coverage achieved without enrichment. We performed a pilot run on the smaller Flongle flow cell (Capacity ~1Gb), generating a median coverage of 30X at 11 genomic loci with a median length of 18kb. Using panels of guide RNAs, we show that the high coverage data from this method enables us to (1) profile DNA methylation patterns at cancer driver genes, (2) detect structural variations at known hot spots, and (3) survey for the presence of single nucleotide mutations. Together, this provides a low-cost method that can be applied even in low resource settings to directly examine cellular DNA. This technique has extensive clinical applications for assessing medically relevant genes and has the versatility to be a rapid and comprehensive diagnostic tool. We demonstrate applications of this technique by examining the well-characterized GM12878 cell line as well as three breast cell lines (MCF-10A, MCF-7, MDA-MB-231) with varying tumorigenic potential as a model for cancer. Contributions: TG and WT constructed the study. TG performed the experiments. TG, IL, and FS analyzed the data. TG, JG, ER, RB and AH developed the method. TG and WT wrote the paper.
biorxiv genomics 200-500-users 2019
A reference map of the human protein interactome, bioRxiv, 2019-04-11
Abstract: Global insights into cellular organization and function require comprehensive understanding of interactome networks. Similar to how a reference genome sequence revolutionized human genetics, a reference map of the human interactome network is critical to fully understand genotype-phenotype relationships. Here we present the first human “all-by-all” binary reference interactome map, or “HuRI”. With ~53,000 high-quality protein-protein interactions (PPIs), HuRI is approximately four times larger than the information curated from small-scale studies available in the literature. Integrating HuRI with genome, transcriptome and proteome data enables the study of cellular function within essentially any physiological or pathological cellular context. We demonstrate the use of HuRI in identifying specific subcellular roles of PPIs and protein function modulation via splicing during brain development. Inferred tissue-specific networks reveal general principles for the formation of cellular context-specific functions and elucidate potential molecular mechanisms underlying tissue-specific phenotypes of Mendelian diseases. HuRI thus represents an unprecedented, systematic reference linking genomic variation to phenotypic outcomes.
biorxiv systems-biology 200-500-users 2019
Antibiotics select for novel pathways of resistance in biofilms, bioRxiv, 2019-04-11
Abstract: Most bacteria in nature exist in aggregated communities known as biofilms. Bacteria within biofilms are inherently highly resistant to antibiotics. Current understanding of the evolution and mechanisms of antibiotic resistance is largely derived from work on cells in liquid culture, and it is unclear whether biofilms adapt and evolve in response to sub-inhibitory concentrations of drugs. Here we used a biofilm evolution model to show that biofilms of a model foodborne pathogen, Salmonella Typhimurium, rapidly evolve in response to exposure to three clinically important antibiotics. Whilst the model strongly selected for improved biofilm formation in the absence of any drug, once antibiotics were introduced the need to adapt to the drug was more important than the selection for improved biofilm formation. Adaptation to antibiotic stress imposed a marked cost in biofilm formation, particularly evident for populations exposed to cefotaxime and azithromycin. We identified distinct resistance phenotypes in biofilms compared to corresponding planktonic control cultures and characterised new mechanisms of resistance to cefotaxime and azithromycin. Novel substitutions within the multidrug efflux transporter, AcrB, were identified and validated as impacting drug export, as were changes in regulators of this efflux system. There were clear fitness costs identified and associated with different evolutionary trajectories. Our results demonstrate that biofilms adapt rapidly to low concentrations of antibiotics and the mechanisms of adaptation are novel. This work will be a starting point for studies to further examine biofilm-specific pathways of adaptation which inform future antibiotic use.
biorxiv microbiology 0-100-users 2019