Loss-of-function tolerance of enhancers in the human genome, bioRxiv, 2019-04-14
Abstract: Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of human genomes to losing enhancers has not yet been evaluated. Here we present the catalog of LoF-tolerant enhancers using structural variants from whole-genome sequences. Using a conservative approach, we estimate that each individual human genome possesses at least 28 LoF-tolerant enhancers on average. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers are more tissue-specific and regulate fewer and more dispensable genes. They are enriched in immune-related cells while LoF-intolerant enhancers are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoF-tolerance of enhancers, which achieved an AUROC of 96%. We predict that 5,677 more enhancers are likely to be tolerant to LoF and that 75 enhancers are highly LoF-intolerant. Our predictions are supported by a known set of disease enhancers and novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.
biorxiv genomics 0-100-users 2019
A resource-efficient tool for mixed model association analysis of large-scale data, bioRxiv, 2019-04-12
Abstract: The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test-statistics and thereby spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools, especially if the number of traits is large. Here, we developed an MLM-based tool (called fastGWA) that controls for population stratification by principal components and relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrated by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then applied fastGWA to 2,173 traits on 456,422 array-genotyped and imputed individuals and 2,048 traits on 46,191 whole-exome-sequenced individuals in the UKB.
biorxiv genetics 0-100-users 2019
Automated Reconstruction of a Serial-Section EM Drosophila Brain with Flood-Filling Networks and Local Realignment, bioRxiv, 2019-04-12
Abstract: Reconstruction of neural circuitry at single-synapse resolution is an attractive target for improving understanding of the nervous system in health and disease. Serial section transmission electron microscopy (ssTEM) is among the most prolific imaging methods employed in pursuit of such reconstructions. We demonstrate how Flood-Filling Networks (FFNs) can be used to computationally segment a forty-teravoxel whole-brain Drosophila ssTEM volume. To compensate for data irregularities and imperfect global alignment, FFNs were combined with procedures that locally re-align serial sections and dynamically adjust image content. The proposed approach produced a largely merger-free segmentation of the entire ssTEM Drosophila brain, which we make freely available. As compared to manual tracing using an efficient skeletonization strategy, the segmentation enabled circuit reconstruction and analysis workflows that were an order of magnitude faster.
biorxiv neuroscience 100-200-users 2019
Exploring dimension-reduced embeddings with Sleepwalk, bioRxiv, 2019-04-12
Abstract: Dimension-reduction methods, such as t-SNE or UMAP, are widely used when exploring high-dimensional data describing many entities, e.g., RNA-Seq data for many single cells. However, dimension reduction is unavoidably prone to introducing artefacts, and we hence need means to see where a dimension-reduced embedding is a faithful representation of the local neighbourhood and where it is not. We present Sleepwalk, a simple but powerful tool that allows the user to interactively explore an embedding, using colour to depict “true” similarities of all points to the cell under the mouse cursor. We show how this approach not only highlights distortions, but also reveals otherwise hidden characteristics of the data, and how Sleepwalk’s comparative modes help integrate multi-sample data and understand differences between embedding and preprocessing methods. Sleepwalk is a versatile and intuitive tool that unlocks the full power of dimension reduction and will be of value not only in single-cell RNA-Seq but also in any other area with matrix-shaped big data.
biorxiv bioinformatics 100-200-users 2019
Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts, bioRxiv, 2019-04-12
Abstract: Motivation: Genome-wide profiles of chromatin accessibility and gene expression in diverse cellular contexts are critical to decipher the dynamics of transcriptional regulation. Recently, convolutional neural networks (CNNs) have been used to learn predictive cis-regulatory DNA sequence models of context-specific chromatin accessibility landscapes. However, these context-specific regulatory sequence models cannot generalize predictions across cell types. Results: We introduce multi-modal, residual neural network architectures that integrate cis-regulatory sequence and context-specific expression of trans-regulators to predict genome-wide chromatin accessibility profiles across cellular contexts. We show that the average accessibility of a genomic region across training contexts can be a surprisingly powerful predictor. We leverage this feature and employ novel strategies for training models to enhance genome-wide prediction of shared and context-specific chromatin accessible sites across cell types. We interpret the models to reveal insights into cis and trans regulation of chromatin dynamics across 123 diverse cellular contexts. Availability: The code is available at https://github.com/kundajelab/ChromDragoNN. Contact: akundaje@stanford.edu
biorxiv genomics 100-200-users 2019
Light-sheet microscopy with isotropic, sub-micron resolution and solvent-independent large-scale imaging, bioRxiv, 2019-04-12
Abstract: We present cleared tissue Axially Swept Light-Sheet Microscopy (ctASLM), which achieves sub-micron isotropic resolution, high optical sectioning capability, and large field of view imaging (870×870 μm²) over a broad range of immersion media. ctASLM can image live, expanded, and both aqueous and organic chemically cleared tissue preparations and provides 2- to 5-fold better axial resolution than confocal or other reported cleared tissue light-sheet microscopes. We image millimeter-sized tissues with sub-micron 3D resolution, which enabled us to perform automated detection of cells and subcellular features such as dendritic spines.
biorxiv bioengineering 0-100-users 2019