Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-seq, bioRxiv, 2017-12-17
Abstract
Systematic measurement biases make data normalization an essential preprocessing step in single-cell RNA sequencing (scRNA-seq) analysis. There may be multiple, competing considerations behind the assessment of normalization performance, some of them study-specific. Because normalization can have a large impact on downstream results (e.g., clustering and differential expression), it is critically important that practitioners assess the performance of competing methods. We have developed scone, a flexible framework for assessing normalization performance based on a comprehensive panel of data-driven metrics. Through graphical summaries and quantitative reports, scone summarizes performance trade-offs and ranks large numbers of normalization methods by aggregate panel performance. The method is implemented in the open-source Bioconductor R software package scone. We demonstrate the effectiveness of scone on a collection of scRNA-seq datasets generated with different protocols, including the Fluidigm C1 and 10x platforms. We show that top-performing normalization methods lead to better agreement with independent validation data.
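To make "ranks normalization methods by aggregate panel performance" concrete, here is a minimal Python sketch of one simple aggregation rule: rank methods on each metric separately, then order them by their mean rank. This is an illustration under stated assumptions, not scone's own scoring (scone is an R/Bioconductor package and may aggregate differently). The function `rank_methods`, the method names, the metric names, and the scores are all hypothetical toy values, and every metric is assumed to be oriented so that higher is better.

```python
from statistics import mean


def rank_methods(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Order methods by mean per-metric rank (1 = best on that metric).

    `scores` maps method -> {metric -> score}; every metric is assumed to be
    oriented so that a higher score is better (hypothetical convention).
    """
    methods = list(scores)
    metrics = list(scores[methods[0]])
    ranks: dict[str, list[int]] = {m: [] for m in methods}
    for metric in metrics:
        # Best-first ordering for this metric; sorted() is stable even with
        # reverse=True, so tied methods keep their input order.
        ordered = sorted(methods, key=lambda m: scores[m][metric], reverse=True)
        for position, method in enumerate(ordered, start=1):
            ranks[method].append(position)
    # Aggregate panel performance as the average rank across all metrics.
    summary = [(m, mean(r)) for m, r in ranks.items()]
    return sorted(summary, key=lambda item: item[1])


# Toy, made-up scores for three hypothetical normalizations; the metric names
# are illustrative and are not scone's own metric identifiers.
scores = {
    "none": {"clustering": 0.10, "batch_removal": -0.40, "wanted_variation": 0.20},
    "total_count": {"clustering": 0.25, "batch_removal": -0.20, "wanted_variation": 0.35},
    "upper_quartile": {"clustering": 0.22, "batch_removal": -0.10, "wanted_variation": 0.30},
}
ranking = rank_methods(scores)
for method, avg_rank in ranking:
    print(f"{method}: mean rank {avg_rank:.2f}")
```

Mean rank is used here because it puts metrics on very different scales onto a common footing. The trade-off is that it discards how large the gaps between methods are on each metric.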
biorxiv genomics 100-200-users 2017
Transcription organizes euchromatin similar to an active microemulsion, bioRxiv, 2017-12-16
Chromatin is organized into heterochromatin, which is transcriptionally inactive, and euchromatin, which can switch between transcriptionally active and inactive states. This switch in euchromatin activity is accompanied by changes in its spatial distribution. How euchromatin rearrangements are established is unknown. Here we use super-resolution and live-cell microscopy to show that transcriptionally inactive euchromatin moves away from transcriptionally active euchromatin. This movement is driven by the formation of RNA-enriched microenvironments that exclude inactive euchromatin. Using theory, we show that the segregation into RNA-enriched microenvironments and euchromatin domains can be considered an active microemulsion. The tethering of transcripts to chromatin via RNA polymerase II forms effective amphiphiles that intersperse the two segregated phases. Taken together with previous experiments, our data suggest that chromatin is organized in the following way: heterochromatin segregates from euchromatin by phase separation, while transcription organizes euchromatin similar to an active microemulsion.
biorxiv cell-biology 100-200-users 2017
Isolation of nucleic acids from low biomass samples: detection and removal of sRNA contaminants, bioRxiv, 2017-12-15
Abstract
Background: Sequencing-based analyses of low-biomass samples are known to be prone to misinterpretation due to the potential presence of contaminating molecules derived from laboratory reagents and environments. Because RNA is inherently unstable, contamination with RNA is usually considered unlikely.
Results: Here we report the presence of small RNA (sRNA) contaminants in widely used microRNA extraction kits and means for their depletion. sRNAs extracted from human plasma samples were sequenced, and significant levels of non-human (exogenous) sequences were detected. The source of the most abundant of these sequences could be traced to the microRNA extraction columns by qPCR-based analysis of laboratory reagents. Artefactual sequences originating from the confirmed contaminants were furthermore detected in a range of published datasets. To avoid artefacts in future experiments, several protocols for the removal of the contaminants were developed, minimal amounts of starting material for artefact-free analyses were defined, and 'ultraclean' extraction kits were confirmed to reduce contaminant levels, enabling the identification of bona fide sequences.
Conclusion: This is the first report of RNA molecules as contaminants in laboratory reagents. The described protocols should be applied in future work to avoid confounding sRNA studies.
biorxiv molecular-biology 100-200-users 2017
Estimating the functional dimensionality of neural representations, bioRxiv, 2017-12-14
Abstract
Recent advances in multivariate fMRI analysis stress the importance of information inherent in voxel patterns. Key to interpreting these patterns is estimating the underlying dimensionality of neural representations. Dimensions may correspond to psychological dimensions, such as length and orientation, or involve other coding schemes. Unfortunately, the noise structure of fMRI data inflates dimensionality estimates and thus makes it difficult to assess the true underlying dimensionality of a pattern. To address this challenge, we developed a novel approach to identify brain regions that carry reliable task-modulated signal and to derive an estimate of the signal’s functional dimensionality. We combined singular value decomposition with cross-validation to find the best low-dimensional projection of a pattern of voxel responses at the single-subject level. The goodness of the low-dimensional reconstruction is measured as its Pearson correlation with a test set, which allows testing the significance of the low-dimensional reconstruction across participants. Using hierarchical Bayesian modeling, we derive the best estimate of the underlying dimensionality across participants, together with its uncertainty. We validated our method on simulated data of varying underlying dimensionality, showing that the recovered dimensionalities closely match the true dimensionalities. We then applied our method to three published fMRI datasets, all involving the processing of visual stimuli. The results highlight three possible applications of estimating the functional dimensionality of neural data. First, it can aid the evaluation of model-based analyses by revealing which areas express reliable, task-modulated signal that could be missed by specific models. Second, it can reveal functional differences across brain regions. Third, knowing the functional dimensionality allows assessing task-related differences in the complexity of neural patterns.
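The core idea of combining SVD with cross-validation can be sketched as follows. Reconstruct one data split at each rank k with a truncated SVD, correlate that reconstruction with an independent split, and take the k with the highest Pearson correlation. This Python sketch is a simplification under stated assumptions. It uses a single train/test split and simulated data, and omits the hierarchical Bayesian step across participants. The names `low_rank` and `cv_dimensionality` are hypothetical, not the authors' code.

```python
import numpy as np


def low_rank(X: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of X (truncated SVD, Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def cv_dimensionality(train: np.ndarray, test: np.ndarray) -> tuple[int, list[float]]:
    """Pick the rank whose reconstruction of `train` best predicts `test`.

    Both arrays are (conditions x voxels) response patterns assumed to come
    from independent halves of the data. Their noise is independent, so adding
    noise-dominated components to the reconstruction lowers the correlation.
    """
    max_k = min(train.shape)
    corrs = []
    for k in range(1, max_k + 1):
        recon = low_rank(train, k)
        # Pearson correlation between flattened reconstruction and test set.
        corrs.append(float(np.corrcoef(recon.ravel(), test.ravel())[0, 1]))
    best_k = int(np.argmax(corrs)) + 1
    return best_k, corrs


# Simulated example: a rank-3 signal measured twice with independent noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 200))
train = signal + rng.normal(size=signal.shape)
test = signal + rng.normal(size=signal.shape)

best_k, corrs = cv_dimensionality(train, test)
print(best_k)
```

The held-out correlation is what counters noise-driven inflation. Extra components fit noise that is specific to `train`, so they improve the fit to `train` but not the prediction of `test`.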
biorxiv neuroscience 100-200-users 2017
Genetic landscapes reveal how human genetic diversity aligns with geography, bioRxiv, 2017-12-14
Summarizing spatial patterns in human genetic diversity to understand population history has been a persistent goal for human geneticists. Here, we use a recently developed, spatially explicit method for estimating effective migration surfaces (EEMS) to visualize how human genetic diversity is geographically structured. The resulting surfaces are rugged, which indicates that the relationship between genetic and geographic distance is, as a rule, heterogeneous and distorted. Most prominently, topographic and marine features regularly align with increased genetic differentiation (e.g., the Sahara Desert, the Mediterranean Sea, or the Himalayas at large scales; the Adriatic and inter-island straits in Near Oceania at smaller scales). We also see traces of historical migrations and boundaries of language families. These results provide visualizations of human genetic diversity that reveal local patterns of differentiation in detail, and they emphasize that while genetic similarity generally decays with geographic distance, there have regularly been factors that subtly distort the underlying relationship observed across space today. The fine-scale population structure depicted here is relevant to understanding complex processes of human population history and may provide insights into the geographic patterning of rare variants and heritable disease risk.
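The baseline pattern that EEMS departs from is genetic similarity decaying with geographic distance, known as isolation by distance. A Mantel test is a common way to check for it. The Python sketch below implements that simpler test, not EEMS. It correlates the pairwise geographic and genetic distances and obtains a permutation p-value by relabelling samples. The function names and the simulated toy distances are hypothetical and are not data from the study.

```python
import numpy as np


def upper_triangle(d: np.ndarray) -> np.ndarray:
    """Off-diagonal upper-triangle entries of a square distance matrix."""
    i, j = np.triu_indices(d.shape[0], k=1)
    return d[i, j]


def mantel(geo: np.ndarray, gen: np.ndarray, n_perm: int = 999,
           seed: int = 0) -> tuple[float, float]:
    """Correlation of two distance matrices with a one-sided permutation p-value."""
    geo_vals = upper_triangle(geo)
    r_obs = float(np.corrcoef(geo_vals, upper_triangle(gen))[0, 1])
    rng = np.random.default_rng(seed)
    n = gen.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Relabel samples: permute rows and columns of one matrix together.
        p = rng.permutation(n)
        r = np.corrcoef(geo_vals, upper_triangle(gen[np.ix_(p, p)]))[0, 1]
        if r >= r_obs:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return r_obs, p_value


# Toy data: 30 sampling sites whose genetic distance grows with geography.
rng = np.random.default_rng(1)
pos = rng.uniform(0, 10, size=(30, 2))
geo = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
noise = np.abs(rng.normal(scale=0.02, size=geo.shape))
noise = (noise + noise.T) / 2  # keep the matrix symmetric
np.fill_diagonal(noise, 0.0)
gen = 0.05 * geo + noise
r, p = mantel(geo, gen)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

A single correlation like this summarizes the whole study area. That is why it cannot reveal the local barriers (deserts, seas, straits) that a spatially explicit surface such as EEMS is designed to show.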
biorxiv evolutionary-biology 100-200-users 2017
MAGpy: a reproducible pipeline for the downstream analysis of metagenome-assembled genomes (MAGs), bioRxiv, 2017-12-14
Abstract
Recent advances in bioinformatics have enabled the rapid assembly of genomes from metagenomes (metagenome-assembled genomes, MAGs), and there is a need for reproducible pipelines that can annotate and characterise thousands of genomes simultaneously. Here we present MAGpy, a Snakemake pipeline that takes FASTA input, compares MAGs to several public databases, checks quality, assigns a taxonomy and draws a phylogenetic tree.
biorxiv bioinformatics 100-200-users 2017