Inferring single-trial neural population dynamics using sequential auto-encoders, bioRxiv, 2017-06-21

Neuroscience is experiencing a data revolution in which simultaneous recording of many hundreds or thousands of neurons is revealing structure in population activity that is not apparent from single-neuron responses. This structure is typically extracted from trial-averaged data. Single-trial analyses are challenging due to incomplete sampling of the neural population, trial-to-trial variability, and fluctuations in action potential timing. Here we introduce Latent Factor Analysis via Dynamical Systems (LFADS), a deep learning method to infer latent dynamics from single-trial neural spiking data. LFADS uses a nonlinear dynamical system (a recurrent neural network) to infer the dynamics underlying observed population activity and to extract ‘de-noised’ single-trial firing rates from neural spiking data. We apply LFADS to a variety of monkey and human motor cortical datasets, demonstrating its ability to predict observed behavioral variables with unprecedented accuracy, extract precise estimates of neural dynamics on single trials, infer perturbations to those dynamics that correlate with behavioral choices, and combine data from non-overlapping recording sessions (spanning months) to improve inference of underlying dynamics. In summary, LFADS leverages all observations of a neural population’s activity to accurately model its dynamics on single trials, opening the door to a detailed understanding of the role of dynamics in performing computation and ultimately driving behavior.

biorxiv neuroscience 100-200-users 2017

Biological classification with RNA-Seq data: Can alternative splicing enhance machine learning classifier?, bioRxiv, 2017-06-19

The extent to which genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to yield better insights into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at the gene and alternative splicing isoform levels. However, leveraging RNA-Seq data requires the development of new data mining and analytics methods. Supervised machine learning methods are commonly used for biological data analysis, and have recently gained attention for their applications to RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that isoform-level expression data is more informative for biological classification tasks than gene-level expression data. Our large-scale assessment utilizes multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage for the samples from cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene-expression-based methods. The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq-based data analysis.

biorxiv bioinformatics 0-100-users 2017

Environmental factors dominate over host genetics in shaping human gut microbiota composition, bioRxiv, 2017-06-17

Human gut microbiome composition is shaped by multiple host intrinsic and extrinsic factors, but the relative contribution of host genetic compared to environmental factors remains elusive. Here, we genotyped a cohort of 696 healthy individuals from several distinct ancestral origins and a relatively common environment, and demonstrate that there is no statistically significant association between microbiome composition and ethnicity, single nucleotide polymorphisms (SNPs), or overall genetic similarity, and that only 5 of 211 (2.4%) previously reported microbiome-SNP associations replicate in our cohort. In contrast, we find similarities in the microbiome composition of genetically unrelated individuals who share a household. We define the term biome-explainability as the variance of a host phenotype explained by the microbiome after accounting for the contribution of human genetics. Consistent with our finding that microbiome and host genetics are largely independent, we find significant biome-explainability levels of 16-33% for body mass index (BMI), fasting glucose, high-density lipoprotein (HDL) cholesterol, waist circumference, waist-hip ratio (WHR), and lactose consumption. We further show that several human phenotypes can be predicted substantially more accurately when adding microbiome data to host genetics data, and that the contribution of both data sources to prediction accuracy is largely additive. Overall, our results suggest that human microbiome composition is dominated by environmental factors rather than by host genetics.

biorxiv genetics 200-500-users 2017

Created with the audiences framework by Jedidiah Carlson
