Reconstructing cell cycle and disease progression using deep learning, bioRxiv, 2016-10-19
Abstract: We show that deep convolutional neural networks combined with non-linear dimension reduction enable reconstructing biological processes based on raw image data. We demonstrate this by reconstructing the cell cycle of Jurkat cells and disease progression in diabetic retinopathy. In further analysis of Jurkat cells, we detect and separate a subpopulation of dead cells in an unsupervised manner and, in classifying discrete cell cycle stages, we reach a 6-fold reduction in error rate compared to a recent approach based on boosting on image features. In contrast to previous methods, deep learning based predictions are fast enough for on-the-fly analysis in an imaging flow cytometer.
biorxiv bioinformatics 100-200-users 2016

The hidden elasticity of avian and mammalian genomes, bioRxiv, 2016-10-17
Abstract: Genome size in mammals and birds shows remarkably little interspecific variation compared to other taxa. Yet, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been co-variation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size homeostasis. To test this model, we develop a computational pipeline to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 million years (My) in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but also that DNA loss counteracted this expansion to varying extents across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified ‘accordion’ model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives.
biorxiv evolutionary-biology 0-100-users 2016

Discovering event structure in continuous narrative perception and memory, bioRxiv, 2016-10-15
Summary: During realistic, continuous perception, humans automatically segment experiences into discrete events. Using a novel model of neural event dynamics, we investigate how cortical structures generate event representations during continuous narratives, and how these events are stored and retrieved from long-term memory. Our data-driven approach enables identification of event boundaries and event correspondences across datasets without human-generated stimulus annotations, and reveals that different regions segment narratives at different timescales. We also provide the first direct evidence that narrative event boundaries in high-order areas (overlapping the default mode network) trigger encoding processes in the hippocampus, and that this encoding activity predicts pattern reinstatement during recall. Finally, we demonstrate that these areas represent abstract, multimodal situation models, and show anticipatory event reinstatement as subjects listen to a familiar narrative. Our results provide strong evidence that brain activity is naturally structured into semantically meaningful events, which are stored in and retrieved from long-term memory.
biorxiv neuroscience 100-200-users 2016

I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification, bioRxiv, 2016-10-04
Abstract: Machine learning is a powerful set of techniques that has enhanced the abilities of neuroscientists to interpret information collected through EEG, fMRI, MEG, and PET data. With these new techniques come new dangers of overfitting that are not well understood by the neuroscience community. In this article, we use Support Vector Machine (SVM) classifiers and genetic algorithms to demonstrate the ease with which overfitting can occur, despite the use of cross validation. We demonstrate that comparable and non-generalizable results can be obtained on informative and non-informative (i.e. random) data by iteratively modifying hyperparameters in seemingly innocuous ways. We recommend a number of techniques for limiting overfitting, such as lock boxes, blind analyses, and pre-registrations. These techniques, although uncommon in neuroscience applications, are common in many other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques.
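The overfitting mechanism described in this abstract can be illustrated with a small, self-contained sketch. It is not the authors' procedure: it swaps the SVM and genetic algorithm for a nearest-centroid classifier and a random search over feature subsets (treated here as a stand-in "hyperparameter"), and every function name and parameter value is hypothetical. The data are pure noise, so any cross-validated accuracy above chance comes from repeated tuning against the same folds; the lock box is held out before tuning and scored only once.

```python
import random


def make_random_data(n_samples, n_features, rng):
    """Non-informative data: Gaussian features, labels drawn independently."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def centroid_predict(X_train, y_train, X_test, features):
    """Nearest-centroid classifier restricted to the chosen feature subset."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        if rows:
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    preds = []
    for x in X_test:
        dists = {
            label: sum((x[f] - c[i]) ** 2 for i, f in enumerate(features))
            for label, c in centroids.items()
        }
        preds.append(min(dists, key=dists.get))
    return preds


def cv_accuracy(X, y, features, k=5):
    """k-fold cross-validated accuracy using interleaved folds."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        preds = centroid_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
            features,
        )
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n


def tune_and_lockbox(seed=0, n_samples=80, n_features=40, n_tries=200, subset_size=5):
    """Tune on the development half only, then open the lock box once."""
    rng = random.Random(seed)
    X, y = make_random_data(n_samples, n_features, rng)

    # Lock box: set aside before any tuning happens.
    split = n_samples // 2
    X_dev, y_dev = X[:split], y[:split]
    X_lock, y_lock = X[split:], y[split:]

    first_cv, best_cv, best_features = None, -1.0, None
    for _ in range(n_tries):
        features = rng.sample(range(n_features), subset_size)
        score = cv_accuracy(X_dev, y_dev, features)
        if first_cv is None:
            first_cv = score  # accuracy before any "tuning"
        if score > best_cv:
            best_cv, best_features = score, features

    # Single evaluation of the selected configuration on untouched data.
    lock_preds = centroid_predict(X_dev, y_dev, X_lock, best_features)
    lock_acc = sum(p == t for p, t in zip(lock_preds, y_lock)) / len(y_lock)
    return first_cv, best_cv, lock_acc


if __name__ == "__main__":
    first_cv, best_cv, lock_acc = tune_and_lockbox()
    print(f"CV accuracy, first try:   {first_cv:.2f}")
    print(f"CV accuracy, after tuning: {best_cv:.2f}")
    print(f"Lock-box accuracy:         {lock_acc:.2f}")
```

Because the best cross-validated score is a maximum over many tries on the same folds, it can only rise as tuning continues, even though the labels carry no signal. The lock-box score is the figure to report, since it was not used to choose anything.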
biorxiv neuroscience 200-500-users 2016

Nanopore DNA Sequencing and Genome Assembly on the International Space Station, bioRxiv, 2016-09-28
Abstract: The emergence of nanopore-based sequencers greatly expands the reach of sequencing into low-resource field environments, enabling in situ molecular analysis. In this work, we evaluated the performance of the MinION DNA sequencer (Oxford Nanopore Technologies) in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained mixtures of genomic DNA extracted from lambda bacteriophage, Escherichia coli (strain K12) and Mus musculus (BALB/c). The in-flight sequencing experiments generated more than 80,000 total reads with mean 2D accuracies of 85–90%, mean 1D accuracies of 75–80%, and median read lengths of approximately 6,000 bases. We were able to construct directed assemblies of the ~4.7 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% pairwise identity, respectively. De novo assemblies of the lambda and E. coli genomes generated solely from nanopore reads yielded 100% and 99.8% genome coverage, respectively, at 100% and 98.5% pairwise identity. Across all surveyed metrics (base quality, throughput, stays/base, skips/base), no decrease in MinION performance was observed while sequencing DNA in space. Simulated runs of in-flight nanopore data using an automated bioinformatic pipeline and cloud- or laptop-based genomic assembly demonstrated the feasibility of real-time sequencing analysis and direct microbial identification in space. Applications of sequencing for space exploration include infectious disease diagnosis, environmental monitoring, evaluating biological responses to spaceflight, and even potentially the detection of extraterrestrial life on other planetary bodies.
biorxiv genomics 100-200-users 2016

DNA Fountain enables a robust and efficient storage architecture, bioRxiv, 2016-09-10
Abstract: DNA is an attractive medium to store digital information. Here, we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14 × 10^6 bytes in DNA oligos and perfectly retrieved the information from a sequencing coverage equivalent of a single tile of Illumina sequencing. We also tested a process that can allow 2.18 × 10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes/gram of DNA, orders of magnitude higher than previous techniques.
biorxiv synthetic-biology 100-200-users 2016