The hidden elasticity of avian and mammalian genomes, bioRxiv, 2016-10-17
AbstractGenome size in mammals and birds shows remarkably little interspecific variation compared to other taxa. Yet, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been co-variation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size homeostasis. To test this model, we develop a computational pipeline to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 million years (My) in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but that DNA loss counteracted this expansion to various extent across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified ‘accordion’ model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives.
biorxiv evolutionary-biology 0-100-users 2016Discovering event structure in continuous narrative perception and memory, bioRxiv, 2016-10-15
SummaryDuring realistic, continuous perception, humans automatically segment experiences into discrete events. Using a novel model of neural event dynamics, we investigate how cortical structures generate event representations during continuous narratives, and how these events are stored and retrieved from long-term memory. Our data-driven approach enables identification of event boundaries and event correspondences across datasets without human-generated stimulus annotations, and reveals that different regions segment narratives at different timescales. We also provide the first direct evidence that narrative event boundaries in high-order areas (overlapping the default mode network) trigger encoding processes in the hippocampus, and that this encoding activity predicts pattern reinstatement during recall. Finally, we demonstrate that these areas represent abstract, multimodal situation models, and show anticipatory event reinstatement as subjects listen to a familiar narrative. Our results provide strong evidence that brain activity is naturally structured into semantically meaningful events, which are stored in and retrieved from long-term memory.
biorxiv neuroscience 100-200-users 2016I Tried a Bunch of Things The Dangers of Unexpected Overfitting in Classification, bioRxiv, 2016-10-04
ABSTRACTMachine learning is a powerful set of techniques that has enhanced the abilities of neuroscientists to interpret information collected through EEG, fMRI, MEG, and PET data. With these new techniques come new dangers of overfitting that are not well understood by the neuroscience community. In this article, we use Support Vector Machine (SVM) classifiers, and genetic algorithms to demonstrate the ease by which overfitting can occur, despite the use of cross validation. We demonstrate that comparable and non-generalizable results can be obtained on informative and non-informative (i.e. random) data by iteratively modifying hyperparameters in seemingly innocuous ways. We recommend a number of techniques for limiting overfitting, such as lock boxes, blind analyses, and pre-registrations. These techniques, although uncommon in neuroscience applications, are common in many other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques.
biorxiv neuroscience 200-500-users 2016Nanopore DNA Sequencing and Genome Assembly on the International Space Station, bioRxiv, 2016-09-28
AbstractThe emergence of nanopore-based sequencers greatly expands the reach of sequencing into low-resource field environments, enabling in situ molecular analysis. In this work, we evaluated the performance of the MinION DNA sequencer (Oxford Nanopore Technologies) in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained mixtures of genomic DNA extracted from lambda bacteriophage, Escherichia coli (strain K12) and Mus musculus (BALBc). The in-flight sequencing experiments generated more than 80,000 total reads with mean 2D accuracies of 85 – 90%, mean 1D accuracies of 75 – 80%, and median read lengths of approximately 6,000 bases. We were able to construct directed assemblies of the ~4.7 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% pairwise identity, respectively, and de novo assemblies of the lambda and E. coli genomes generated solely from nanopore reads yielded 100% and 99.8% genome coverage, respectively, at 100% and 98.5% pairwise identity. Across all surveyed metrics (base quality, throughput, staysbase, skipsbase), no observable decrease in MinION performance was observed while sequencing DNA in space. Simulated runs of in-flight nanopore data using an automated bioinformatic pipeline and cloud or laptop based genomic assembly demonstrated the feasibility of real-time sequencing analysis and direct microbial identification in space. Applications of sequencing for space exploration include infectious disease diagnosis, environmental monitoring, evaluating biological responses to spaceflight, and even potentially the detection of extraterrestrial life on other planetary bodies.
biorxiv genomics 100-200-users 2016DNA Fountain enables a robust and efficient storage architecture, bioRxiv, 2016-09-10
AbstractDNA is an attractive medium to store digital information. Here, we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14 × 106 bytes in DNA oligos and perfectly retrieved the information from a sequencing coverage equivalent of a single tile of Illumina sequencing. We also tested a process that can allow 2.18 × 1015 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecules and obtained a perfect retrieval from a density of 215Petabytegram of DNA, orders of magnitudes higher than previous techniques.
biorxiv synthetic-biology 100-200-users 2016Local genetic effects on gene expression across 44 human tissues, bioRxiv, 2016-09-10
AbstractExpression quantitative trait locus (eQTL) mapping provides a powerful means to identify functional variants influencing gene expression and disease pathogenesis. We report the identification of cis-eQTLs from 7,051 post-mortem samples representing 44 tissues and 449 individuals as part of the Genotype-Tissue Expression (GTEx) project. We find a cis-eQTL for 88% of all annotated protein-coding genes, with one-third having multiple independent effects. We identify numerous tissue-specific cis-eQTLs, highlighting the unique functional impact of regulatory variation in diverse tissues. By integrating large-scale functional genomics data and state-of-the-art fine-mapping algorithms, we identify multiple features predictive of tissue-specific and shared regulatory effects. We improve estimates of cis-eQTL sharing and effect sizes using allele specific expression across tissues. Finally, we demonstrate the utility of this large compendium of cis-eQTLs for understanding the tissue-specific etiology of complex traits, including coronary artery disease. The GTEx project provides an exceptional resource that has improved our understanding of gene regulation across tissues and the role of regulatory variation in human genetic diseases.
biorxiv genomics 0-100-users 2016