Cas9-Assisted Targeting of CHromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping, bioRxiv, 2017-02-21
ABSTRACT: Variations in the genetic code, from single point mutations to large structural or copy number alterations, influence susceptibility, onset, and progression of genetic diseases and tumor transformation. Next-generation sequencing analysis is unable to reliably capture aberrations larger than the typical sequencing read length of several hundred bases. Long-read, single-molecule sequencing methods such as SMRT and nanopore sequencing can address larger variations, but require costly whole genome analysis. Here we describe a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis. The isolated target remains intact and is ideally suited for optical genome mapping and long-read sequencing at high coverage. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of optical maps and sequencing data at a fraction of the cost of whole genome sequencing.
biorxiv genomics 100-200-users 2017
Niche construction in evolutionary theory: the construction of an academic niche?, bioRxiv, 2017-02-20
Abstract: In recent years, fairly far-reaching claims have been repeatedly made about how niche construction, the modification by organisms of their environment, and that of other organisms, represents a vastly neglected phenomenon in ecological and evolutionary thought. The proponents of this view claim that the niche construction perspective greatly expands the scope of standard evolutionary theory and that niche construction deserves to be treated as a significant evolutionary process in its own right, almost at par with natural selection. Claims have also been advanced about how niche construction theory represents a substantial extension to, and re-orientation of, standard evolutionary theory, which is criticized as being narrowly gene-centric and ignoring the rich complexity and reciprocity of organism-environment interactions. We examine these claims in some detail and show that they do not stand up to scrutiny. We suggest that the manner in which niche construction theory is sought to be pushed in the literature is better viewed as an exercise in academic niche construction whereby, through incessant repetition of largely untenable claims, and the deployment of rhetorically appealing but logically dubious analogies, a receptive climate for a certain sub-discipline is sought to be manufactured within the scientific community. We see this as an unfortunate, but perhaps inevitable, nascent post-truth tendency within science.
biorxiv evolutionary-biology 100-200-users 2017
mixOmics: an R package for ‘omics feature selection and multiple data integration, bioRxiv, 2017-02-15
Abstract: The advent of high throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a system biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of ‘omics data available from the package.
biorxiv bioinformatics 100-200-users 2017
Quantitative analysis of population-scale family trees using millions of relatives, bioRxiv, 2017-02-08
Abstract: Family trees have vast applications in multiple fields from genetics to anthropology and economics. However, the collection of extended family trees is tedious and usually relies on resources with limited geographical scope and complex data usage restrictions. Here, we collected 86 million profiles from publicly-available online data from genealogy enthusiasts. After extensive cleaning and validation, we obtained population-scale family trees, including a single pedigree of 13 million individuals. We leveraged the data to partition the genetic architecture of longevity by inspecting millions of relative pairs and to provide insights to population genetics theories on the dispersion of families. We also report a simple digital procedure to overlay other datasets with our resource in order to empower studies with population-scale genealogical data.
One Sentence Summary: Using massive crowd-sourced genealogy data, we created a population-scale family tree resource for scientific studies.
biorxiv genomics 100-200-users 2017
Developmental diversification of cortical inhibitory interneurons, bioRxiv, 2017-02-03
ABSTRACT: Diverse subsets of cortical interneurons play a particularly important role in the stability of the neural circuits underlying cognitive and higher order brain functions, yet our understanding of how this diversity is generated is far from complete. We applied massively parallel single-cell RNA-seq to profile a developmental time course of interneuron development, measuring the transcriptomes of over 60,000 progenitors during their maturation in the ganglionic eminences and embryonic migration into the cortex. While diversity within mitotic progenitors is largely driven by cell cycle and differentiation state, we observed sparse eminence-specific transcription factor expression, which seeds the emergence of later cell diversity. Upon becoming postmitotic, cells from all eminences pass through one of three precursor states, one of which represents a cortical interneuron ground state. By integrating datasets across developmental timepoints, we identified transcriptomic heterogeneity in interneuron precursors representing the emergence of four cardinal classes (Pvalb, Sst, Id2 and Vip), which further separate into subtypes at different timepoints during development. Our analysis revealed that the ASD-associated transcription factor Mef2c discriminates early Pvalb-precursors in E13.5 cells, and removal of Mef2c confirms its essential role for Pvalb interneuron development. These findings shed new light on the molecular diversification of early inhibitory precursors, and suggest gene modules that may link developmental specification with the etiology of neuropsychiatric disorders.
biorxiv neuroscience 100-200-users 2017
Scaling single cell transcriptomics through split pool barcoding, bioRxiv, 2017-02-03
Constructing an atlas of cell types in complex organisms will require a collective effort to characterize billions of individual cells. Single cell RNA sequencing (scRNA-seq) has emerged as the main tool for characterizing cellular diversity, but current methods use custom microfluidics or microwells to compartmentalize single cells, limiting scalability and widespread adoption. Here we present Split Pool Ligation-based Transcriptome sequencing (SPLiT-seq), a scRNA-seq method that labels the cellular origin of RNA through combinatorial indexing. SPLiT-seq is compatible with fixed cells, scales exponentially, uses only basic laboratory equipment, and costs one cent per cell. We used this approach to analyze 109,069 single cell transcriptomes from an entire postnatal day 5 mouse brain, providing the first global snapshot at this stage of development. We identified 13 main populations comprising different types of neurons, glia, immune cells, endothelia, as well as types in the blood-brain-barrier. Moreover, we resolve substructure within these clusters corresponding to cells at different stages of development. As sequencing capacity increases, SPLiT-seq will enable profiling of billions of cells in a single experiment.
biorxiv genomics 100-200-users 2017
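The SPLiT-seq abstract above says that combinatorial indexing "scales exponentially". A minimal Python sketch of that arithmetic follows, assuming each split-pool round assigns one of a fixed number of well barcodes uniformly at random. The 96-well plate size, the number of rounds, and the function names are illustrative assumptions, not details taken from the abstract.

```python
def barcode_combinations(wells_per_round: int, rounds: int) -> int:
    """Number of distinct barcode combinations after `rounds` split-pool steps.

    Each round appends one of `wells_per_round` barcodes, so the barcode
    space grows exponentially with the number of rounds.
    """
    if wells_per_round < 1 or rounds < 1:
        raise ValueError("wells_per_round and rounds must be positive")
    return wells_per_round ** rounds


def expected_collision_fraction(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells sharing their full barcode with another cell.

    Assumes every cell draws its barcode combination uniformly and
    independently (an idealisation of the split-pool process).
    """
    if n_cells < 1 or n_barcodes < 1:
        raise ValueError("n_cells and n_barcodes must be positive")
    # Probability that none of the other (n_cells - 1) cells picked the
    # same combination, subtracted from 1.
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


if __name__ == "__main__":
    n_cells = 100_000  # hypothetical experiment size
    for rounds in range(1, 5):
        combos = barcode_combinations(96, rounds)  # 96 wells is an assumption
        frac = expected_collision_fraction(n_cells, combos)
        print(f"{rounds} round(s): {combos:,} combinations, "
              f"expected collision fraction {frac:.4f}")
```

Under these assumptions, each added round multiplies the barcode space by the number of wells, which is why the collision fraction drops sharply as rounds are added while the equipment stays the same.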