A curated database reveals trends in single-cell transcriptomics, bioRxiv, 2019-08-21
The more than 500 single-cell transcriptomics studies that have been published to date constitute a valuable and vast resource for biological discovery. While various “atlas” projects have collated some of the associated datasets, most questions related to specific tissue types, species, or other attributes of studies require identifying papers through a challenging manual literature search. To facilitate discovery with published single-cell transcriptomics data, we have assembled a near exhaustive, manually curated database of single-cell transcriptomics studies with key information: descriptions of the type of data and technologies used, along with descriptors of the biological systems studied. Additionally, the database contains summarized information about analysis in the papers, allowing for analysis of trends in the field. As an example, we show that the number of cell types identified in scRNA-seq studies is proportional to the number of cells analysed. The database is available at www.nxn.se/single-cell-studies/gui.
biorxiv genomics 200-500-users 2019
Assessment of computational methods for the analysis of single-cell ATAC-seq data, bioRxiv, 2019-08-18
Abstract. Background: Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, because of its low copy number (diploid in humans), leads to inherent data sparsity (1-10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (20-50% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. Results: We present a benchmarking framework that was applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were evaluated by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed. Conclusions: This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC was the only method able to analyze a large dataset (> 80,000 cells).
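The sparsity figures quoted in this abstract (1-10% of peaks detected per cell) can be made concrete with a small sketch. It assumes a cell-by-peak count matrix stored as plain lists; the function name and the toy matrix are hypothetical illustrations, not data or code from the study.

```python
def detection_rate_per_cell(matrix):
    """Return, for each cell (row), the fraction of peaks (columns) with a nonzero count.

    `matrix` is assumed to be a list of rows, one row per cell,
    each row holding one count per peak.
    """
    rates = []
    for cell in matrix:
        if not cell:
            raise ValueError("each cell needs at least one peak column")
        detected = sum(1 for count in cell if count > 0)
        rates.append(detected / len(cell))
    return rates


# Hypothetical toy matrix: 2 cells x 10 peaks (not real data).
toy_counts = [
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 2, 0, 0, 0, 0, 0],
]
rates = detection_rate_per_cell(toy_counts)
```

On real scATAC-seq data the matrix would normally be a sparse array rather than nested lists; the nested-list form is used here only to keep the sketch dependency-free.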
biorxiv bioinformatics 200-500-users 2019
Membrane-bounded nucleoid discovered in a cultivated bacterium of the candidate phylum ‘Atribacteria’, bioRxiv, 2019-08-08
Abstract: A key feature that differentiates prokaryotes from eukaryotes is the absence of an intracellular membrane surrounding the chromosomal DNA. Here, we report isolation of an anaerobic bacterium that possesses an additional intracytoplasmic membrane surrounding a nucleoid, affiliates with the yet-to-be-cultivated ubiquitous phylum ‘Ca. Atribacteria’, and possesses unique genomic features likely associated with organization of complex cellular structure. Exploration of the uncharted microorganism overturned the prevailing dogma of prokaryotic cell structure.
biorxiv microbiology 200-500-users 2019
Efficient de novo assembly of eleven human genomes using PromethION sequencing and a novel nanopore toolkit, bioRxiv, 2019-07-26
Abstract: Present workflows for producing human genome assemblies from long-read technologies have cost and production time bottlenecks that prohibit efficient scaling to large cohorts. We demonstrate an optimized PromethION nanopore sequencing method for eleven human genomes. The sequencing, performed on one machine in nine days, achieved an average 63x coverage, 42 Kb read N50, 90% median read identity and 6.5x coverage in 100 Kb+ reads using just three flow cells per sample. To assemble these data we introduce new computational tools: Shasta, a de novo long-read assembler, and MarginPolish & HELEN, a suite of nanopore assembly polishing algorithms. On a single commercial compute node Shasta can produce a complete human genome assembly in under six hours, and MarginPolish & HELEN can polish the result in just over a day, achieving 99.9% identity (QV30) for haploid samples from nanopore reads alone. We evaluate assembly performance for diploid, haploid and trio-binned human samples in terms of accuracy, cost, and time and demonstrate improvements relative to current state-of-the-art methods in all areas. We further show that addition of proximity ligation (Hi-C) sequencing yields near chromosome-level scaffolds for all eleven genomes.
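Two metrics in this abstract follow standard definitions that a short sketch can illustrate: the Phred-scaled quality value (QV = -10 · log10 of the error rate, so 99.9% identity corresponds to QV30) and read N50 (the length at which reads of that length or longer contain at least half of all sequenced bases). The function names and the toy read lengths are hypothetical, and this is not code from Shasta, MarginPolish, or HELEN.

```python
import math


def identity_to_qv(identity):
    """Convert a fractional identity (e.g. 0.999) to a Phred-scaled quality value."""
    error_rate = 1.0 - identity
    if error_rate <= 0.0:
        raise ValueError("identity must be below 1.0 for a finite QV")
    return -10.0 * math.log10(error_rate)


def read_n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are visited from longest to shortest; the N50 is the length of the
    read at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no reads supplied")


qv = identity_to_qv(0.999)           # approximately 30
n50 = read_n50([2, 3, 4, 5, 6])      # toy lengths, not real data
```

Note that N50 is weighted by bases rather than by read count, which is why a 42 Kb read N50 can coexist with many much shorter reads.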
biorxiv bioinformatics 200-500-users 2019
Why we publish where we do: Faculty publishing values and their relationship to review, promotion and tenure expectations, bioRxiv, 2019-07-21
Abstract: Using an online survey of academics at 55 randomly selected institutions across the US and Canada, we explore priorities for publishing decisions and their perceived importance within review, promotion, and tenure (RPT). We find that respondents most value journal readership, while they believe their peers most value prestige and related metrics such as impact factor when submitting their work for publication. Respondents indicated that total number of publications, number of publications per year, and journal name recognition were the most valued factors in RPT. Older and tenured respondents (most likely to serve on RPT committees) were less likely to value journal prestige and metrics for publishing, while untenured respondents were more likely to value these factors. These results suggest disconnects between what academics value versus what they think their peers value, and between the importance of journal prestige and metrics for tenured versus untenured faculty in publishing and RPT perceptions.
biorxiv scientific-communication-and-education 200-500-users 2019