Nanopore Long-Read RNAseq Reveals Widespread Transcriptional Variation Among the Surface Receptors of Individual B cells, bioRxiv, 2017-04-14
Abstract: Understanding gene regulation and function requires a genome-wide method capable of capturing both gene expression levels and isoform diversity at the single cell level. Short-read RNAseq, while the current standard for gene expression quantification, is limited in its ability to resolve complex isoforms because it does not sequence full-length cDNA copies of RNA molecules. Here, we investigated whether RNAseq using the long-read single-molecule Oxford Nanopore MinION sequencing technology (ONT RNAseq) would be able to identify and quantify complex isoforms without sacrificing accurate gene expression quantification. After successfully benchmarking our experimental and computational approaches on a mixture of synthetic transcripts, we analyzed individual murine B1a cells using a new cellular indexing strategy. Using the Mandalorion analysis pipeline we developed, we identified thousands of unannotated transcription start and end sites, as well as hundreds of alternative splicing events in these B1a cells. We also identified hundreds of genes expressed across B1a cells that displayed multiple complex isoforms, including several B cell-specific surface receptors and the antibody heavy chain (IGH) locus. Our results show that we can not only identify complex isoforms but also quantify their expression at the single cell level.
biorxiv genomics 100-200-users 2017
Index switching causes “spreading-of-signal” among multiplexed samples in Illumina HiSeq 4000 DNA sequencing, bioRxiv, 2017-04-10
Abstract: Illumina-based next generation sequencing (NGS) has accelerated biomedical discovery through its ability to generate thousands of gigabases of sequencing output per run at a fraction of the time and cost of conventional technologies. The process typically involves four basic steps: library preparation, cluster generation, sequencing, and data analysis. In 2015, a new cluster generation chemistry called exclusion amplification (ExAmp) was introduced in the newer Illumina machines (HiSeq 3000/4000/X Ten). It was a fundamental shift from the earlier method of random cluster generation by bridge amplification on a non-patterned flow cell. The ExAmp chemistry, in conjunction with patterned flow cells containing nanowells at fixed locations, increases cluster density on the flow cell, thereby reducing the cost per run. It also increases sequence read quality, especially for longer read lengths (up to 150 base pairs). This advance has been widely adopted for genome sequencing because greater sequencing depth can be achieved for lower cost without compromising the quality of longer reads. We show, however, that this promising chemistry is problematic when multiplexing samples. We discovered that up to 5-10% of sequencing reads (or signals) are incorrectly assigned from a given sample to other samples in a multiplexed pool. We provide evidence that this “spreading-of-signals” arises from low levels of free index primers present in the pool. These index primers can prime pooled library fragments at random via complementary 3’ ends and be extended by DNA polymerase. This creates a new library molecule carrying a new index before it binds to the patterned flow cell to generate a cluster for sequencing. The read from that cluster is therefore assigned to a different sample, spreading signal among the multiplexed samples.
We show that low levels of free index primers persist after the most common library purification procedure recommended by Illumina. We also show that the amount of signal spreading among samples is proportional to the level of free index primer present in the library pool. This artifact causes homogenization and misclassification of cells in single cell RNA-seq experiments. Therefore, all data generated in this way must now be carefully re-examined to ensure that “spreading-of-signals” has not compromised data analysis and conclusions. Two approaches can minimize or eliminate this signal spreading artifact. One is re-sequencing samples on an older platform that relies on conventional bridge amplification for cluster generation. The other is using improved library cleanup strategies that remove free index primers.
biorxiv molecular-biology 500+-users 2017
Looking into Pandora’s Box: The Content of Sci-Hub and its Usage, bioRxiv, 2017-04-09
Abstract: Despite the growth of Open Access, illegally circumventing paywalls to access scholarly publications is becoming a more mainstream phenomenon. The web service Sci-Hub is amongst the biggest facilitators of this, offering free access to around 62 million publications. So far, it has not been well studied how and why users access publications through Sci-Hub. This study tries to address some of these questions by using the recently released corpus of Sci-Hub and comparing it to data on ~28 million downloads made through the service. The comparative analysis shows that both the usage data and the complete corpus are largely made up of recently published articles. Users disproportionately favor newer articles, and 35% of downloaded articles were published after 2013. These results hint that embargo periods before publications become Open Access are frequently circumvented using Guerilla Open Access approaches like Sci-Hub. On a journal level, the downloads show a bias towards some scholarly disciplines, especially Chemistry, suggesting increased barriers to access for these disciplines. Comparing usage and corpus on a publisher level shows that only 11% of publishers are highly requested relative to their baseline frequency, while 45% of all publishers are accessed significantly less than expected. Despite this, the oligopoly of publishers is even more pronounced at the level of content consumption: 80% of all downloads are of articles published by only 9 publishers. All of this suggests that Sci-Hub is used by different populations and for a number of different reasons, and that there is still a lack of access to the published scientific record. Further analysis of these openly available data resources will undoubtedly be valuable for the investigation of academic publishing.
biorxiv scientific-communication-and-education 200-500-users 2017
Strain-resolved microbiome sequencing reveals mobile elements that drive bacterial competition on a clinical timescale, bioRxiv, 2017-04-08
Abstract: Although shotgun short-read sequencing has facilitated the study of strain-level architecture within complex microbial communities, existing metagenomic approaches often cannot capture structural differences between closely related co-occurring strains. Recent methods, which employ read cloud sequencing and specialized assembly techniques, provide significantly improved genome drafts and show potential to capture these strain-level differences. Here, we apply this read cloud metagenomic approach to longitudinal stool samples from a patient undergoing hematopoietic cell transplantation. The patient’s microbiome is profoundly disrupted and is eventually dominated by Bacteroides caccae. Comparative analysis of B. caccae genomes obtained using read cloud sequencing together with metagenomic RNA sequencing allows us to predict that particular mobile element integrations result in increased antibiotic resistance, which we further support using in vitro antibiotic susceptibility testing. Thus, we find read cloud sequencing to be useful in identifying strain-level differences that underlie differential fitness.
biorxiv bioinformatics 100-200-users 2017
The dynamic upper limit of human lifespan, bioRxiv, 2017-04-06
Abstract: We respond to claims by Dong et al. that human lifespan is limited below 125 years. Using the log-linear increase in mortality rates with age to predict the upper limits of human survival we find, in contrast to Dong et al., that the limit to human lifespan is historically flexible and increasing. This discrepancy can be explained by Dong et al.’s use of data with variable sample sizes, age-biased rounding errors, and log(0) instead of log(1) values in linear regressions. Addressing these issues eliminates the proposed 125-year upper limit to human lifespan.
biorxiv physiology 100-200-users 2017
Sex differences in the adult human brain: Evidence from 5,216 UK Biobank participants, bioRxiv, 2017-04-05
Abstract: Sex differences in the human brain are of interest, for example, because of sex differences in the observed prevalence of psychiatric disorders and in some psychological traits. We report the largest single-sample study of structural and functional sex differences in the human brain (2,750 female and 2,466 male participants, aged 44-77 years). Males had higher volumes, surface areas, and white matter fractional anisotropy; females had thicker cortices and higher white matter tract complexity. There was considerable distributional overlap between the sexes. Subregional differences were not fully attributable to differences in total volume or height. There was generally greater male variance across structural measures. Functional connectome organization showed stronger connectivity for males in unimodal sensorimotor cortices, and stronger connectivity for females in the default mode network. This large-scale study provides a foundation for attempts to understand the causes and consequences of sex differences in adult brain structure and function.
biorxiv neuroscience 500+-users 2017