SparK: A Publication-quality NGS Visualization Tool, bioRxiv, 2019-11-17

Abstract: While there are sophisticated resources available for displaying NGS data, including the Integrative Genomics Viewer (IGV) and the UCSC genome browser, exporting regions and assembling figures for publication remains challenging. In particular, customizing track appearance and overlaying track replicates is a manual and time-consuming process. Here, we present SparK, a tool that auto-generates publication-ready, high-resolution, true vector graphic figures from any NGS-based tracks, including RNA-seq, ChIP-seq, and ATAC-seq. Novel functions of SparK include averaging of replicates, plotting standard deviation tracks, and highlighting significantly changed areas. SparK is written in Python 3, making it executable on any major OS platform. Generating figures from command line prompts makes later changes easy. For instance, if the genomic region of the plot needs to be changed, or tracks need to be added or removed, the figure can be re-generated within seconds without the manual process of re-exporting and re-assembling everything. After plotting with SparK, changes to the output SVG vector graphic files are simple to make, including text, lines, and colors. SparK is publicly available on GitHub at https://github.com/harbourlab/SparK.
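The replicate averaging and standard deviation tracks mentioned in the abstract can be illustrated with a minimal sketch. This is not SparK's own code: the function name `summarize_replicates`, the list-of-lists track representation, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def summarize_replicates(replicates):
    """Return per-position mean and standard deviation across replicate tracks.

    `replicates` is a hypothetical representation: one list of coverage
    values per replicate, all covering the same genomic positions.
    """
    if not replicates:
        raise ValueError("need at least one replicate track")
    length = len(replicates[0])
    if any(len(track) != length for track in replicates):
        raise ValueError("all replicate tracks must cover the same positions")

    # Transpose so that each tuple holds the values of all replicates
    # at a single position.
    columns = list(zip(*replicates))
    means = [mean(column) for column in columns]
    # Population standard deviation is an assumption; a sample standard
    # deviation (statistics.stdev) would need at least two replicates.
    sds = [pstdev(column) for column in columns]
    return means, sds


# Two toy replicates over three positions.
replicates = [[0.0, 2.0, 4.0], [0.0, 4.0, 8.0]]
means, sds = summarize_replicates(replicates)
```

A plotting tool could then draw `means` as the main track and `means` plus or minus `sds` as a shaded band.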

biorxiv bioinformatics 100-200-users 2019

Variability in the analysis of a single neuroimaging dataset by many teams, bioRxiv, 2019-11-16

Summary: Data analysis workflows in many scientific domains have become increasingly complex and flexible. To assess the impact of this flexibility on functional magnetic resonance imaging (fMRI) results, the same dataset was independently analyzed by 70 teams, testing nine ex-ante hypotheses. The flexibility of analytic approaches is exemplified by the fact that no two teams chose identical workflows to analyze the data. This flexibility resulted in sizeable variation in hypothesis test results, even for teams whose statistical maps were highly correlated at intermediate stages of their analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Importantly, meta-analytic approaches that aggregated information across teams yielded significant consensus in activated regions across teams. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset. Our findings show that analytic flexibility can have substantial effects on scientific conclusions, and demonstrate factors related to variability in fMRI. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for multiple analyses of the same data. Potential approaches to mitigate issues related to analytical variability are discussed.

biorxiv neuroscience 500+-users 2019

Genomics of a complete butterfly continent, bioRxiv, 2019-11-05

Never before have we had the luxury of choosing a continent, picking a large phylogenetic group of animals, and obtaining genomic data for its every species. Here, we sequence all 845 species of butterflies recorded from North America north of Mexico. Our comprehensive approach reveals the pattern of diversification and adaptation occurring in this phylogenetic lineage as it has spread over the continent, which cannot be seen on a sample of selected species. We observe bursts of diversification that generated taxonomic ranks: subfamily, tribe, subtribe, genus, and species. The older burst around 70 Mya resulted in the butterfly subfamilies, with the major evolutionary inventions being unique phenotypic traits shaped by high positive selection and gene duplications. The recent burst around 5 Mya is caused by explosive radiation in diverse butterfly groups associated with diversification in transcription and mRNA regulation, morphogenesis, and mate selection. Rapid radiation correlates with more frequent introgression of speciation-promoting and beneficial genes among radiating species. Radiation and extinction patterns over the last 100 million years suggest the following general model of animal evolution. A population spreads over the land, adapts to various conditions through mutations, and diversifies into several species. Occasional hybridization between these species results in accumulation of beneficial alleles in one, which eventually survives, while others become extinct. Not only butterflies, but also the hominids may have followed this path.

biorxiv genomics 500+-users 2019

Generalizing RNA velocity to transient cell states through dynamical modeling, bioRxiv, 2019-10-29

Abstract: The introduction of RNA velocity in single cells has opened up new ways of studying cellular differentiation. The originally proposed framework obtains velocities as the deviation of the observed ratio of spliced and unspliced mRNA from an inferred steady state. Errors in velocity estimates arise if the central assumptions of a common splicing rate and the observation of the full splicing dynamics with steady-state mRNA levels are violated. With scVelo (https://scvelo.org), we address these restrictions by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to a wide variety of systems comprising transient cell states, which are common in development and in response to perturbations. We infer gene-specific rates of transcription, splicing and degradation, and recover the latent time of the underlying cellular processes. This latent time represents the cell’s internal clock and is based only on its transcriptional dynamics. Moreover, scVelo allows us to identify regimes of regulatory changes such as stages of cell fate commitment and, therein, systematically detects putative driver genes. We demonstrate that scVelo enables disentangling heterogeneous subpopulation kinetics with unprecedented resolution in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis. We anticipate that scVelo will greatly facilitate the study of lineage decisions, gene regulation, and pathway activity identification.
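The steady-state idea that the abstract attributes to the originally proposed framework can be sketched for a single gene. This is not scVelo's dynamical model and does not use the scVelo API. The function name `steady_state_velocity` is hypothetical, and fitting the steady-state ratio by least squares through the origin over all cells is a simplifying assumption.

```python
def steady_state_velocity(unspliced, spliced):
    """Estimate a steady-state ratio and per-cell velocities for one gene.

    `unspliced` and `spliced` are per-cell abundances of the same gene.
    The steady-state ratio gamma is fitted as the slope of a line through
    the origin (an assumption made for this sketch), and the velocity of
    each cell is its deviation from that line: v = u - gamma * s.
    """
    if len(unspliced) != len(spliced):
        raise ValueError("need one unspliced and one spliced value per cell")
    sum_ss = sum(s * s for s in spliced)
    if sum_ss == 0:
        raise ValueError("spliced counts are all zero; ratio is undefined")

    # Least-squares slope through the origin: gamma = sum(u*s) / sum(s*s).
    gamma = sum(u * s for u, s in zip(unspliced, spliced)) / sum_ss
    # Positive velocity: more unspliced mRNA than the steady state predicts,
    # read as up-regulation; negative velocity: down-regulation.
    velocities = [u - gamma * s for u, s in zip(unspliced, spliced)]
    return gamma, velocities


# Toy data for three cells of one gene.
spliced = [1.0, 2.0, 4.0]
unspliced = [1.0, 1.0, 2.0]
gamma, velocities = steady_state_velocity(unspliced, spliced)
```

Cells lying exactly on the fitted line get a velocity of zero. The abstract's point is that this reading breaks down when a gene never reaches steady state, which is the case the dynamical model is meant to address.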

biorxiv bioinformatics 200-500-users 2019

 
