Audiences

Overview

This web portal presents detailed, interactive analyses to accompany our recent paper:

Carlson J, Harris K. Quantifying and contextualizing the impact of bioRxiv preprints through automated social media audience segmentation. bioRxiv. 2020. doi:10.1101/2020.03.06.981589

In short, we selected 1800 bioRxiv preprints that received large amounts of attention on Twitter and collected an extensive catalog of “social media citations”, that is, instances in which these preprints were mentioned in tweets or retweets (331,696 tweets and retweets in total). For each preprint, we collected data about the followers of each user who cited the article (476,813,827 data points in total), then applied a probabilistic topic modeling approach to infer the latent audience sectors underlying its Twitter audience.


Click a journal or category to view a catalog of individual reports for the top articles.

Columns h=2% to h=20%: average fraction of users with > h% white nationalist follower homophily
Journal	Category	Preprints Analyzed	Mean fraction of audience estimated to be academics	h=2%	h=5%	h=10%	h=20%
biorxiv	animal-behavior-and-cognition	24	0.77	0.07	0.05	0.02	0.01
biorxiv	biochemistry	17	0.89	0	0	0	0
biorxiv	bioengineering	21	0.88	0	0	0	0
biorxiv	bioinformatics	268	0.9	0	0	0	0
biorxiv	biophysics	45	0.92	0.01	0	0	0
biorxiv	cancer-biology	35	0.86	0.01	0	0	0
biorxiv	cell-biology	59	0.89	0.01	0	0	0
biorxiv	clinical-trials	4	0.52	0.03	0.01	0	0
biorxiv	developmental-biology	32	0.92	0.01	0	0	0
biorxiv	ecology	24	0.84	0.01	0	0	0
biorxiv	epidemiology	9	0.71	0.04	0.02	0.01	0
biorxiv	evolutionary-biology	112	0.89	0.03	0.02	0.01	0
biorxiv	genetics	157	0.84	0.09	0.06	0.03	0.01
biorxiv	genomics	364	0.91	0.02	0.01	0.01	0
biorxiv	immunology	31	0.88	0	0	0	0
biorxiv	microbiology	93	0.91	0.01	0	0	0
biorxiv	molecular-biology	47	0.9	0	0	0	0
biorxiv	neuroscience	298	0.87	0.02	0.01	0	0
biorxiv	paleontology	2	0.46	0.03	0.01	0.01	0
biorxiv	pathology	6	0.44	0.07	0.03	0.01	0
biorxiv	pharmacology-and-toxicology	2	0.46	0.01	0	0	0
biorxiv	physiology	9	0.55	0.06	0.02	0.01	0
biorxiv	plant-biology	43	0.91	0	0	0	0
biorxiv	scientific-communication-and-education	48	0.91	0.01	0	0	0
biorxiv	synthetic-biology	22	0.85	0.01	0	0	0
biorxiv	systems-biology	27	0.87	0.01	0	0	0
biorxiv	zoology	1	0.76	0.02	0	0	0

Academic demographics

For each preprint, we categorized the inferred audience sectors as either “academic” or “non-academic” according to the presence of keywords in each audience sector that indicate an association with academic communities (e.g., “phd”, “professor”, “university”). We then quantified the total proportion of the audience corresponding to academic audience sectors and compared these estimates to the estimates generated by Altmetric, the leading source of altmetric information for scholarly research articles.
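The keyword rule can be sketched in R as follows. This is a minimal illustration, not the paper's implementation: it assumes each audience sector is summarized by a character vector of its top keywords together with its proportion of the audience, and every keyword beyond “phd”, “professor”, and “university” is a hypothetical addition.

```r
# Minimal sketch (not the paper's code): flag audience sectors as academic
# when any of their top keywords matches an academic keyword, then sum the
# audience proportions of the flagged sectors.
# Only "phd", "professor", and "university" come from the text; the other
# keywords are hypothetical additions.
academic_keywords <- c("phd", "professor", "university",
                       "postdoc", "researcher", "scientist")

# TRUE if any (lower-cased) sector keyword is in the academic list
is_academic_sector <- function(sector_terms, keywords = academic_keywords) {
  any(tolower(sector_terms) %in% keywords)
}

# sectors: list of character vectors (top keywords per sector)
# weights: numeric vector of sector proportions of the audience
academic_fraction <- function(sectors, weights, keywords = academic_keywords) {
  stopifnot(length(sectors) == length(weights))
  flags <- vapply(sectors, is_academic_sector, logical(1),
                  keywords = keywords)
  sum(weights[flags])
}

# Toy example with three hypothetical sectors
sectors <- list(
  c("phd", "genomics", "lab"),
  c("politics", "news", "patriot"),
  c("professor", "evolution", "university")
)
weights <- c(0.5, 0.2, 0.3)
academic_fraction(sectors, weights)  # 0.8: sectors 1 and 3 are academic
```

A pure keyword match is simple and transparent, but it depends entirely on the keyword list; sectors whose top terms are field names (e.g., “genomics”) without an explicit academic marker would only be caught if such terms were added.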

Of the 1800 preprints analyzed, our method estimates a higher fraction of academics/scientists in the audience than the Altmetric demographics do for 1797 (99.8%) of them.

These audience demographic comparisons are summarized in the plot to the right. Points are colored according to bioRxiv category, and point size scales with the number of tweets and retweets referencing the paper. Click on a point to open the individual report.

Lay audience network homophily

Many preprints were found to have audience sectors that were primarily aligned with political affiliations. In some cases, these politically aligned sectors included keywords indicating extreme far-right ideologies, including white nationalism. To quantify this trend systematically, for each preprint we calculated the degree of network homophily (i.e., the percentage overlap in followers) between each citing user and a curated set of prominent white nationalist accounts on Twitter (including, among others, former KKK leader David Duke and podcaster and former YouTuber Stefan Molyneux). These plots show, across the analyzed preprints, the distribution of the fraction of users whose white nationalist network homophily exceeds a threshold \(h\), for four thresholds (\(h=2\%\), \(h=5\%\), \(h=10\%\), and \(h=20\%\)).
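A minimal R sketch of the homophily summary, under an assumed definition: a citing user's homophily is the fraction of that user's followers who also follow at least one reference account, and each preprint is summarized by the fraction of its citing users whose homophily exceeds each threshold. The function names and the toy follower IDs are hypothetical.

```r
# Hypothetical sketch. Assumed definition:
#   h(u) = |F(u) ∩ F_ref| / |F(u)|
# where F(u) is the set of follower IDs of citing user u and F_ref is the
# union of the followers of the reference accounts.
homophily <- function(user_followers, ref_followers) {
  if (length(user_followers) == 0) return(NA_real_)
  length(intersect(user_followers, ref_followers)) /
    length(unique(user_followers))
}

# For one preprint: fraction of citing users whose homophily exceeds each
# threshold (users with no follower data are dropped)
frac_above <- function(followers_by_user, ref_followers,
                       thresholds = c(0.02, 0.05, 0.10, 0.20)) {
  h <- vapply(followers_by_user, homophily, numeric(1),
              ref_followers = ref_followers)
  h <- h[!is.na(h)]
  setNames(vapply(thresholds, function(t) mean(h > t), numeric(1)),
           paste0("h=", thresholds * 100, "%"))
}

# Toy data: reference accounts are followed by user IDs 1..50
ref_followers <- 1:50
followers_by_user <- list(
  u1 = c(1:12, 101:188),   # 12 of 100 followers overlap -> 0.12
  u2 = 101:200,            # no overlap                  -> 0
  u3 = c(41:50, 201:235),  # 10 of 45                    -> about 0.22
  u4 = c(1:3, 301:397)     # 3 of 100                    -> 0.03
)
frac_above(followers_by_user, ref_followers)
# 0.75 0.50 0.50 0.25 for h = 2%, 5%, 10%, 20%
```

Averaging the output of frac_above() over all preprints in a category would give numbers of the same form as the h columns in the table above, assuming this definition matches the one used in the paper.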

[Plots: one panel per threshold, h=2%, h=5%, h=10%, h=20%]

Political polarization

About

Background

Audiences is a framework for exploring the various audiences that are engaging with academic publications on Twitter.

Paper metadata and associated Twitter data were collected using the Crossref, Altmetric, Rxivist, and Twitter APIs.

The code for Audiences is written in R, and this site was generated with Hugo, with a modified version of the Mondrian template.

All code used in these analyses is available on GitHub.

Setup

Prerequisites and dependencies

You will need a recent version of RStudio if you wish to use the interactive notebook capabilities.

Audiences requires the following R packages to run:

library(RCurl)
library(rtweet)
library(tweetscores)
library(knitr)
library(markdown)
library(rmarkdown)
library(rAltmetric)
library(rvest)
library(rcrossref)
library(tidyverse)
library(yaml)
library(anytime)
library(here)
library(jsonlite)
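Before running any reports, it can help to check which of these packages are still missing rather than discovering it partway through a run. The base-R sketch below is a convenience, not part of the repository; note that some of the packages (e.g., tweetscores) may need to be installed from GitHub rather than CRAN.

```r
# Convenience sketch (not part of the repository): list which required
# packages are not yet installed, without loading any of them.
required_pkgs <- c("RCurl", "rtweet", "tweetscores", "knitr", "markdown",
                   "rmarkdown", "rAltmetric", "rvest", "rcrossref",
                   "tidyverse", "yaml", "anytime", "here", "jsonlite")

missing_pkgs <- function(pkgs = required_pkgs) {
  # system.file() returns "" when a package is not installed
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

todo <- missing_pkgs()
if (length(todo) > 0) {
  message("Missing packages: ", paste(todo, collapse = ", "))
}
```

Using system.file() instead of requireNamespace() keeps the check side-effect free, since no package namespace is loaded.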

Twitter API access

Once you have a Twitter developer account set up, copy and paste the API keys into config.yaml.

Running Audiences

render_reports.R is a wrapper script that generates reports for a list of papers. report_template.rmd is the R Markdown template from which each report is built.

Serving as a webpage

The reports are formatted as interactive HTML documents, which makes them easy to share on a website. Each report is a self-contained .html file, so you can simply upload it to your own personal website (e.g., if your website lists your lab’s papers, you can generate a report for each one and add a link to the corresponding .html file).

Alternatively, if you have forked the Audiences GitHub repository, you can use GitHub Pages to host the reports.

I am using Hugo with the hugrid template to create a simple static landing page with tiles that link to static/reports/report.html. The website files are hosted in docs/, and the GitHub project page is set up to point to this directory.

Setup Twitter API

To reproduce these analyses or run Audiences on your own paper(s), you will first need to set up a Twitter developer account for access to the Twitter API; instructions are available in Twitter’s developer documentation. Once that is done, copy and paste the app name, consumer keys, and access keys into the appropriate fields of config.yaml.
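One way to load and validate these credentials in R is sketched below using the yaml package (already among the dependencies). The key names (app_name, consumer_key, consumer_secret, access_token, access_secret) are assumptions; adjust them to match the fields in the repository's config.yaml.

```r
# Hypothetical sketch: read Twitter credentials from config.yaml and fail
# early if a field is missing. The key names are assumptions; match them
# to the fields in the repository's config.yaml.
required_keys <- c("app_name", "consumer_key", "consumer_secret",
                   "access_token", "access_secret")

load_twitter_config <- function(path = "config.yaml") {
  cfg <- yaml::read_yaml(path)            # parsed into a named list
  absent <- setdiff(required_keys, names(cfg))
  if (length(absent) > 0) {
    stop("config.yaml is missing: ", paste(absent, collapse = ", "))
  }
  cfg
}
```

In rtweet versions before 1.0, these five values correspond to the arguments of create_token(); newer rtweet releases use a different authentication interface, so check which version is installed.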

Generate reports

Running render_reports.R generates a separate report for each paper listed in papers.txt, where each paper is given by its Altmetric URL (one per line). Reports are based on the report_template.rmd R Markdown template.
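A hypothetical sketch of what such a wrapper loop might look like. It assumes the template declares an R Markdown parameter named altmetric_url and that each report is named after the numeric Altmetric ID; neither detail is confirmed by the source. render_fun defaults to rmarkdown::render but can be swapped for a stub when testing.

```r
# Hypothetical sketch of a render_reports.R-style loop: one report per
# Altmetric URL in papers.txt. The "altmetric_url" parameter name and the
# output naming scheme are assumptions, not the repository's actual code.
read_paper_list <- function(path = "papers.txt") {
  urls <- trimws(readLines(path, warn = FALSE))
  urls[nzchar(urls)]                      # drop blank lines
}

# Altmetric detail pages end in a numeric ID, e.g. .../details/12345678
altmetric_id <- function(url) {
  sub(".*/details/([0-9]+).*", "\\1", url)
}

render_all <- function(urls, template = "report_template.rmd",
                       out_dir = "output/reports",
                       render_fun = rmarkdown::render) {
  for (url in urls) {
    render_fun(template,
               output_file = paste0(altmetric_id(url), ".html"),
               output_dir = out_dir,
               params = list(altmetric_url = url))
  }
  invisible(urls)
}

# Usage: render_all(read_paper_list("papers.txt"))
```

Passing the URL through params only works if the template's YAML header declares a matching parameter; otherwise rmarkdown::render() will reject it.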

Output

As each report runs, data retrieved from the Twitter API are cached to article_data/ to avoid repeating the same API queries. Subsequent runs will look for the appropriate .rds files in this directory.
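The caching behavior follows a common read-or-fetch pattern, sketched here in R under assumptions: cached() and fetch_fun are hypothetical names, fetch_fun stands in for whatever API query is being cached, and one .rds file is stored per cache key.

```r
# Hypothetical sketch of the .rds caching pattern: reuse a cached result
# from the cache directory if one exists; otherwise call fetch_fun(),
# save its result, and return it.
cached <- function(key, fetch_fun, cache_dir = "article_data") {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))                 # cache hit: no API call
  }
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  result <- fetch_fun()                   # cache miss: query the API
  saveRDS(result, path)
  result
}
```

Because the cache is keyed only by file name, deleting the corresponding .rds file is enough to force a fresh download for that paper.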

Reports will be written to output/reports/ and thumbnail images for each report to output/figures/.

Build site

# generate data/items.toml, containing the links to reports to include in landing page
python generate_links.py

# build the landing page into _docs/static/ based on the hugrid Hugo template in themes/hugrid/
# - requires config.toml and data/items.toml
hugo

# build with mkdocs into docs/
# - requires mkdocs.yml and contents of _docs/
mkdocs build -d docs

# copy the reports & thumbnails into docs/static/
rsync -r output/ docs/static

# push changes to github and the documentation will be available at https://carjed.github.io/audiences/