
Reviews of "Preprinting a pandemic: the role of preprints in the COVID-19 pandemic"

Reviewers: Noah Haber (Stanford) • Emily Smith, Siran He (George Washington University)

Published on Aug 11, 2020
Reviews of "Preprinting a pandemic: the role of preprints in the COVID-19 pandemic"
This Pub is a Review of
Preprinting a pandemic: the role of preprints in the COVID-19 pandemic

Abstract: The world continues to face an ongoing viral pandemic that presents a serious threat to human health. The virus underlying the COVID-19 disease, SARS-CoV-2, has caused over 3.2 million confirmed cases and 220,000 deaths between January and April 2020. Although the last pandemic of respiratory disease of viral origin swept the globe only a decade ago, the way science operates and responds to current events has experienced a paradigm shift in the interim. The scientific community has responded rapidly to the COVID-19 pandemic, releasing over 16,000 COVID-19 related scientific articles within 4 months of the first confirmed case, of which at least 6,000 were hosted by preprint servers. We focused our analysis on bioRxiv and medRxiv, two growing preprint servers for biomedical research, investigating the attributes of COVID-19 preprints, their access and usage rates, characteristics of their sharing on online platforms, and the relationship between preprints and their published articles. Our data provides evidence for increased scientific and public engagement (COVID-19 preprints are accessed and distributed at least 15 times more than non-COVID-19 preprints) and changes in journalistic practice with reference to preprints. We also find evidence for changes in preprinting and publishing behaviour: COVID-19 preprints are shorter, with fewer panels and tables, and reviewed faster. Our results highlight the unprecedented role of preprints and preprint servers in the dissemination of COVID-19 science, and the likely long-term impact of the pandemic on the scientific publishing landscape.

To read the original manuscript, click the link above.

Summary of Reviews: This study demonstrates the unprecedented role of preprints and preprint servers in the dissemination of COVID-19 science. The findings are robust and informative, though the reviewers note some errors and misinterpretations.

Reviewer 1 (Noah Haber)

Reviewer 2 (Emily Smith, Siran He) | 📒📒📒 ◻️◻️

RR:C19 Strength of Evidence Scale Key

📕 ◻️◻️◻️◻️ = Misleading

📙📙 ◻️◻️◻️ = Not Informative

📒📒📒 ◻️◻️ = Potentially Informative

📗📗📗📗◻️ = Reliable

📘📘📘📘📘 = Strong

To read the reviews, click the links below.

Comments
Jonny Coates:

Summary of author responses to the reviewers.

The authors thank the reviewers for their time and thoughtful reviews.

Detailed responses to individual reviewer points can be found below, with the author responses in italics. In response to the reviewer comments, we have removed our analysis of preprint-paper changes (figure 5) so that we can better focus our key messages. We have also reduced the word count and expanded our discussion.

An updated manuscript can be found on bioRxiv: https://www.biorxiv.org/content/10.1101/2020.05.22.111294v1

Reviewer 1: Noah Haber

Broadly speaking, this is a fascinating article with a large number of useful, interesting, and surprising results and conclusions. The effort required to produce this article, pulling from such a large number of sources so quickly, is impressive. I was surprised by several of the findings, even as someone who follows the meta literature quite closely. I am further impressed with the detailed and transparent reporting of the methods section, the descriptions of why decisions were made, and the sheer scope of sources and tools used.

I have found the major claims in this article to be relatively well-founded and justified by the methods and the data. Specifically, I find that the claims about the properties of the COVID articles (more engagement, length, sheer volume of articles, etc.) are relatively robust findings, and are applicable to scientific meta-practice. As a descriptive paper of the preprint landscape of COVID-19, this is an excellent source of information.

We thank the reviewer for their kind comments.

However, the study also contains a number of errors and misinterpretations in its current form, particularly with regard to the interpretation of statistical inference. One such error appears in the abstract and conclusions and is repeated throughout the manuscript: the odds ratios are nearly universally interpreted as rate ratios. For example, the abstract incorrectly states that “COVID-19 preprints are accessed and distributed at least 15 times more than non-COVID-19 preprints,” which is misinterpreted from an odds ratio. The two are very different measures and do not approximate one another in this context. I strongly recommend changing all odds ratio calculations to rate ratio calculations, since RRs are much more generally interpretable (and clearly what the authors prefer). If not, the authors should explicitly state that these are ratios of odds, not probabilities.

This was an error in writing rather than in the underlying statistics (as can be seen in our GitHub repository). We have now corrected all instances throughout the manuscript to refer correctly to the statistics used.
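For readers less familiar with the distinction the reviewer draws, the two measures are defined as follows; the numbers used here are hypothetical, chosen only to illustrate the divergence, and are not figures from the study:

$$
\mathrm{RR} = \frac{p_1}{p_0}, \qquad \mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}
$$

With hypothetical probabilities p₁ = 0.6 and p₀ = 0.1, RR = 6 but OR = (0.6/0.4)/(0.1/0.9) = 13.5. The two only approximate one another when the outcome is rare in both groups, which the reviewer notes is not the case here.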

A second major statistical issue regards the sample-size attribution of the different literatures. While individual papers (COVID-19 vs. other) are individual units, the literatures as a whole should not be treated that way. For example, the paper compares the relative proportion of the literature that was COVID-19-related with the relative proportion of the literature that was Zika-related, and concludes that they are different with p<0.001. However, because the comparison at the level of interest is a comparison of two binomial proportions, the effective sample size is 2, not the number of studies as claimed. This issue is repeated in a number of areas throughout the analysis. It is not an existential threat to the main conclusions, but it is misleading to claim that level of precision. Further, I am not sure it is meaningful to make the Zika comparison at all. As platforms get larger, they also experience more rapid proportionate growth in emerging topics in general. Had the Zika outbreak happened in 2020, I imagine we would have seen larger proportionate growth (not just a larger count of papers), albeit still almost certainly less than for COVID.

The test we referred to was a test of association between the particular epidemic and preprint topics (either epidemic-related or unrelated); we find that these are non-independent, implying that preprints deposited during different epidemics had differing probabilities of being epidemic-related (whether driven by preprint platform changes, different interest in the respective viruses, or other factors, which we could not directly test here). We accept that this is not a direct test of binomial proportions and have clarified the text regarding the hypothesis tested. All other tests of association comparing proportions have been re-framed in the same manner.
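To make the hypothesis concrete, a test of association of this kind runs a chi-squared test on a 2×2 contingency table of deposit period by preprint topic. The sketch below is a minimal illustration with invented placeholder counts; it is not the study's actual data or code (the latter lives in the authors' GitHub repository):

```python
# Minimal sketch: chi-squared test of association between epidemic
# period and preprint topic. Counts are invented placeholders.
from scipy.stats import chi2_contingency

#          epidemic-related  unrelated
table = [[600, 5400],   # preprints deposited during the COVID-19 outbreak
         [ 20, 9980]]   # preprints deposited during the Zika outbreak

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```

Rejecting independence here says only that the probability of a preprint being epidemic-related differed between the two periods; as the authors note, it does not identify why.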

Broadly, this document may suffer from doing too much, which leaves too little room to discuss the limitations of the measures or to pinpoint areas for improvement. I would strongly suggest paring down, and potentially moving some sections of the paper into appendices or a separate publication. A few areas I find weaker are the semantic analysis, the documentation of changes between preprint and publication, and the review/transparency sections. These tests are too limited to be used conclusively, and there is little room for discussion of their weaknesses and limitations. They are potentially useful for a separate publication.

We agree that the volume of work presented obscures the key messages. As such, we have removed figure 5 (comparison of preprints and their published articles) which will form the basis of a separate publication. We have also removed all semantic analysis from the manuscript. Moreover, we have reduced word counts and improved our discussion.

Overall, I found this preprint to be useful, and would recommend that it be edited and proceed to the full publication stage for further review and critique.

Reviewer 2: Siran He and Emily Smith

RR:C19 Evidence Scale rating by reviewer:

  • Potentially informative: The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.

***************************************

Review:

This paper compares COVID-related and non-COVID-related preprints posted on bioRxiv and medRxiv. Specifically, the authors calculate how often preprints are accessed, how the articles are shared on online platforms, and how preprints compare to their published articles. The data clearly demonstrate that COVID-19 preprints are accessed, shared on social media, and referenced by journalists more often than non-COVID preprints. The study data, methods, and analysis do support the claim that “Our results highlight the unprecedented role of preprints and preprint servers in the dissemination of COVID-19 science”. The authors also conclude that there is a “likely long-term impact of the pandemic on the scientific publishing landscape”, although there is no specific data to support this second claim. The study is potentially informative.

Detailed Review

Introduction. The authors make a compelling case highlighting the gap between traditional publication processes and the urgent need to disseminate scientific information regarding COVID. The current literature and context are well described. The objective of this study is clearly stated.

Revision suggestion: remove the results and conclusion from the introduction (i.e., “We found that preprint servers... in this endeavour.”).


Results. Interesting and extensive analysis of the preprint repositories. It was particularly relevant to consider the issue of social media and misinformation as part of the analysis. The impact of the study could be enhanced by specifically investigating whether COVID-19 preprints address issues of health disparities or diversity, equity, and inclusion regarding study authors, study participants, or media headlines.

Revision suggestion: use sub-headings to organize content and help the readers navigate the many analyses.

We believe an investigation into COVID-19 preprints and issues of health disparities or diversity is beyond the scope of the present study. However, we agree that this represents an interesting future avenue.

Discussion. Thorough discussion. The main study finding is that preprints are playing a newly important role in science communication during the COVID-19 pandemic. This is well supported by the research and clearly articulated in the discussion. However, the authors should specifically address what implications or recommendations for improving science communication flow from this finding. The paragraph about misinformation on Twitter is interesting, but the authors could enhance the utility of the study by linking these results to the discussion of the speed at which new findings are shared, whether they are published, and how the preprint papers change by the time of publication.

Revision Suggestion: Two other study claims in the discussion are not well supported by the data and methods. First, the discussion says “the pandemic has left what is likely to be a lasting imprint on the preprint and science publishing landscape”. None of the data speaks to changes in the publishing landscape beyond the pandemic. They also conclude that rapid dissemination through preprints has not affected the “quality of preprints that are subsequently published.” However, this study does not have data that speaks directly to the quality of the evidence.

We have removed the statement “the pandemic has left what is likely to be a lasting imprint on the preprint and science publishing landscape”. We have also added discussion to address the recommendations arising from this study:

“the problem of poor-quality science is not unique to preprints and ultimately, a multi-pronged approach is required to solve some of these issues. For example, scientists must engage more responsibly with journalists and the public, in addition to upholding high standards when sharing research. More significant consequences for academic misconduct and the swift removal of problematic articles will be essential in aiding this. Moreover, the politicisation of science has become a polarising issue and must be prevented at all costs. Thirdly, transparency within the scientific process is essential in improving the understanding of its internal dynamics and providing accountability.”

First, the discussion says “the pandemic has left what is likely to be a lasting imprint on the preprint and science publishing landscape”. None of the data speaks to changes in the publishing landscape beyond the pandemic.

We have removed this statement from the manuscript.

They also conclude that rapid dissemination through preprints has not affected “quality of preprints that are subsequently published.” However, this study does not have data that speaks directly to the quality of the evidence.

We have clarified this statement with additional references and have linked our data on the percentage of preprints published with acceptance rates provided by publishers.

“We found comparable levels of preprints had been published within our short timeframe (Fig. 2) and that acceptance rates at several journals were only slightly reduced for COVID-19 research compared to non-COVID-19 articles (Supplemental Fig. 2), suggesting that, generally, preprints were of relatively good quality. Furthermore, recent studies have suggested that the quality of reporting in preprints differs little from that of their later peer-reviewed articles [44], and we ourselves are currently undertaking a more detailed analysis (see version 1 of our preprint for an initial analysis of published COVID preprints [45]).”

Methods. The authors should be applauded for the major effort to build the study dataset from multiple information sources in order to comment on many aspects of the preprint lifecycle, including eventual publication and mentions in social and traditional media. However, odds ratios are not the appropriate effect estimate to use; the risk/relative risk can be calculated instead and is preferred.

This was an error in writing rather than in the underlying statistics (as can be seen in our GitHub repository). We have now corrected all instances throughout the manuscript to refer correctly to the statistics used.
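As a minimal sketch of the distinction both reviewers raise, the two estimates can be computed from the same 2×2 table of counts; the numbers below are invented for illustration and are not taken from the study:

```python
# Risk (rate) ratio vs. odds ratio from a 2x2 table of counts.
# a/b: events/non-events in group 1; c/d: events/non-events in group 2.
def risk_ratio(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a / b) / (c / d)

# e.g. an outcome seen in 60 of 100 COVID-19 preprints vs 10 of 100 others
print(risk_ratio(60, 40, 10, 90))  # ~6.0
print(odds_ratio(60, 40, 10, 90))  # ~13.5
```

Because the outcome is common in the first group, the odds ratio (13.5) far exceeds the risk ratio (6.0), which is why reporting one as the other is misleading.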

Figures & Tables.

Figure 1D. Would it be possible to align the timeline of panel D with panels A-C, or to enlarge the grey date markers?

We have chosen to remove panel D from figure 1 and have discussed this in-text instead.

Figure 2C & 2D. Spell out the country name abbreviations in a footnote; or, if the goal is to show clustering by continent (2D), remove the country IDs and enlarge the markers.

We have added the country name abbreviations to the figure legend to improve clarity in figure 2.


With many thanks, the authors of “Preprinting a pandemic: the role of preprints in the COVID-19 pandemic”.