Tuesday, September 10, 2013

Post-publication peer review and social shaming

Is the peer review system broken? Arguably. Peer review can be frustrating, both in what gets through and what doesn't. So recently there has been a lot of talk about the virtues of post-publication peer review (e.g. links here, here, and here), where folks on the internet comment on scientific publications after they are public. One suggestion is that post-publication peer review might even one day replace the standard process.

Commenting on papers after they are published has to be a good idea in some form: more discussion of science and more opportunities to correct the record! But I want to argue against using post-publication peer review, at least in its current form, as the primary method of promoting stronger cultural norms for reliable research.

1. Pre-publication peer review is a filter, while post-publication peer review is an audit.  

With few exceptions, peer review is applied to all submitted papers. And despite variations in the process from journal to journal, the overall approach is quite standard. This uniformity doesn't guarantee perfect decisions, in which all the good papers are accepted and all the bad ones rejected. Nevertheless, peer review is a filter that is designed to be applied across the board so that the scientific record contains only those findings that "pass through" (consider the implicit filter metaphor!).

When I review a journal article I typically spend at least a couple of hours reading, thinking, and writing. These hours have to be "good hours" when I am concentrating carefully and don't have a lot of disruptions. I would usually rather be working on my own research or going for a walk in the woods. But for a paper to be published, a group of reviewers needs to commit those valuable hours. So I accept review requests out of a sense of obligation or responsibility, whether it's to the editor, the journal, or the field more generally. 

This same mechanism doesn't operate for post-publication review. There is no apparatus for soliciting commenters post-publication. So only a few articles, particularly those that receive lots of media coverage, will get the bulk of the thoughtful, influential commentary (see e.g. Andrew Gelman's post on this). The vast majority will go unblogged.

Post-publication peer review is thus less like a filter and more like an audit. It happens after the fact and only in select cases. Audits are also somewhat random in what attracts scrutiny and when. There is always something bad you can say about someone's tax return - and about their research practices.

2. Post-publication peer review works via a negative incentive: social shaming.

People are generally driven to write thoughtful critiques only when they think that something is really wrong with the research (see e.g. links here, here, here, and here). This means that nearly all post-publication peer review is negative.

The tone of the posts linked above is very professional, even if the overall message about the science is sometimes scathing. But one negative review typically spurs a host of snarky follow-ons on Twitter, singling out a single research group or paper for an error that may need to be corrected much more generally. Often critiques are merited. But they can leave the recipients feeling as though the entire world is ganging up against them.

For example, consider the situation surrounding a failure to replicate John Bargh's famous elderly priming study. Independent of what you think of the science, the discussion was heated. A sympathetic Chronicle piece used the phrase "scientific bullying" to describe the criticisms of Bargh, noting that this experience was the "nadir" of his career. That sounds right to me: I've only been involved in one, generally civil, public controversy (my paper, reply, my reply back, final answer) and I found that experience extremely stressful. Perceived social shaming or exclusion can be very painful. I'm sure reviewers don't intend their moderately-phrased statistical criticisms to result in this kind of feeling, but - thanks to the internet - they sometimes do.

3. Negative incentives don't raise compliance as much as positive cultural influences do.

Tax audits (which carry civil and criminal penalties, rather than social ones) do increase compliance somewhat. But a review of the economics literature suggests that cultural factors - think US vs. Greece - matter more than the sheer expected risk due to audit enforcement (discussion here and here).* For example, US audit rates have fluctuated dramatically in the last fifty years, with only limited effects on compliance (see e.g. this analysis).**

Similarly, if we want to create a scientific culture where people follow good research practices because of a sense of pride and responsibility - rather than trying to enforce norms through fear of humiliation - then increasing the post-publication audit rate is not the right way to get there. Instead we need to think about ways to change the values in our scientific culture. Rebecca Saxe and I made one pedagogical suggestion here, focused on teaching replication in the classroom.

Some auditing is necessary for both tax returns and science. The overall increase in post-publication discussion is a good thing, leading to new ideas and a stronger articulation and awareness of core standards for reliable research. So the answer isn't to stop writing post-pub commentaries. It's just to think of them as a complement to - rather than a replacement for - standard peer review.

Finally, we need to write positive post-publication reviews. We need to highlight good science and most especially strong methods (e.g. consider The Neurocomplimenter). The options can't be either breathless media hype that goes unquestioned or breathless media hype that is soberly shot down by responsible academic critics. We need to write careful reviews, questions, and syntheses for papers that should remain in the scientific literature. If we only write about bad papers, we don't do enough to promote changes in our scientific standards.

---
* It's very entertaining: the overall tone of this discussion is one of surprise that taxpayers' behavior doesn't hew to the rational norms implied by audit risk.
** Surprisingly, I couldn't find the exact graph I wanted: audit rates vs. estimated compliance rates. If anyone can find this, please let me know!

8 comments:

  1. You seem to be coming close to saying that, if you find problems with a paper that has received massive uncritical media coverage, you should not say anything in public about it, because it will upset the author.

  2. Hi Deborah,

    Thanks for the comment. I enjoy reading post-publication comments by you, Dan Simons, and others, and I hope you continue to write them. My suggestion was not at all that you or others should refrain.

    Instead, I was trying to argue two things:

    1. Post-publication peer review doesn't have the right incentive structure to replace standard peer review (because most papers won't get comments, and audit pressure doesn't do a great job of enforcing compliance on its own), and

    2. Positive, synthetic comments *in addition to* critical comments are necessary in order to create culture changes.

    1. I totally agree. Open comments on a peer-reviewed paper are interesting & happen anyway, but are no surrogate for private pre-publication peer review.

  3. Bloggers have every right to criticize published work, but I wish more of them would acknowledge the power they hold, and take more care by contacting authors for clarification or to ask for data, for example. There's more at stake than feelings. Careers can be damaged by negative blog posts, the inevitable "trial by Twitter", and the resulting permanent scar on the author's google search results. Let's not pretend that doesn't matter these days. Potential employers google applicants.

    Blogs started as a counter-culture movement, but now if they're not quite the new establishment yet they are at least heading that way. They have considerable reach and power. And we all know what comes with great power, or at least ought to.

  4. I agree, Kyle.

    I have read the posts and comments on your and Dan Casasanto's work on Language Log. I really like reading discussions about findings like yours (the QWERTY effect), but I found the overall tone of this particular discussion dismissive and distasteful. I stand by what I wrote here: post-publication reviewers need to take care in their critiques - and small issues of tone in posts can make commenters even more extreme in their statements.

    One thing that I didn't mention in my previous discussion is that open data and materials play a strong role in facilitating the clarity of these discussions. Some authors are very responsive about sharing materials when asked, but nothing will ever be as easy as clicking over to a repository and seeing directly what the authors did. In your discussion with the Language Log folks, I thought several points would have been settled much more easily by an open repository of the precise code used for the analyses.

    Mike

  5. Mike, good post, I agree. The solution to the replicability "crisis" is not to shame people who publish results that aren't replicable. It could happen to the best of us, although we should all try to ensure it doesn't, of course. If it's fraud, that's one thing. But data can do it to you even if you're innocent of "QRPs" and/or intentional fraud.

    The solution isn't to stop replicating. But if the research community is going to get behind replication, it's going to have to be palatable to the research community. Perception does matter.

  6. Hey Mike et al,

    My impression is that most post-publication peer review doesn't focus on science that is simply bad or unreplicable. Rather, it focuses on science that is perceived to be oversold or overhyped. Replicability might not respond to social incentives, but overselling might (as it's a response to a social incentive in the first place).

    If you publish a paper in a field journal saying "I have provided modest evidence that pushing your tongue into your cheek promotes flippant behavior" then you are unlikely to suffer the trauma of a Dutch psychologist tearing into your paper like a pitbull. That's because you've only made a modest claim (by virtue of your language and the setting for publication), and if that modest claim turns out to be incorrect then it only requires modest correction.

    By contrast, if you use the same weak data to make a strong claim in a flagship journal (and claims made in flagship journals should be strong), then that claim will be widely dispersed and believed. If the evidential foundations of that claim turn out to be weak, a strong corrective is needed (like PPPR).

    I think we all know that there's a big incentive in science to make overly strong claims about data: It gets you a paper in Nature, a segment on NPR, a big grant, and an academic job. That damages the career prospects of those who are more careful or who aren't willing to overclaim. PPPR might provide some balance, which would be good all round.

    Ta,

    Hugh

    ps, just to be clear, I don't think your rule learning work falls into the over-selling camp. In fact, I thought that that back-and-forth was a really useful discussion.

    1. Hugh,

      I agree that some PPPR focuses on correcting over-hyped science and that this is a valuable function. Two points on this:

      1. Your description above is a vision of PPPR in which commentary plays a fundamentally different role than standard peer review - it is for discussing interpretation and implications of work after publication (e.g. like "news and views") rather than for enforcing methodological standards. I'm fine with that and think it makes more sense in a lot of ways, especially because - as you note - you mostly want to do this for high-profile papers that make strong claims. It's not necessary to PPPR reasonable papers in specialist journals.

      2. The corrective to an overhyped paper (much like a poor methodological decision) isn't snark. I think we have to be extra careful to be moderate in our PPPR responses because even a slightly snarky blog post can inspire much more informal vitriol that leads to the (mis-)perception of "bullying."

      So I'm definitely in favor of the kind of PPPR that you describe - I just think we need to be very aware of the rationale and consequences if we want to foster a professional role for blogging and tweeting in providing proper and timely evaluation of new science.

      best,

      Mike

      PS: I think the discussion I had with Endress isn't quite PPPR - that's more like traditional academic back-and-forth. But I'm glad you agree we didn't oversell. ;)
