Monday, April 8, 2019

A (mostly) positive framing of open science reforms

I don't often get the chance to talk directly and openly to people who are skeptical of the methodological reforms that are being suggested in psychology. But recently I've been trying to persuade someone I really respect that these reforms are warranted. It's a challenge, but one of the things I've been trying to do is give a positive, personal framing to the issues. Here's a stab at that. 

My hope is that a new graduate student in the fields I work on – language learning, social development, psycholinguistics, cognitive science more broadly – can pick up a journal, choose a seemingly strong study, implement it in my lab, and move forward with it as the basis for a new study. But unfortunately my experience is that this has not been the case much of the time, even in cases where it should be. I would like to change that, starting with my own work.

Here's one example of this kind of failure: When I was a first-year assistant professor, a grad student and I tried to replicate one of my grad school advisors' well-known studies. We failed repeatedly – despite the fact that we ended up thinking the finding was real (eventually published as Lewis & Frank, 2016, JEP:G). The issue was likely that the original finding was an overestimate of the effect, because the original sample was very small. But converging on the truth was very difficult and required multiple iterations.

This kind of thing happens to me quite a lot. I run a class in which first-year PhD students in my department try to replicate the published literature, often articles from Psych Science and other top journals. I've blogged about this course (e.g., here) and published on outcomes from it as well (Hawkins, Smith et al., 2018, AMPPS). More than half of the time, these replication studies fail, roughly consistent with estimates from larger meta-science projects like RPP and the more recent (and higher-quality) ManyLabs 2 and Social Science Replication projects.

The reasons for this failure are not always clear, and we don't always do the extensive follow-up work necessary to "debug" the experiment. But over time I have tried to identify a number of reasons for failures and use them as guides to the way I run my lab and provide methodological training for students. I also have advocated for journals and funders to adopt these reforms. Most are about transparency, and some are about good design practices. These reforms have been a win-win for my lab. They improve the clarity, impact, and validity of our work – mostly while speeding things up! Here they are.

Share code and data. Several studies, including ours (Hardwicke et al., 2018, Royal Soc Open Science), show that MOST published journal articles contain some statistical errors, ranging from the trivial to the extreme. In reproducing the analytic calculations from a number of prominent papers (which would only be possible through data sharing), we have found major errors requiring correction in quite a few. Creating clear sharing pipelines leads to cleaner, easier-to-check papers.

Use a reproducible workflow. Technical tools like git, RMarkdown, Jupyter, etc. help students and other researchers report results whose provenance and relationship to the underlying data are known. These tools also speed up research dramatically, letting you share and reuse code effectively much more often and auto-generate tables, graphs, and other elements of reports. They also decrease copy/paste errors in reporting! And for me as a PI, I love being able to "audit" the work that folks in my lab do, and to quickly and easily pull in figures, data, or other excerpts from GitHub when I need to add them to a talk.
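To make the copy/paste point concrete, here is a minimal sketch of the idea in Python (RMarkdown or Jupyter would do the same thing inline). The function names and the toy data are hypothetical, not anything from our actual pipeline; the only point is that the sentence that ends up in the report is built from the data, so the reported numbers can't drift from the analysis.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Compute the descriptive statistics we want to report."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"n": n, "mean": mean(values), "sd": sd, "se": sd / sqrt(n)}


def report_sentence(label, values):
    """Format a results sentence directly from the data (hypothetical helper)."""
    s = summarize(values)
    return f"{label}: M = {s['mean']:.2f}, SD = {s['sd']:.2f}, N = {s['n']}"


# Toy data, made up for illustration only.
looking_times = [1.0, 2.0, 3.0]
print(report_sentence("Looking time (s)", looking_times))
```

If the data change (an exclusion gets fixed, a participant gets added), rerunning the script updates the sentence, and there is no hand-typed number left to forget about.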

Preregister. Everything in my lab is preregistered. All this means is that people in my lab need to write down what they are going to do (sample size, main analysis) before they do it. Here's a sample. If we have talked things through enough, writing the registration often takes 30 minutes; of course for more complex projects, more thought is needed (and it's a good thing to do that thinking ahead of time!). This process is not binding – we routinely violate our registration, and report our violation – and takes very little time. It just makes us transparently report what we knew before doing the study. As an added bonus, if you care about p < .05 results (I mostly don't), these are really only valid in the case of a preregistered hypothesis. There's what I think is a pretty good explanation of this perspective in our transparency guide from last year (Klein et al., 2018, Collabra).

Follow best practices in experimental design. That means thinking about reliability and validity, and using a psychometric perspective (e.g., including sampling multiple experimental items). It also means planning a sample size that is sufficient to get precise enough measures to make quantitative predictions. There is a huge body of knowledge about how to do good experiments from Rosenthal and Rosnow onward – but often we rely on lab lore and implicit learning.
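As a rough illustration of what sample size planning can look like, here is a small Python sketch for a simple two-group comparison. It uses the standard normal approximation for a standardized mean difference, which is an assumption on my part for the example (exact t-based calculations give slightly larger numbers, and real designs with repeated measures or multiple items need more care). The function names and the target values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

_z = NormalDist().inv_cdf  # standard normal quantile function


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate n per group to detect standardized effect d (two-sided test)."""
    return ceil(2 * ((_z(1 - alpha / 2) + _z(power)) / d) ** 2)


def n_per_group_for_precision(half_width, confidence=0.95):
    """Approximate n per group so the CI on the standardized mean
    difference has at most the given half-width."""
    z = _z(1 - (1 - confidence) / 2)
    return ceil(2 * (z / half_width) ** 2)


# A "medium" effect versus a more modest one.
print(n_per_group_for_power(0.5))
print(n_per_group_for_power(0.3))
# Pinning the effect down to within +/- 0.2 SD takes far more data.
print(n_per_group_for_precision(0.2))
```

The contrast between the power-based and precision-based numbers is the useful part: a sample that is adequate for detecting an effect can still be much too small for estimating its size well enough to make quantitative predictions.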

In sum, my worries about the literature have led me to a set of practices that – I think – have enhanced the research we do and made it more reproducible and replicable, while not slowing us down or making our workflow more onerous.


  1. Hi Michael, do you have a formal data management plan that guides the workflow of data release? We have been developing one.

    I have benefitted a lot from releasing data and code. In one case, I gave the data and code of an unpublished paper to someone who had a stake in the results, and he found a mistake in our code (a contrast coding reversal of sign) that we were able to fix before submitting the paper. This has happened during a review process too, when a reviewer looked at our code. In my estimation it's basically impossible to work on a big project without introducing errors, at least small ones, in one's code.

    It would be cool if one could convince one's lab members to act as adversarial code reviewers for our manuscripts; if one had an exchange of favors type of setup, there could be a lot of value added. In practice, nobody has the time to review code for someone else because no credit is given. It would be nice if some kind of credit system could be created.

    1. Hi Shravan, We don't have anything too formal, but we have gotten in the habit of doing a reproducibility check (someone else can build the paper from the repository) prior to submission for many papers. That's been really helpful. Formal code review would be a great next step! Please do share your plan if you get one!

    2. Do you release the entire source Rnw or Rmd file that is the paper publicly? We create a simpler vignette that allows a quick walk through the main code chunks. Also, we can *never* get the Rnw or Rmd file to compile correctly across collaborators' computers. Someone among the authors always has a compile failure. So we felt it was unlikely that an outsider can compile our source files.