(Part 1 of a series of two blogposts on this topic. The second part is here.)
Reproducibility is a major problem in psychology and elsewhere. Much of the published literature is not solid enough to build on: experiences from my class suggest that students can get interesting stuff to work about half the time, at best. The recent findings of the reproducibility project only add to this impression.* And awareness has been growing about all kinds of potential problems for reproducibility, including p-hacking, file-drawer effects, and deeper issues in the frequentist data analysis tools many of us were originally trained on. What should we do about this problem?
Many people advocate dramatic changes to our day-to-day scientific practices. While I believe deeply in some of these changes – open practices being one example – I also worry that some recommendations will hinder the process of normal science. I'm what you might call a "reproducibility moderate." A moderate acknowledges the problem, but believes that the solutions should not be too radical. Instead, solutions should be chosen to conserve the best parts of our current practice.
Here are my thoughts on three popular proposed solutions to the reproducibility crisis: preregistration, publication of null results, and Bayesian statistics. In each case, I believe these techniques should be part of our scientific arsenal – but adopting them wholesale would cause more problems than it would fix.
Pre-registration. Pre-registering a study is an important technique for removing analytic degrees of freedom. But it also ties the analyst's hands in ways that can be cumbersome and unnecessary early in a research program, where analytic freedom is critical for making sense of the data (the trick is just not to publish those exploratory analyses as though they are confirmatory). As I've argued, preregistration is a great tool to have available for large-scale or one-off studies. In cases where subsequent replications are difficult or overly costly, prereg allows you to have confidence in your analyses. But in cases where you can run a sequence of studies that build on one another, each replicating the key finding and using the same analysis strategy, you don't need to pre-register because your previous work naturally constrains your analysis. So: rather than running more one-off studies but preregistering them, we should be doing more cumulative, sequential work where – for the most part – preregistration isn't needed.
Publication of null findings. File drawer biases – where negative results are not published and so effect sizes are inflated across a literature – are a real problem, especially in controversial areas. But the solution is not to publish everything, willy-nilly! Publishing a paper, even a short one or a preprint, is a lot of work. The time you spend writing up null results is time you are not doing new studies. What we need is thoughtful consideration of when it is ethical to suppress a result, and when there is a clear need to publish.
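To see why the file drawer inflates effect sizes, consider a toy simulation (my illustration, not from the post): many studies of the same true effect are run, but only the ones that reach p < .05 get "published." The sample size, true effect of d = 0.3, and known-variance z-test are all simplifying assumptions chosen for brevity.

```python
import math
import random
import statistics

random.seed(1)

def run_study(true_d=0.3, n=20):
    """Simulate a two-group study; return the observed effect and significance."""
    a = [random.gauss(0, 1) for _ in range(n)]        # control group
    b = [random.gauss(true_d, 1) for _ in range(n)]   # treatment group
    diff = statistics.mean(b) - statistics.mean(a)    # observed effect
    se = math.sqrt(2 / n)                             # known-variance (sd = 1) approximation
    z = diff / se
    return diff, abs(z) > 1.96                        # two-sided "p < .05"

results = [run_study() for _ in range(5000)]
all_effects = [d for d, _ in results]
published = [d for d, sig in results if sig]          # only significant studies survive

print(f"mean effect, all studies:       {statistics.mean(all_effects):.2f}")
print(f"mean effect, 'published' only:  {statistics.mean(published):.2f}")
```

The full set of studies averages close to the true 0.3, while the "published" subset averages far higher, because only unusually large observed effects clear the significance threshold at this sample size.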
Bayesian statistics. Frequentist statistical methods have deep conceptual flaws and are broken in any number of ways. But they can still be a useful tool for quantifying our uncertainty about data, and a wholesale abandonment of them in favor of Bayesian stats (or even worse, nothing!) risks several negative consequences. First, having a uniform statistical analysis paradigm facilitates evaluation of results. You don't have to be an expert to understand someone's ANOVA analysis. But if everyone uses one-off graphical models (as great as they are), then there are many mistakes we will never catch due to the complexity of the models. Second, the tools for Bayesian data analysis are getting better quickly, but they are nowhere near as easy to use as the frequentist ones. To pick on one system, as an experienced modeler, I love working with Stan. But until it stops crashing my R session, I will not recommend it as a tool for first-year graduate stats. In the meantime, I favor the Cumming solution: a gentler move towards confidence intervals, judicious use of effect size, and a decrease in reliance on inferences from individual instances of p < .05.
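Concretely, the Cumming-style reporting described above amounts to computing an effect size and an interval estimate rather than just a significance verdict. A minimal sketch, using made-up scores and a hard-coded t critical value (2.101 for 18 degrees of freedom) rather than a stats library:

```python
import math
import statistics

# Hypothetical scores from two conditions (invented numbers for illustration)
control   = [4.1, 5.3, 3.8, 4.6, 5.0, 4.4, 3.9, 4.8, 4.2, 4.7]
treatment = [5.2, 5.9, 4.8, 5.6, 6.1, 5.0, 5.4, 5.8, 4.9, 5.5]

n1, n2 = len(control), len(treatment)
m1, m2 = statistics.mean(control), statistics.mean(treatment)
s1, s2 = statistics.stdev(control), statistics.stdev(treatment)

# Pooled standard deviation and Cohen's d (standardized effect size)
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m2 - m1) / sp

# 95% confidence interval for the raw mean difference
diff = m2 - m1
se = sp * math.sqrt(1 / n1 + 1 / n2)
t_crit = 2.101  # two-tailed .05 critical value, df = n1 + n2 - 2 = 18
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"mean difference: {diff:.2f}")
print(f"Cohen's d:       {d:.2f}")
print(f"95% CI:          [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Reporting the difference with its interval and a standardized effect size conveys both the magnitude and the precision of the estimate, which a bare p < .05 does not.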
Sometimes it looks like we've polarized into two groups: replicators and everyone else. This is crazy! Who wants to spend an entire career replicating other people's work, or even their own? Instead, replication needs to be part of our scientific process more generally. It needs to be a first step, where we build on pre-existing work, and a last step, where we confirm our findings prior to publication. But the steps in the middle – where you do the real discovery – are important as well. If we focus only on those first and last steps and make our recommendations in light of them alone, we forget the basic practice of science.
* I'm one of many, many authors of that project, having helped to contribute four replication projects from my graduate class.