There's a tension in discussions of open science, one that is also mirrored in my own research. What I really care about are the big questions of cognitive science: what makes people smart? how does language emerge? how do children develop? But in practice I spend quite a bit of my time doing meta-research on reproducibility and replicability. I often hear critics of open science – focusing on replication, but also other practices – objecting that open science advocates are making science more boring and decreasing the focus on theoretical progress (e.g., Locke, Stroebe & Strack). The thing is, I don't completely disagree. Open science is not inherently interesting.
Sometimes someone will tell me about a study and start the description by saying that it's pre-registered, with open materials and data. My initial response is "ho hum." I don't really care if a study is preregistered – unless I care about the study itself and suspect p-hacking. Then the only thing that can rescue the study is preregistration. Otherwise, I don't care about the study any more; I'm just frustrated by the wasted opportunity.
So here's the thing: Although being open can't make your study interesting, the failure to pursue open science practices can undermine the value of a study. This post is an attempt to justify this idea by giving an informal Bayesian analysis of what makes a study interesting and why transparency and openness are then the key to maximizing study value.
What makes a scientific study interesting?
I take a fundamentally Bayesian approach to scientific knowledge. If you haven't encountered Bayesian philosophy of science, here's a nice introduction by Strevens; I find this framework nicely fits my intuitions about scientific reasoning. The core assumption is that knowledge in a particular domain can be represented by a probability distribution over theoretical hypotheses, given the available evidence.* Up to normalization, this distribution can be decomposed into the product of 1) the prior probability of each hypothesis and 2) the likelihood of the available evidence under that hypothesis. New evidence changes this posterior distribution, and the amount of change is quantified by information gain. Thus, an "interesting" study is simply one that leads to high information gain.
Some good intuitions fall out of this definition. First, consider a study that decisively selects between two competing hypotheses that are equally likely based on prior literature; this study leads to high information gain and is clearly quite "theoretically interesting." Next, consider a study that provides strong support for a particular hypothesis, but the hypothesis is already deeply established in the literature; it's much less informative and hence much less interesting. Would you spend time conducting a large, high-powered test of Weber's law? Probably not – it would probably show the same regularity as the hundreds or thousands of studies before it. Finally, consider a study that collects a large amount of detailed data, but the design doesn't distinguish between hypotheses. Despite the amount of data, the theoretical progress is minimal and hence the study is not interesting.**
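To make these three cases concrete, here is a minimal sketch that treats information gain as the KL divergence (in bits) from the prior to the posterior over two hypotheses. The function names and all of the numbers are illustrative assumptions, not values from any real study.

```python
import math

def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses.

    prior[i] is P(H_i); likelihood[i] is P(data | H_i).
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

def information_gain(prior, post):
    """KL divergence from prior to posterior, in bits."""
    return sum(q * math.log2(q / p) for p, q in zip(prior, post) if q > 0)

# Case 1: equal priors, data strongly favor H1 (hypothetical likelihoods).
decisive = information_gain([0.5, 0.5], posterior([0.5, 0.5], [0.95, 0.05]))

# Case 2: same data, but H1 was already deeply established.
established = information_gain([0.99, 0.01], posterior([0.99, 0.01], [0.95, 0.05]))

# Case 3: equal priors, but the data are equally likely under both hypotheses.
undiagnostic = information_gain([0.5, 0.5], posterior([0.5, 0.5], [0.6, 0.6]))

print(decisive, established, undiagnostic)
```

Under these assumed numbers the decisive study yields by far the most information, the confirmation of an established hypothesis yields very little, and the undiagnostic design yields none.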
Under this definition, an interesting study can't just have the potential to discriminate between hypotheses; it must provide evidence that changes our beliefs about which one is more probable.*** Larger samples and more precise measurements typically result in greater amounts of evidence, and hence lead to more important ("more interesting") studies. In the special case where the literature is consistent with two distinct hypotheses, evidence can be quantified by the Bayes Factor. The bigger the Bayes Factor, the more evidence a study provides in favor of one hypothesis compared with the other, and the greater the information gain.
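As a rough illustration of the two-hypothesis case, the sketch below converts a Bayes Factor into a posterior probability (posterior odds = Bayes Factor × prior odds) and reports the resulting information gain in bits. It assumes equal priors, and the function names and Bayes Factor values are hypothetical.

```python
import math

def posterior_from_bayes_factor(prior_h1, bayes_factor):
    """Posterior P(H1) given a prior P(H1) and a Bayes Factor favoring H1."""
    prior_odds = prior_h1 / (1 - prior_h1)
    post_odds = bayes_factor * prior_odds
    return post_odds / (1 + post_odds)

def information_gain_bits(prior_h1, post_h1):
    """KL divergence from prior to posterior over {H1, H2}, in bits."""
    gain = 0.0
    for p, q in ((prior_h1, post_h1), (1 - prior_h1, 1 - post_h1)):
        if q > 0:
            gain += q * math.log2(q / p)
    return gain

# Starting from equal priors, larger Bayes Factors move beliefs further.
for bf in (1, 3, 10, 100):
    post = posterior_from_bayes_factor(0.5, bf)
    print(bf, round(post, 3), round(information_gain_bits(0.5, post), 3))
```

A Bayes Factor of 1 leaves the posterior at the prior, so the gain is zero; from equal priors, the gain grows as the Bayes Factor grows.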
How does open science affect whether a study is interesting?
Transparency and openness in science include the sharing of code, data, and experimental materials as well as the sharing of protocols and analytic intentions (e.g., through preregistration). Under the model described above, none of these practices add to the informational value of a study. Having the raw data available or knowing that the inferential statistics are appropriate due to preregistration can't make a study better – the data are still the data, and the evidence is still the evidence.
But if there is uncertainty about the correctness of a result, the informational value of the study is decreased. Consider a study that in principle decides between two hypotheses, but imagine the skeptical reader has no access to the data and harbors some belief that there has been a major analytic error. The reader can quantify her uncertainty about the evidential value of the study by assigning probabilities to the two outcomes: either the study is right, or else it's not. Integrating across these two outcomes, the value of the study is of course lower than if she knows the study has no error. Or similarly, imagine that the reader believes that another, different statistical test was equally appropriate but that the authors selected the one they report post hoc (leading to an inflation of their risk of a false positive).**** Again, uncertainty about the presence of p-hacking decreases the evidential value of the study, and the decrease is proportional to the strength of the belief.
Open science practices decrease the belief in p-hacking or error, and thus preserve the evidential value of the study. If the skeptical reader has the ability to repeat the data analysis ("computational reproducibility"), the possibility of error is decreased. If she has access to the preregistration, the possibility of p-hacking is similarly minimized. Both of these steps mean that the informational value of the study is maintained rather than decreased.
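The discounting argument can be sketched as a simple expectation over the two outcomes the skeptical reader considers. This is only an illustration: it assumes a flawed study contributes zero information, and the function name, the gain of 0.71 bits, and the error probabilities are all hypothetical.

```python
def expected_value(gain_if_sound, p_flawed, gain_if_flawed=0.0):
    """Expected information gain when the reader is unsure the study is sound.

    p_flawed is the reader's subjective probability of a major error or of
    p-hacking; by assumption, a flawed study contributes gain_if_flawed.
    """
    if not 0.0 <= p_flawed <= 1.0:
        raise ValueError("p_flawed must be a probability between 0 and 1")
    return (1 - p_flawed) * gain_if_sound + p_flawed * gain_if_flawed

gain = 0.71  # hypothetical information gain (bits) if the study is sound

# Closed study: the reader cannot check the analysis and suspects an error.
closed = expected_value(gain, p_flawed=0.30)

# Open study: shared data and a preregistration lower that suspicion.
open_study = expected_value(gain, p_flawed=0.05)

print(closed, open_study)
```

Under these assumptions the open study never exceeds the value of the sound study itself; openness only limits how much of that value is discounted away, and the discount scales linearly with the reader's suspicion.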
One corollary to this formulation is that replication can be a way to "rescue" particular interesting research designs. A finding can – by virtue of its design – have the potential to be theoretically important, but it may have limited evidential value, whether because of the small sample, imprecision of the measurements, or worries about error or p-hacking. In this case, a replication can substantially alter the theoretical landscape by adding evidence to the picture (this point is made by Klein et al. in their commentary on the ManyLabs studies). So then replication in general can be interesting or uninteresting – depending on the strength of the evidence for the original finding and its theoretical relevance. The most interesting replications will be those that target a finding with a design that allows for high information gain but for which the evidence is weak.
Conclusions
Open science practices won't make your study interesting or important by themselves. The only way to have an interesting study is the traditional way: create a strong experimental design grounded in theory, and gather enough evidence to force scientists to update their beliefs. But what a shame if you have gone this route and then the value of your study is undermined! Transparency is the only way to ensure that readers assign the maximal possible evidential value to your work.
* As a first approximation, let's be subjectively Bayesian, so that the distribution is in the heads of individual scientists and represents their beliefs. Of course, no scientist is perfect, but we're thinking about an idealized rational scientist who weighs the evidence and has reasonably fair subjective priors.
** Advocates for hypothesis-neutral data collection argue that later investigators can bring their own hypotheses to a dataset. In the framework I'm describing here, you could think about the dataset having some latent value that isn't realized until the investigator comes along and considers whether the data are consistent with their particular hypotheses. Big multivariate datasets can be very informative in this way, even if they are not collected with any particular analysis in mind. But investigators always have to be on their guard to ensure that particular analyses aren't undermined by the post-hoc nature of the investigation.
*** Even though evidence in this sense is computed after data collection, that doesn't rule out the prospective analysis of whether a study will be interesting. For example, you can compute the expected information gain using optimal experimental design. Here's a really nice recent preprint by Coenen et al. on this idea.
**** I know that this use of the p-hacking framework mixes my Bayesian apples in with some frequentist pears. But you can just as easily do post-hoc overfitting of Bayesian models (see e.g., the datacolada post on this topic).