(tl;dr: "Ugh, can't we just get along?!" OR "aspirational reform meet actual policy?" OR "whither metascience?")
This post started out as a thread about the tribes of methodological reform in psychology, all of whom I respect and admire. Then it got too long, so it became a blogpost.
As folks might know, I think methodological reform in psychology is critical (some of my views have been formed by my work with the ManyBabies consortium). For the last ~2 years, I've been watching two loose groups of methodological reformers get mad at each other. It has made me very sad to see these conflicts because I like all of the folks involved. I've actually felt like I've had to take a twitter holiday several times because I can't stand to see some of my favorite folks on the platform yelling at each other.
What do the centrists and the radicals think?
One thread that catalyzed my thinking about this discussion was the "far left" and "center left" comparison that Charlie Ebersole proposed. Following that thread, I'll call these groups the centrists and the radicals.
I'm definitely not the first to notice this, but it bears repeating: The gender imbalance between prominent "mainstream" open science folks and those critiquing it from the methodological "left" is striking and concerning. 1/3
— Charlie Ebersole (@CharlieEbersole) January 29, 2021
Centrist reforms are things like preregistration, transparency guidelines, and tweaks to hypothesis testing (e.g., p-value thresholds, equivalence testing, or Bayesian hypothesis testing). There's no consensus "platform" for reforms, but a recent review summarizes the state of things quite well. Just to be clear, a number of the authors are collaborators and friends, and I think it's on the whole a really good article.
10 years of replication and reform in psychology. What has been done and learned? Our latest paper prepared for the Annual Review summarizes the advances in conducting and understanding replication and the reform movement that has spawned around it. https://t.co/i5GQRPGzIa 1/
— Brian Nosek (@BrianNosek) February 9, 2021
I want to highlight some non-mainstream work on reproducibility, open science, replication crisis, meta-science by women. Reading and drawing from a diverse set of authors and ideas will help push this stream of work forward and help make science more open and inclusive.
— Berna Devezer (@zerdeve) March 3, 2020
I'm a centrist and a radical
Here's the thing. These views are not inconsistent! It's just that the implicit contexts of application are different. Centrists are trying to make broad policy recommendations for funders/journals/training programs; radicals are thinking about ideal scientific structures. Both viewpoints resonate with my personal experience.
Preregistration and iterative statistical modeling go hand in hand. [THREAD] I'll illustrate via a new preprint from my lab that I'm very excited about, "Polite speech emerges from competing social goals" (w/ @EricaYoon4, @mhtessler, and Noah Goodman): https://t.co/LvUf3Pecns /1
— Michael C. Frank (@mcxfrank) November 19, 2018
On the other hand, I also teach experimental methods to psychology graduate students. In my teaching I'm much more of a centrist. In this context, I see lots of "garden variety" psych research on the topics that students are interested in. Much of it is not easily amenable to computational theory. (Here's a sample of the perspective I've developed in that course).
From the radicals, there's lots of interest in computational theory building and some very nice guides/explainers (e.g., this one by Olivia Guest and Andrea Martin; EDIT: these authors are just trying to help people understand modeling, and want to be clear that they see a place for qualitative theory and don't subscribe to a "radical" position). The radical tradition is what I was trained in and what I do. I love this kind of work. But psych is a VERY big place (TM). It feels to me like hubris to say to a student who works on educational mindsets, emotion regulation, or the longitudinal development of racial identity, "don't even bother unless you have my kind of computational theory." Maybe that's not what they want as an outcome from their research, and maybe they are right and I am wrong!
(As an aside: models and data go hand in hand, and it's not actually clear to me that moving to computational theory is right in areas where there are no precise empirical measurements to explain. In 2013, I taught a fun class with Jamil Zaki and Noah Goodman in which we tried to build models of social behavior. We made lots of models but had no reliable quantitative measurements to fit them to. So we had some pretty great computational theory, in my humble opinion, but we were still nowhere.)
So, based on these musings, in my experimental methods class I make more minimal recommendations to the students. To evaluate the effect of an intervention, plan your sample size and preregister the statistical test. Don't p-hack. Go ahead and explore your data, but don't pretend that p-values from that exploration are a sound basis for strong conclusions. Try to make good plots of your raw data. Again, these recommendations sound pretty centrist, even though, like I said, in my own lab I'm much more of a radical!
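To make the first of those recommendations concrete, here's a minimal sketch of what planning a sample size can look like (assuming Python with statsmodels; the effect size of d = 0.5 and the conventional alpha and power values are placeholders that would really come from pilot data or the prior literature):

```python
# A minimal sketch of "plan your sample size": a power analysis for a
# simple two-group comparison. The assumed effect size (d = 0.5) is a
# placeholder; in practice it would come from pilot data or prior work.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative='two-sided')
print(f"Participants needed per group: {n_per_group:.0f}")  # ~64
```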
The methodological practices that I recommend in class don't necessarily result in a robust body of theory. But at the same time, I have a strong conviction that they are a first step towards keeping people from tricking themselves while they stare at noise. Random promotion of noise to signal is rampant in the literature: we see it all the time in class when we try to replicate findings that are clearly the result of post-hoc selection of significant p-values. So simply blocking this kind of noise promotion is an important first step.
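If you want to see how easily noise gets promoted to signal, here's a toy simulation (my own illustration, not from any of the papers above): every "experiment" below is pure noise, and the only questionable practice is testing five outcome measures and reporting whichever one comes out significant.

```python
# Toy simulation of noise promotion: there is no true effect anywhere,
# but testing several outcomes and reporting the best p-value inflates
# the false positive rate far beyond the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_outcomes, n_per_group = 5000, 5, 30

false_positives = 0
for _ in range(n_experiments):
    # Five null outcome measures; both groups drawn from the same distribution
    a = rng.normal(size=(n_outcomes, n_per_group))
    b = rng.normal(size=(n_outcomes, n_per_group))
    pvals = [stats.ttest_ind(a[i], b[i]).pvalue for i in range(n_outcomes)]
    if min(pvals) < 0.05:  # report whichever test "worked"
        false_positives += 1

print(f"Nominal alpha: 0.05; observed rate: {false_positives / n_experiments:.2f}")
# Expect roughly 1 - 0.95**5, i.e. about 23% of pure-noise studies yielding
# a "significant" finding: exactly the kind of result that fails to replicate.
```

Preregistering which test you'll run is precisely what closes off this kind of forking path.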
Contexts for everything
Danielle Navarro makes the general case wonderfully in the piece I linked above: "advocating preregistration as a solution to p-hacking (or its Bayesian equivalent) is deeply misguided because we should never have been relying on these tools as a proxy for scientific inference." I agree with this point completely, for my own research.
But I'm also worried that applying this standard as a blanket policy intervention across all of psychology (plus the other behavioral sciences, to say nothing of the clinical sciences) would be a disaster for everyone involved. What would people do when they didn't have computational theory or adequate statistical models but got asked by funders and journals to provide such theory? My guess is that they'd make it up in a way that satisfied the policy hoop they'd been asked to jump through and then would continue p-hacking.
Here are a few ideas about consensus metascience directions for both groups. Centrists should consider how they want to tweak policies to encourage cumulative science in the form of quantitative theory. How could we study the effects of quantitative theory on the robustness of empirical findings? I've got one idea: it seems like literatures that test quantitative theories presuppose precise and replicable measurements; that's at least a testable correlational claim. I've also wondered about encouraging dose-response designs as a potential intervention on the standard 2x2 design that gets (over-)used in much of the psychology literature (a quick sketch of the contrast is below).
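To be concrete about that last idea, here's a rough simulation sketch (the linear data-generating model, effect size, and sample sizes are all illustrative assumptions): a parametric manipulation lets you estimate the shape of an effect, whereas a two-level design only asks whether two conditions differ.

```python
# Sketch of the dose-response idea: vary the manipulation parametrically
# and estimate its functional form, rather than comparing two levels.
# The linear generative model and noise level are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
doses = np.repeat([0.0, 1.0, 2.0, 3.0], 40)   # four levels, 40 participants each
response = 0.3 * doses + rng.normal(scale=1.0, size=doses.size)

# The dose-response analysis recovers the shape (here, the slope) of the effect...
fit = stats.linregress(doses, response)
print(f"Estimated slope: {fit.slope:.2f} (p = {fit.pvalue:.4f})")

# ...whereas a two-level comparison only says whether the extremes differ.
low, high = response[doses == 0.0], response[doses == 3.0]
result = stats.ttest_ind(high, low)
print(f"Extreme-groups t-test: p = {result.pvalue:.4f}")
```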