Sunday, February 21, 2021

Methodological reforms, or, If we all want the same things, why can't we be friends?

(tl;dr: "Ugh, can't we just get along?!" OR "aspirational reform, meet actual policy" OR "whither metascience?")


This post started out as a thread about the tribes of methodological reform in psychology, all of whom I respect and admire. Then it got too long, so it became a blogpost. 

As folks might know, I think methodological reform in psychology is critical (some of my views have been formed by my work with the ManyBabies consortium). For the last ~2 years, I've been watching two loose groups of methodological reformers get mad at each other. These conflicts have made me very sad because I like all of the folks involved. Several times I've actually had to take a twitter holiday because I couldn't stand to see some of my favorite people on the platform yelling at each other. 

This post is my - perhaps misguided - attempt to express appreciation for everyone involved and try to spell out some common ground.

What do the centrists and the radicals think?


One thread that catalyzed my thinking about this discussion was the "far left" and "center left" comparison that Charlie Ebersole proposed. Following that thread, I'll call these groups the centrists and the radicals. 


Centrist reforms are things like preregistration, transparency guidelines, and tweaks to hypothesis testing (e.g., p-value thresholds, equivalence testing, or Bayesian hypothesis testing). There's no consensus "platform" for reforms, but a recent review summarizes the state of things quite well. Just to be clear, a number of authors of this article are collaborators and friends, and I think it's on the whole a really good article.


In contrast to the centrists, radicals start with the critical importance of theory building, often via computational models. On this view, no matter how well planned a test is, if it's not posed as part of a comparison of theories, you are playing 20 questions with nature (as Newell said), and you probably won't win. Here's a nice guide to some of the work in this tradition.

In this debate, the rubber really hits the road in the discussion around preregistration. Preregistration is a critical part of centrist reforms (e.g., through registered reports) but is "redundant at best" on many of the more radical views (e.g., in this really nice post by Danielle Navarro).

I'm a centrist and a radical


Here's the thing. These views are not inconsistent! It's just that the implicit contexts of application are different. Centrists are trying to make broad policy recommendations for funders/journals/training programs; radicals are thinking about ideal scientific structures. Both viewpoints resonate with my personal experience. 

In my lab, I try to do science that conforms to the radical vision of ideal scientific structures! In much of my work, we do the kind of computational theory building that lets us make quantitative predictions in advance and test them using precise measurements. This kind of paradigm obviates simple NHST p-values, though sometimes we include them anyway because reviewers. We do typically preregister this work though, to keep from fooling ourselves about our predictions. Here's an example.


On the other hand, I also teach experimental methods to psychology graduate students. In my teaching I'm much more of a centrist. In this context, I see lots of "garden variety" psych research on the topics that students are interested in. Much of it is not easily amenable to computational theory. (Here's a sample of the perspective I've developed in that course). 

From the radicals, there's lots of interest in computational theory building and some very nice guides/explainers (e.g., this one by Olivia Guest and Andrea Martin; EDIT: these authors are just trying to help people understand modeling and want to be clear that they feel there is a place for qualitative theory and don't subscribe to a "radical" position). The radical tradition is what I was trained in and what I do. I love this kind of work. But psych is a VERY big place (TM). It feels to me like hubris to say to a student who studies educational mindsets, emotion regulation, or the longitudinal development of racial identity – "don't even bother unless you have my kind of computational theory." Maybe that's not what they want as an outcome from their research, and maybe they are right and I am wrong!

(As an aside: models and data go hand in hand, and it's not actually that clear to me that moving to computational theory is right in areas where there are no precise empirical measurements to explain. In 2013 I taught a fun class with Jamil Zaki and Noah Goodman in which we tried to make models of social behavior. We made lots of models but had no reliable quantitative measurements to fit them to. So we had some pretty great computational theory – in my humble opinion – but we were still nowhere.)

So based on these musings, in my experimental methods class, I make more minimal recommendations to the students. To evaluate the effect of an intervention, plan your sample size and preregister the statistical test. Don't p-hack. Go ahead and explore your data, but don't pretend p-values from that exploration are a sound basis for strong conclusions. Try to make good plots of your raw data. Again, these sound pretty centrist, even though, like I said, in my own lab I'm much more of a radical!
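(To make "plan your sample size" concrete, here's a minimal power-analysis sketch in Python; the two-group design and the d = 0.5 effect size are assumptions for illustration, not recommendations.)

```python
# a minimal sketch of sample-size planning for a two-group intervention study;
# the expected effect size (d = 0.5) is an assumption for illustration
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # standardized mean difference you expect / care about
    alpha=0.05,       # planned significance threshold
    power=0.80,       # desired probability of detecting the effect
)
print(f"plan for ~{round(n_per_group)} participants per group")  # ~64
```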

The methodological practices that I recommend in class don't necessarily result in a robust body of theory. But at the same time, I have a strong conviction that they are a first step towards keeping people from tricking themselves while they stare at noise. Random promotion of noise to signal is rampant in the literature - we see it all the time when we try to replicate findings in class that are clearly the product of post-hoc selection of significant p-values. So simply blocking this kind of noise promotion is an important first step. 
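(If you want to see how easily noise gets promoted to signal, here's a small simulation of that post-hoc selection process; the specific numbers – 10 exploratory comparisons per dataset, n = 30 per group – are arbitrary assumptions.)

```python
# simulate post-hoc selection: run several exploratory comparisons on pure noise
# and "report" a finding whenever the best-looking one crosses p < .05
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_datasets, n_comparisons, n = 1000, 10, 30

hits = 0
for _ in range(n_datasets):
    # every comparison is null: both groups come from the same distribution
    pvals = [stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
             for _ in range(n_comparisons)]
    hits += min(pvals) < 0.05  # select the most "significant" comparison

print(f"datasets yielding a 'finding' from pure noise: {hits / n_datasets:.0%}")
# expected: roughly 1 - 0.95**10, i.e., about 40%
```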

Contexts for everything


I'm arguing that one difference between centrists and radicals is the context of the claim. The centrist in me says: "it's really easy to tell NSF/NIH to add preregistration, sample size planning, and data sharing to the merit review criteria (think clinicaltrials.gov)." In contrast, I don't think anyone would even know what you meant if you said: "all grants need to have sound computational theory."

Danielle Navarro makes the general case wonderfully in the piece I linked above: "advocating preregistration as a solution to p-hacking (or its Bayesian equivalent) is deeply misguided because we should never have been relying on these tools as a proxy for scientific inference." I basically agree with this point completely. For my own research.

But I'm also worried that applying this standard as a blanket policy intervention across all of psychology (plus the other behavioral sciences, to say nothing of the clinical sciences) would be a disaster for everyone involved. What would people do when they didn't have computational theory or adequate statistical models but got asked by funders and journals to provide such theory? My guess is that they'd make it up in a way that satisfied the policy hoop they'd been asked to jump through and then would continue p-hacking. 

Here are a few ideas about consensus metascience directions for both groups. Centrists should consider how they want to tweak policies to encourage cumulative science in the form of quantitative theory. How could we study the effects of quantitative theory on the robustness of empirical findings? I've got one idea: it seems like literatures that test quantitative theories presuppose precise and replicable measurements; this is a testable correlational claim, at least. I've also wondered about encouraging dose-response designs as a potential intervention on the standard 2x2 design that gets (over-)used in much of the psychology literature. 
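(By a dose-response design I mean something like the following toy sketch, in which the manipulation is administered at several graded intensities and analyzed as a trend rather than an on/off contrast; the five dose levels and the linear functional form are assumptions for illustration.)

```python
# hypothetical dose-response design: five graded intervention intensities
# analyzed as a trend, instead of a single on/off cell in a 2x2
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
doses = np.repeat([0.0, 0.5, 1.0, 1.5, 2.0], 40)     # dose level per participant
outcome = 0.3 * doses + rng.normal(size=doses.size)  # assumed linear dose-response

fit = sm.OLS(outcome, sm.add_constant(doses)).fit()
print(fit.params)  # the slope estimates the dose-response relationship
```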

On the other side, though, methodological radicals should take a look at the metascience policy intervention literature - where something actually gets changed in an official policy and then you measure the outcome. Through my collaborations with Tom Hardwicke, I've become convinced that this kind of work can make us clearer about our desired endpoints as science policy-makers – what counts as success when we propose methodological reforms? 

One final comment. Another dynamic in this whole conversation is the failure – perceived and actual – of some centrist voices to engage constructively with the more radical critiques. As has been pointed out several times (as in the Ebersole tweet above), this lack of engagement may have to do with the gender distribution - more male voices in the center, more women on the radical side. These dynamics aren't good and this behavior is not OK. Leaders in the centrist parts of the field need to address the more radical critiques, especially those that come from folks who are deeply knowledgeable about the philosophical and statistical issues. The radical critiques of preregistration may sometimes get written off, mistakenly, as just another genre of knee-jerk response to methodological reform from less thoughtful corners of the field. This is sloppy. The radical work needs to be cited and discussed – and if, as I've suggested here, there's a response to the critiques based on pragmatics and policy issues, then that response needs to be articulated. 

Conclusions


OK, in sum: Maybe this is part of being an official old person (TM), but why can't we all just get along? Let's have radical ambitions for the future while taking well-scoped, pragmatic policy positions in the short term. 
