An exciting new manuscript by Tauber, Navarro, Perfors, and Steyvers (TNPS) makes a provocative claim: you can give up on the optimal foundations of Bayesian modeling and still make use of the framework as an explicit toolkit for describing cognition.** I really like this idea. For the last several years, I've been arguing for decoupling optimality from the Bayesian project. I even wrote a paper called "Throwing out the Bayesian baby with the optimal bathwater" (which was about Bayesian models of baby data – clever, right?).
In this post, I want to highlight two things about the TNPS paper, which I really liked and enjoyed reading. First, it contains an innovative fusion of Bayesian cognitive modeling and Bayesian data analysis (BDA). BDA has been a growing and largely independent strand of the literature; fusing it with cognitive models makes a lot of rich new theoretical development possible. Second, it contains two direct replications that succeed spectacularly – and it does so without making any fuss whatsoever. This is, in my view, what observers of the "replication crisis" should be aspiring to.
1. Bayesian cognitive modeling meets Bayesian data analysis.
The meat of the TNPS paper revolves around three case studies in which they use the toolkit of Bayesian data analysis to fit cognitive models to rich experimental datasets. In each case they argue that taking an optimal perspective – in which the structure of the model is assumed to be normative relative to some specified task – is overly restrictive. Instead, they specify a more flexible set of models with more parameters. Some settings of these parameters may be "suboptimal" for many tasks but have a better chance of fitting the human data. And the fitted parameters of these models can then reveal aspects of how human learners treat the data – for example, how heavily they weight new observations or what sampling assumptions they make.

This fusion of Bayesian cognitive modeling and Bayesian data analysis is really exciting to me because it allows the underlying theory to be much more responsive to the data. I've been doing less cognitive modeling in recent years, in part because my experience was that my models weren't as responsive as I would have liked to the data that I and others collected. I often came to a point where I would have to do something awful to my elegant and simple cognitive model in order to make it fit the human data.
One example of this awfulness comes from a paper I wrote on word segmentation. We found that an optimal model from the computational linguistics literature did a really good job fitting human data – if you assumed that it observed the equivalent of somewhere between a tenth and a hundredth of the data the humans observed. I chalked this problem up to "memory limitations" but didn't have much more to say about it. In fact, nearly all of my work on statistical learning has included some kind of memory limitation parameter – a knob that I'd twiddle to make the model look like the data.***
In their first case study, TNPS estimate the posterior distribution of this "data discounting" parameter as part of their descriptive Bayesian analysis. That may not seem like a big advance from the outside, but it opens the door to incorporating much more psychologically inspired memory models into the analytic framework. (Dan Yurovsky and I played with something a bit like this in a recent paper on cross-situational word learning – where we estimated a power-law memory decay on top of an ideal observer word learning model – but without the clear theoretical grounding that TNPS provide.) I would love to see this kind of work really try to understand what this sort of data discounting means, and how it integrates with our broader understanding of memory.
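To make the move concrete, here is a minimal sketch of what estimating a data-discounting parameter can look like. This is not TNPS's actual model: the toy coin-judgment task, the Beta-Binomial observer, the fixed Gaussian response noise, and all of the numbers below are invented for illustration. The point is just the shape of the analysis – the discounting knob becomes a parameter with a posterior, estimated from behavioral data rather than hand-tuned.

```python
# Minimal sketch (not TNPS's model): infer a "data discounting" parameter gamma
# from hypothetical human probability judgments by grid approximation.
import numpy as np

def model_prediction(k, n, gamma):
    """Posterior mean of a Beta(1, 1) ideal observer that behaves as if it saw
    only a fraction gamma of the k heads in n flips."""
    return (gamma * k + 1.0) / (gamma * n + 2.0)

# Hypothetical behavioral data: (heads, flips, mean human probability judgment).
trials = [(9, 10, 0.78), (1, 10, 0.22), (30, 40, 0.70), (5, 40, 0.18)]

gammas = np.linspace(0.01, 1.0, 200)  # grid over gamma (implicitly a uniform prior)
sigma = 0.08                          # assumed response noise (sd), fixed for simplicity

log_post = np.zeros_like(gammas)
for k, n, judgment in trials:
    preds = model_prediction(k, n, gammas)
    # Gaussian likelihood of each human judgment around the model's prediction.
    log_post += -0.5 * ((judgment - preds) / sigma) ** 2

post = np.exp(log_post - log_post.max())
post /= post.sum()

print("posterior mean of gamma:", np.sum(gammas * post))
print("MAP gamma:", gammas[np.argmax(post)])
```

In a real analysis you would also estimate the response-noise parameter (and use a proper MCMC tool rather than a grid), but the basic move is the same: the "memory limitation" becomes something you infer, not something you twiddle.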
2. The role of replication.
Something that flies completely under the radar in this paper is how closely TNPS replicate the empirical findings they build on. Their Figure 1 tells a great story: panel (a) shows the original data and model fits from Griffiths & Tenenbaum (2007), and panel (b) shows their own data and replicated fits. This is awesome. Sure, the model doesn't perfectly fit the data – and that's TNPS's eventual point (along with a related point about individual variation). But clearly GT measured a true effect, and they measured it with high precision.
The same thing was true of Griffiths & Tenenbaum (2006) – the second case study in TNPS. GT2006 was a study about estimating conditional distributions for different everyday processes – for example, given that someone has already lived X years, how likely is it that they will live to a total of Y? At the risk of belaboring the point, I'll show you three datasets on this question: first from GT2006, second from TNPS, and third a new, unreported dataset from my replication class a couple of years ago.**** The conditions (panels) are plotted in different orders in each plot, but if you take the time to trace one – say lifespans or poems – you will see just how closely these three datasets replicate one another, not just in the shape of the curves but in the precise numerical values:
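For readers who haven't seen GT2006, the model being fit to these curves is simple enough to sketch. The version below is a toy: it assumes the standard uniform-sampling likelihood and uses a made-up Gaussian lifespan prior in place of GT's empirical priors, then reports the posterior-median prediction for a few probe values like the ones participants saw.

```python
# Toy version of the GT2006 prediction model (illustrative prior, not GT's data).
import numpy as np

def predict_total(t, support, prior):
    """Posterior-median prediction of t_total given an observed duration t."""
    # Uniform sampling: p(t | t_total) = 1 / t_total for t <= t_total, else 0,
    # so p(t_total | t) is proportional to prior(t_total) / t_total where t_total >= t.
    posterior = np.where(support >= t, prior / support, 0.0)
    posterior /= posterior.sum()
    cdf = np.cumsum(posterior)
    return support[np.searchsorted(cdf, 0.5)]  # posterior median

# Stand-in lifespan prior: roughly Gaussian around 75 years.
ages = np.arange(1, 121, dtype=float)
prior = np.exp(-0.5 * ((ages - 75.0) / 16.0) ** 2)
prior /= prior.sum()

for t in [18, 39, 61, 83, 96]:  # probe ages similar to those used in GT2006
    print(f"lived {t} years -> predicted total lifespan {predict_total(t, ages, prior):.0f}")
```

The characteristic curves in the plots come from tracing these predictions across probe values; swapping in different priors (movie grosses, poem lengths, reigns of pharaohs) produces the other panels.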
This is the ideal outcome to strive for in our responses to the reproducibility crisis. Quantitative theory requires precise measurement – you just can't get anywhere fitting a model to a small number of noisily estimated conditions. So you have to get precise measures, and that leads to a virtuous cycle: your critics can disagree with your model precisely because they have a wealth of data to fit their more complex models to (that's exactly TNPS's move here).
I think it's no coincidence that quite a few of the first big-data, Mechanical Turk studies I saw were done by computational cognitive scientists. Not only were they technically oriented and happy to port their experiments to the web, they were also motivated by a severe need for more measurement precision. And that kind of precision leads to exactly the kind of reproducibility we're all striving for.
---
* Think Tversky & Kahneman, but there are many many issues with this argument...
** Many thanks to Josh Tenenbaum for telling me about the paper; thanks also to the authors for posting the manuscript.
*** I'm not saying the models were in general overfit to the data – just that they needed some parameter that wasn't directly derived from the optimal task analysis.
**** Replication conducted by Calvin Wang.