Comments on Babies Learning Language: It's the random effects, stupid!

Hi Mike, in case you haven't seen this, here...

2021-02-20T14:26:53.497-08:00

Hi Mike, in case you haven't seen this, here's a paper and a poster making a complementary point to yours (from 2017, but I only discovered them now!)
https://arxiv.org/ftp/arxiv/papers/1701/1701.04858.pdf
http://publish.illinois.edu/quantitativelinguistics/files/2017/01/LSA2017.Mixed-Models-are-Sometimes-Terrible-final.pdf

One point I find compelling from personal experience is that more data isn't necessarily going to solve convergence and model selection problems in lme4. I have had convergence failures for all but random intercept models with large datasets (thousands of participants). But what really finally pushed me over the edge are repeated instances of wild swings in estimates and p-values for fixed effects given models with different random effect structures *in cases where model comparisons showed no significant differences in fit*. At that point, it's either figure out brms or succumb to Lovecraftian madness, no matter whether you eat fast food or enjoy fine statistical dining like Shravan.

Hi Shravan, thanks for these points - I generally ...

2019-05-15T14:14:20.722-07:00

Hi Shravan, thanks for these points - I generally agree with all three of them and appreciate the caution and the warning. Quick responses to each:

1. Yes, no question that if you have a deeply underpowered study, you are in hot water. That said, I think I might slightly disagree about the nature of the random effects issue. I typically try to run developmental studies that are *adequately* powered for my effects of interest, but are not over-powered. In that scenario the model specification does often matter quite a lot.

2. Regarding McDonalds analyses. Yes, absolutely - people should be thoughtful about their analysis! But - not all fast food is created equal. Chipotle might be healthier than McDonalds even though both are fast. I think we want to create *good defaults* and then be thoughtful in our use of them and our deviations. Having bad defaults doesn't encourage people to be thoughtful.

3. Priors, yes. In principle we should know more about our default priors, especially for internal parameters. In practice though I do want to have good defaults (I should understand why they are good of course).

So in sum, I think if you have a truly high powered study, maybe it doesn't matter what the specification is or what the priors are. But that's not the world we live in. So we do need to choose good priors AND good (often maximal) specifications for the inferences we want to make.

Finally, I think the issue of *removing analytic flexibility* here is an important one. If we are going to make binary inferences from models, e.g. X has non-zero weight in the model, then we need to have good default workflows. Otherwise we have a lot of flexibility to try and justify our way into biases in the model that move X further from zero!

You asked me to not hold back so here goes. The...

2019-05-10T22:54:59.925-07:00

You asked me to not hold back so here goes.

There are several important things here that have been glossed over and have the potential to lead to further misuse of statistical methods.

Briefly:

- There is little to be gained by fitting a model to make a discovery claim unless you have some idea about the power and Type I, M, S error properties of your design. One must begin with power and Type I error and establish that one can in principle get accurate estimates. It has taken me forever to understand this point. If power is likely to be low, any effects you find (as in "significant" effects or Bayes factors or whatever) are guaranteed to be overestimates that will not in general replicate (Gelman and Carlin, 2014, and a recent JML paper I wrote with Gelman demonstrating the point). I feel that it's extremely damaging to make claims like "fit a maximal model" or "don't fit a maximal model" without making any qualifying statements about the capabilities of your design to---in principle---make discovery claims.

- This post is really disturbing to me for another reason: it implicitly encourages the standard McDonalds way of statistical thinking we practice, that you can drop into the stats shop and quickly leave with a complete analysis (of course it takes hours to fit a Bayesian model, but one can do it overnight while sleeping, so it still feels like a fast-food event to me). I know for sure that you personally would never work in this way (although you do say that you tend to be lazy---I know that you at least know the consequences of that, but IMO it sends the wrong message). Any newcomer reading this post is going to take away the message that one can just load a data-set, run a maximal model in brms, and be done. We should discourage this kind of magical thinking. (I have done this too; as I said, I was slow to understand this stuff.) Betancourt has written extensively about this, and we tried to translate his ideas to Cognitive Science to demonstrate the point: https://arxiv.org/abs/1904.12765. The sheer pointlessness of the analyses I have done in the past myself and that I see in many papers is just depressing. We should actively discourage the idea that a quick load-and-fire approach to data fitting can get us anywhere. This is how I was taught to do analyses 17 years ago, and I am still pissed off about that.

- In the brms model, you will pay a price down the road (eventually) for using the default priors they provide. Even the author of brms warns against this. One should insist on explicitly stating the priors as that is part of the model---leaving them implicit like you did is asking for trouble. E.g., doing a Bayes factor analysis using default priors is in general the road to insanity. Also, I always do a sensitivity analysis; even if it is not in the final paper, it's in the supplementary materials. People say that priors don't matter when you have enough data. True, but for all the abstract variance components we use, priors become more and more important. I have never been in a situation where I could say I have enough power to recover all the variance component estimates accurately, and I don't even work with baby data; I can collect as much data as I like because I have the resources for it.
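
The idea of a prior sensitivity analysis can be illustrated with a minimal Python sketch. This is not the brms workflow discussed in this thread: it assumes the simplest possible case, a normal mean with known data standard deviation and a zero-centered normal prior, where the posterior has a closed form. The function names `normal_posterior` and `prior_sensitivity` and the prior widths are hypothetical choices made for illustration.

```python
import math

def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Conjugate update for a normal mean with known data SD (an assumption)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Posterior mean is a precision-weighted average of prior mean and sample mean.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, math.sqrt(post_var)

def prior_sensitivity(data_mean, data_sd, n, prior_sds=(0.1, 1.0, 10.0)):
    """Posterior (mean, sd) under several zero-centered priors of different width."""
    return {sd: normal_posterior(0.0, sd, data_mean, data_sd, n) for sd in prior_sds}

if __name__ == "__main__":
    for n in (10, 10_000):
        print(f"n = {n}")
        for prior_sd, (mean, sd) in prior_sensitivity(0.5, 1.0, n).items():
            print(f"  prior sd {prior_sd:>5}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

With few observations the posterior mean depends heavily on the prior width; with many it barely moves. For variance components in a mixed model, the relevant "sample size" is closer to the number of participants or items than to the number of rows, which is one reason the prior can keep mattering even in large datasets. This sketch only covers the simple case.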

So, the reason I don't like this kind of argument (use brms because it allows you to fit maximal linear mixed models) is that it doesn't come with any qualifications and caveats. It encourages business as usual. No linguist doing a syntactic analysis would use automated software to come up with a syntactic derivation, but that is exactly what we are implicitly teaching students to do, except it's in statistical analysis.

I would say that asking whether to fit a maximal model or not is asking the wrong question. If you can manage to run a high-powered study, it really is not going to matter. If you are not running a high powered study, your problems lie elsewhere, not in the maximality of the model.
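
The point about low power can be made concrete with a small simulation in the spirit of Gelman and Carlin (2014). The following is a hedged Python sketch, not their code: it assumes the effect estimate is normally distributed around the true effect with a known standard error, and that "significant" means a two-sided test at the 0.05 level. The function name `retrodesign_sim` and the example numbers are hypothetical.

```python
import random
import statistics

def retrodesign_sim(true_effect, se, n_sims=20000, z_crit=1.96, seed=1):
    """Simulate replications of a study; return (power, type_s, type_m).

    type_s: share of significant estimates with the wrong sign.
    type_m: mean absolute significant estimate divided by the true effect.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_sims)]
    significant = [e for e in estimates if abs(e) > z_crit * se]
    power = len(significant) / n_sims
    if not significant:
        return power, float("nan"), float("nan")
    type_s = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    type_m = statistics.mean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, type_m

if __name__ == "__main__":
    # Low power: the true effect equals one standard error.
    print("low power :", retrodesign_sim(true_effect=0.1, se=0.1))
    # High power: the true effect is five standard errors.
    print("high power:", retrodesign_sim(true_effect=0.5, se=0.1))
```

In the low-power setting, any estimate that reaches significance must exceed 1.96 standard errors, which here is almost twice the true effect, so the significant results overestimate by construction. In the high-power setting, nearly every replication is significant and the exaggeration ratio stays close to 1.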

There. My rant is done.