Comments on Babies Learning Language (Michael Frank)

Roman (2021-02-20):
Hi Mike, in case you haven't seen this, here's a paper and a poster making a complementary point to yours (from 2017, but I only discovered them now!)
https://arxiv.org/ftp/arxiv/papers/1701/1701.04858.pdf
http://publish.illinois.edu/quantitativelinguistics/files/2017/01/LSA2017.Mixed-Models-are-Sometimes-Terrible-final.pdf

One point I find compelling from personal experience is that more data isn't necessarily going to solve convergence and model-selection problems in lme4. I have had convergence failures for everything but random-intercept models even with large datasets (thousands of participants). But what finally pushed me over the edge were repeated instances of wild swings in estimates and p-values for fixed effects across models with different random-effect structures, *in cases where model comparisons showed no significant differences in fit*. At that point, it's either figure out brms or succumb to Lovecraftian madness, no matter whether you eat fast food or enjoy fine statistical dining like Shravan.

Alex (2019-10-12):
Thank you for your easy-to-read blog post!
I didn't know that it takes a strong correlation to make a difference to the precision of the estimate.

Michael Frank (2019-05-15):
Hi Shravan, thanks for these points - I generally agree with all three of them and appreciate the caution and the warning. Quick responses to each:

1. Yes, no question that if you have a deeply underpowered study, you are in hot water. That said, I think I might slightly disagree about the nature of the random effects issue. I typically try to run developmental studies that are *adequately* powered for my effects of interest, but are not over-powered. In that scenario the model specification does often matter quite a lot.

2. Regarding McDonalds analyses: yes, absolutely - people should be thoughtful about their analysis! But not all fast food is created equal. Chipotle might be healthier than McDonalds even though both are fast. I think we want to create *good defaults* and then be thoughtful in our use of them and in our deviations from them. Having bad defaults doesn't encourage people to be thoughtful.

3. Priors, yes. In principle we should know more about our default priors, especially for internal parameters. In practice, though, I do want to have good defaults (and I should understand why they are good, of course).

So in sum, I think if you have a truly high-powered study, maybe it doesn't matter what the specification is or what the priors are. But that's not the world we live in. So we do need to choose good priors AND good (often maximal) specifications for the inferences we want to make.

Finally, I think the issue of *removing analytic flexibility* here is an important one. If we are going to make binary inferences from models, e.g.
X has non-zero weight in the model, then we need to have good default workflows. Otherwise we have a lot of flexibility to try to justify our way into biases in the model that move X further from zero!

Shravan Vasishth (2019-05-10):
You asked me not to hold back, so here goes.

There are several important things here that have been glossed over and that have the potential to lead to further misuse of statistical methods. Briefly:

- There is little to be gained by fitting a model to make a discovery claim unless you have some idea of the power and the Type I, M, and S error properties of your design. One must begin with power and Type I error and establish that one can in principle get accurate estimates. It has taken me forever to understand this point. If power is likely to be low, any effects you find (as in "significant" effects, or Bayes factors, or whatever) are guaranteed to be overestimates that will not in general replicate (Gelman and Carlin, 2014, and a recent JML paper I wrote with Gelman demonstrating the point). I feel it's extremely damaging to make claims like "fit a maximal model" or "don't fit a maximal model" without any qualifying statement about the capability of your design to - in principle - support discovery claims.

- This post is really disturbing to me for another reason: it implicitly encourages the standard McDonalds way of statistical thinking we practice - that you can drop into the stats shop and quickly leave with a complete analysis. (Of course it takes hours to fit a Bayesian model, but one can do it overnight while sleeping, so it still feels like a fast-food event to me.) I know for sure that you personally would never work this way (although you do say that you tend to be lazy - I know that you at least know the consequences of that, but IMO it sends the wrong message). Any newcomer reading this post is going to take away the message that one can just load a dataset, run a maximal model in brms, and be done. We should discourage this kind of magical thinking. (I have done this too; as I said, I was slow to understand this stuff.) Betancourt has written extensively about this, and we tried to translate his ideas to cognitive science to demonstrate the point: https://arxiv.org/abs/1904.12765. The sheer pointlessness of the analyses I have done myself in the past, and that I see in many papers, is just depressing. We should actively discourage the idea that a quick load-and-fire approach to data fitting can get us anywhere. This is how I was taught to do analyses 17 years ago, and I am still pissed off about that.

- In the brms model, you will eventually pay a price for using the default priors it provides. Even the author of brms warns against this. One should insist on explicitly stating the priors, since they are part of the model - leaving them implicit as you did is asking for trouble. E.g., doing a Bayes factor analysis with default priors is in general the road to insanity. Also, I always do a sensitivity analysis; even if it is not in the final paper, it goes in the supplementary materials. People say that priors don't matter when you have enough data. True, but for all the abstract variance components we use, priors become more and more important. I have never been in a situation where I could say I had enough power to recover all the variance component estimates accurately - and I don't even work with baby data; I can collect as much data as I like, because I have the resources for it.

So the reason I don't like this kind of argument (use brms because it allows you to fit maximal linear mixed models) is that it doesn't come with any qualifications and caveats. It encourages business as usual. No linguist doing a syntactic analysis would use automated software to come up with a syntactic derivation, yet that is exactly what we are implicitly teaching students to do - except in statistical analysis.

I would say that asking whether to fit a maximal model or not is asking the wrong question.
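[Editor's note: a minimal sketch of the "explicit priors plus sensitivity analysis" workflow described above, using brms. The dataset `d`, the response `rt`, the `condition`/`subject`/`item` variables, and the specific prior scales are hypothetical placeholders, not recommendations from the original comment.]

```r
library(brms)

## Explicitly state every prior rather than relying on brms defaults;
## the scales below are illustrative placeholders.
priors <- c(
  set_prior("normal(0, 2)", class = "Intercept"),
  set_prior("normal(0, 1)", class = "b"),    # fixed-effect slopes
  set_prior("normal(0, 1)", class = "sd"),   # variance components
  set_prior("lkj(2)", class = "cor")         # random-effect correlations
)

## Maximal random-effect structure for a hypothetical crossed
## subjects-and-items design.
fit <- brm(rt ~ condition + (condition | subject) + (condition | item),
           data = d, prior = priors, chains = 4, cores = 4)

## Crude sensitivity analysis: refit with a much wider slope prior
## and compare the fixed-effect estimates across the two fits.
priors_wide <- c(
  set_prior("normal(0, 2)", class = "Intercept"),
  set_prior("normal(0, 5)", class = "b"),
  set_prior("normal(0, 1)", class = "sd"),
  set_prior("lkj(2)", class = "cor")
)
fit_wide <- brm(rt ~ condition + (condition | subject) + (condition | item),
                data = d, prior = priors_wide, chains = 4, cores = 4)

fixef(fit)       # if these differ materially from fixef(fit_wide),
fixef(fit_wide)  # the conclusions are prior-sensitive
```

If the estimates of interest move substantially between the two fits, that is the signal - per the comment above - that the data do not dominate the priors and the prior choice belongs in the paper, not just the supplement.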
If you can manage to run a high-powered study, it really is not going to matter. If you are not running a high-powered study, your problems lie elsewhere, not in the maximality of the model.

There. My rant is done.

Shravan Vasishth (2019-04-11):
Do you release the entire source Rnw or Rmd file that constitutes the paper publicly? We create a simpler vignette that allows a quick walk through the main code chunks. Also, we can *never* get the Rnw or Rmd file to compile correctly across collaborators' computers; someone among the authors always has a compile failure. So we felt it was unlikely that an outsider could compile our source files.

Michael Frank (2019-04-10):
Hi Shravan, we don't have anything too formal, but for many papers we have gotten in the habit of doing a reproducibility check (someone else builds the paper from the repository) prior to submission. That's been really helpful. Formal code review would be a great next step! Please do share your plan if you develop one!

Shravan Vasishth (2019-04-09):
Hi Michael, do you have a formal data management plan that guides the workflow of data release? We have been developing one.

I have benefitted a lot from releasing data and code. In one case, I gave the data and code of an unpublished paper to someone who had a stake in the results, and he found a mistake in our code (a contrast coding reversal of sign) that we were able to fix before submitting the paper. This has also happened during a review process, when a reviewer looked at our code. In my estimation it's basically impossible to work on a big project without introducing errors, at least small ones, into one's code.

It would be great if one could convince one's lab members to act as adversarial code reviewers for our manuscripts; with an exchange-of-favors setup, there could be a lot of value added. In practice, nobody has the time to review code for someone else because no credit is given. It would be nice if some kind of credit system could be created.

zacharyhorne (2019-04-08):
Great post. Sharing with my lab!

Michael Frank (2019-02-18):
J seemed to do this a little earlier, getting interested in swiping at an object at around 6 weeks; now, at 7 weeks, he is very interested in, and cooing at, the hanging toys in his bassinet.

Michael Frank (2018-08-22):
Several folks including Roman Feiman have brought up Fodorean issues.
Just to respond briefly to this point: what I'm arguing is that children have a fully expressive representational system (in the sense that it contains the primitives to construct any computable function). But constructing operators like logical OR or NOT might be difficult because they would have to be constructed *out of* simpler social parts. For example, here's a caricature of OR: "offer, but only with respect to the propositional truth of the arguments, not whether you can own them." That's a description of OR in a language that doesn't have it as a primitive; it's truth-functionally equivalent to OR but representationally more complex.

Michael Frank (2018-08-22):
Thanks Tomer, sorry I missed this! I agree completely about bargaining, though in my household it rarely contains the word "if." More like:

Parent: "Bedtime!"
Child: "Five more minutes!"
Parent: "Two more."
Child: "No, three!"

Anonymous (2018-08-11):
This is most interesting, and there's a lot to think about.

One small thing, which you've probably considered (and I know you didn't mean these particular social situations to be the only ones): aside from 'threat' as an initial social mapping for conditional-if, there's also 'bargaining/exchanging'.

Bargaining certainly seems to make up a lot of parent-child speech (anecdotally), and is full of non-threatening conditionals like "If you finish your broccoli you can have dessert / if you give me the truck you can have the duck / if he's first on the slide I get the swing". This is close in spirit to the first conditional you outlined ("If you finish early..."), but I would call that one bargaining too -- exchange(X, Y). Maybe.

Paul Brannan (2018-05-08):
I haven't read Ferber, but is it possible he's referring to operant conditioning (behavior/reinforcement) rather than classical conditioning (stimulus/response)?

Michael Frank (2018-03-23):
I didn't know about this! Thank you. Very helpful.
Riccardo (2018-03-23):
You probably know these already, but here are a couple of newish implementations of the model in Stan, which work pretty well (though, or maybe because, between-trial variability is not included):
- http://singmann.org/wiener-model-analysis-with-brms-part-i/
- https://github.com/cran/hBayesDM/blob/master/inst/stan/choiceRT_ddm.stan

Anonymous (2018-03-12):
When you remove compilation time, brms will be faster than rstanarm on almost any multilevel model, because the Stan code can be hand-tailored to the user's input. For any non-trivial multilevel model, estimation takes a few minutes, and on that time frame brms will usually be faster even when compilation time is included. Why do people continue to think the likelihood is implemented in a more optimal way in rstanarm?

Anonymous (2018-03-12):
This comment has been removed by the author.

Shravan Vasishth (2018-03-02):
That would be great (collaboration)! We have two completed meta-analyses, and we are working on a third. Why not visit us in Potsdam sometime this year, so we can talk and plan this out?
If this becomes more widespread in psycholinguistics, people will adopt the standards you folks have developed.

Michael Frank (2018-03-02):
Henrik, sorry, I didn't realize your contributions on bridgesampling (writing fast)! This is very helpful detail on the technical issues as well. I will have to read more on this - thank you!

Henrik Singmann (2018-03-02):
The bridgesampling R package developed by Quentin Gronau, myself, and E.J. Wagenmakers (https://cran.r-project.org/package=bridgesampling) does indeed allow one to calculate Bayes factors 'without tears' for any model fitted in Stan. It also works directly with models fitted with brms and rstanarm. However, IMHO these packages currently do not allow one to specify priors that are appropriate for Bayes factor based model selection. So whereas I agree that we have solved the technical problem of how to get Bayes factors, the conceptual problem of specifying adequate priors remains. But I hope there will be progress on this front this year.

Furthermore, even when one is only interested in estimation or measurement, there is a somewhat subtle issue in a Bayesian setting with categorical covariates that have more than two factor levels. For most coding schemes, the prior does not have the same effect on all factor levels. For example, with contr.sum the prior is more diffuse for the last level than for all the others. With contr.helmert the prior becomes gradually more diffuse for each subsequent factor level, with the exception of the first two.
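[Editor's note: the asymmetry described just above can be seen directly by inspecting the contrast matrices in R; this illustrative snippet is not part of the original comment.]

```r
## Rows are factor levels, columns are model coefficients. With
## contr.sum, the last level is coded -1 on *every* column, so an
## identical prior on each coefficient implies a more diffuse
## implied prior on that level's cell mean:
contr.sum(4)
#  1  0  0
#  0  1  0
#  0  0  1
# -1 -1 -1

## With contr.helmert, later levels get larger codes (2, 3, ...),
## so the implied prior grows progressively more diffuse:
contr.helmert(4)
# -1 -1 -1
#  1 -1 -1
#  0  2 -1
#  0  0  3

## An orthonormal alternative in the spirit of Rouder et al. (2012):
## an orthonormal basis for the space orthogonal to the constant
## column, built here via QR decomposition.
k <- 4
S <- qr.Q(qr(cbind(rep(1, k), contr.helmert(k))))[, -1]
round(crossprod(S), 10)  # identity matrix: all levels treated alike
```

Because the columns of `S` are orthonormal, a common prior on the coefficients induces the same marginal prior on every cell mean, so no level is privileged by being coded 'first' or 'last'.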
Thus, to make sure that the results are not biased by which factor level happens to be 'first' or 'last', a different type of coding scheme must be used. One such orthonormal coding scheme was developed by Rouder, Morey, Speckman, and Province (2012, JMP) and, AFAIK, is the one used internally by the BayesFactor package.

Michael Frank (2018-03-02):
Thanks - let me know if you would like to collaborate on this. In principle, it would not be too difficult to import and visualize data of a different type. The difficulty is really in getting the meta-analytic data! Each of the MAs for development is essentially a full, labor-intensive paper being written by one of the collaborators...

Shravan Vasishth (2018-03-02):
Yes, sorry, I was actually thinking of Barr's comment on Twitter when I wrote this, which suggests that's the only reason:

https://twitter.com/dalejbarr/status/969573499349159941

But it's Twitter; I can't hold it against Dale if he didn't mean that either.

Shravan Vasishth (2018-03-02):
Yeah, that metalab thing is simply amazing. Why isn't this the norm in all fields by now? I want to do something similar with interference studies, but need to find the time to create such an online tool.

Michael Frank (2018-03-02):
Rant away, Shravan! This is my cause too.

My lab has taken a number of steps to work on these issues. First, we preregister the key analyses for essentially every study we do. This is critical for frequentist stats, but I believe it's important for avoiding this inflation issue regardless of estimation method. Second, we have been actively working, through projects like metalab (http://metalab.stanford.edu) and manybabies (http://manybabies.stanford.edu), to compare the state of the literature with unbiased, large-scale estimates of key effects.

Shravan Vasishth (2018-03-02):
To be honest, I wasn't sure whether you were among the group of psychologists I was referring to. I'm relieved to hear you are not! But if you do care what the magnitude of the effect is, and not just that it is significant, then when power is low it simply doesn't matter what model one fits - maximal, minimal, whatever. This is because any significant effect in a low-power study is *guaranteed* to be an overestimate (I mean a 100% probability of being 2-7 times larger than the true effect). Why would anyone care to publish that in a top journal as big news? Yet that is exactly what Cognition and JML routinely publish. I elaborate on this point - the great need to focus on whether the effect is accurately estimated, and the importance of paying attention to the imprecision of the estimate - in this paper: https://psyarxiv.com/hbqcw.
I did fit maximal models throughout :)

Given that most published studies in psycholinguistics are heavily underpowered (I don't know about child language acquisition; see Appendix B of http://www.ling.uni-potsdam.de/~vasishth/pdfs/JaegerEngelmannVasishthJML2017.pdf for interference studies), I don't know why anyone even cares whether the model is maximal or not. Whatever we are finding out from the data is misleading us, by giving us an exaggerated effect size, or the wrong sign, or both. I wish the authors of the Keep It Maximal paper had sent a high-impact message like Keep It Powerful. Right now, people continue to run underpowered studies and publish them in Cognition and JML, worrying about whether their model is maximal when lack of power is the real problem. Meehl and Cohen came and went, and had no impact on psychology.

I know you probably know all this; I am just engaging in a start-of-weekend rant.
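[Editor's note: the "guaranteed overestimate" claim in the final comment is easy to verify by simulation, in the spirit of Gelman and Carlin (2014). The true effect size and sample size below are hypothetical placeholders chosen to produce a low-power design.]

```r
set.seed(1)

## Hypothetical small true effect (0.1 SD units) and small sample.
true_effect <- 0.1
n <- 20

## Simulate many studies; record each study's estimate and p-value.
sims <- replicate(10000, {
  x <- rnorm(n, mean = true_effect, sd = 1)
  tt <- t.test(x)
  c(estimate = mean(x), p = tt$p.value)
})

## Power: the proportion of studies reaching significance (low here).
mean(sims["p", ] < 0.05)

## Type M error: among the "significant" studies only, the average
## absolute estimate, expressed as a multiple of the true effect.
## Conditioning on significance inflates the estimate severalfold.
sig <- sims["p", ] < 0.05
mean(abs(sims["estimate", sig])) / true_effect
```

Under these placeholder settings the significant-only estimates come out several times larger than the true effect, which is exactly the inflation the comment warns will not replicate.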