Comments on Babies Learning Language: Descriptive vs. optimal bayesian modeling
Michael Frank (http://www.blogger.com/profile/00681533046507717821)
Updated 2023-03-03T03:33:05.224-08:00

2015-10-07T10:00:27.049-07:00

MH, thanks for the comments.

My take is that "optimal" here refers to "optimal with respect to some natural task," as in some versions of Marr's Computational Theory level of analysis, or as in rational analysis. The sense of optimality you're talking about is "optimal inference with respect to the model definition." Confusion between these two is a source of much stress and conflict, IMO.

I see TNPS as saying, let's give up on that first sense of optimal, since (as you point out) arguments that a particular prior is exactly right with respect to some environmental task can be both pretty flimsy and unnecessarily constraining for the data analyst.

Michael Frank (https://www.blogger.com/profile/00681533046507717821)

2015-10-03T13:59:32.510-07:00

Very cool stuff, Mike. Thank you for posting this.

I am a little bit unclear about all this optimality business, and it may be my own naivety about the history of the literature (what I’ve heard described by J. B. Tenenbaum as “philosophical baggage” and related things). I thought bayesian models (descriptive, optimal, or otherwise) were always “optimal” w.r.t. a prior and a likelihood. That is, Bayes Rule gives you the optimal way to combine these two sources of information. This may be a very weak optimality claim (maybe one that evolutionary psychologists wouldn’t get inspired by), but it seems that it is always present with a bayesian model. What then is characteristic of the “optimal models” as described by TNPS? The argument seems to rest on what is going into the prior and likelihood.

I find it easier to think about the priors so I’ll start there.
Take Case Study 2: The optimal/descriptive distinction of TNPS seems to rest on the question of “what are the priors?” with the possible answers being (1) environmental (optimal), or (2) non-environmental (non-optimal). They find that (2) is mostly the case, but (1) isn’t terrible. The distinction between optimal/non seems to rest on “are the priors optimal”, not “is the reasoning optimal”. I don’t yet find this distinction of optimal vs. non-optimal priors compelling. Do we have criteria to tell whether priors are optimal? In psychology, it seems that priors perfectly aligned with environmental statistics are conceivably not optimal. For example, is it optimal to include infant mortalities in your beliefs about lifespan, or might you give special status to infant mortalities, reserving for those beliefs their own distribution? This gloss on the optimality question seems to be removed from the empirical landscape and more appropriate for philosophical quarters.*

The case that “everything is (relative to some prior, likelihood) optimal” is perhaps a little more nuanced in the case of modifying the likelihood. I really like their Case Study 1 approach of discounting evidence / modifying the likelihood (and your gloss relating it to memory is also very interesting). They show that psychokinesis information effectively requires more evidence to produce the same updating as the genetics information. But given that the updating is discounted in that way, the incorporation of that discounted evidence with the prior is still optimal in the sense of how these information sources are combined.**

In the end, I think back to the work on subjective randomness, where there’s a very clear case that people are not optimal with respect to veridical statistical reasoning (and its corresponding generative process e.g. 
flipping a coin) but seem to be optimal with respect to some lay theory of how the data could have been generated and what the experimenter’s question is really asking (random vs. non-random generative process).

I think “descriptive Bayes” as TNPS put it is methodologically superior and a more tractable way of doing science. I still think there is optimality in there, perhaps a weaker optimality than the one implicated in the early Bayesian literature.

MHT

*Also on priors: TNPS says the optimality question doesn’t apply to the hypothetical priors of future lifespans, but I think there is still an optimality question: Given beliefs about future lifespans, and the likelihood function specified, are the inferences optimal?

**What is really cool about the TNPS approach is that it brings to light the “discounted updating” phenomenon, which raises the question of “why?” It’s quite conceivable that the likelihood function is different as the result of a different lay theory about the information sources (e.g. psychokineticians are more likely to be fraudulent in reporting their results than geneticists).

Anonymous (https://www.blogger.com/profile/05996276521920888671)
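
The point in the comment above, that discounted evidence can still be combined with the prior by ordinary Bayesian updating, can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the model TNPS actually fit: a Beta prior over a success rate, binomial evidence, and a discount `weight` between 0 and 1 that raises the likelihood to a power. The function names are hypothetical.

```python
def discounted_beta_update(a, b, successes, failures, weight=1.0):
    """Return Beta posterior parameters after a power-weighted binomial likelihood.

    Assumption for illustration: discounting is modelled as likelihood ** weight,
    which for a Beta prior simply scales the observed counts by `weight`.
    weight = 1.0 is standard Bayesian updating; weight < 1.0 discounts evidence.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return a + weight * successes, b + weight * failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Same uniform Beta(1, 1) prior in every case.
full = discounted_beta_update(1.0, 1.0, successes=8, failures=2, weight=1.0)
half = discounted_beta_update(1.0, 1.0, successes=8, failures=2, weight=0.5)
# With evidence discounted by half, twice as much data is needed
# to reach the posterior that the undiscounted learner reaches.
half_double = discounted_beta_update(1.0, 1.0, successes=16, failures=4, weight=0.5)

print(full, beta_mean(*full))
print(half, beta_mean(*half))
print(half_double, beta_mean(*half_double))
```

Under these assumptions the discounted learner moves less far from the prior on the same data, yet each posterior is still the exact Bayesian combination of the prior with the (discounted) likelihood, which is the weaker sense of optimality the comment describes.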