Thursday, July 6, 2017

What's the relationship between language and thought? The Optimal Semantic Expressivity Hypothesis

(This post came directly out of a conversation with Alex Carstensen. I'm writing a synthesis of others' work, but the core hypotheses here are mostly not my own.)

What is the relationship between language and thought? Do we think in language? Do people who speak different languages think about the world differently? Since my first exposure to cognitive science in college, I've been fascinated with the relationship between language and thought. I recently wrote about my experiences teaching about this topic. Since then I've been thinking more about how to connect the Whorfian literature – which typically investigates whether cross-linguistic differences in grammar and vocabulary result in differences in cognition – with work in semantic typology, pragmatics, language evolution, and conceptual development.

Each of these fields investigates questions about language and thought in different ways. By mapping cross-linguistic variation, typologists provide insight into the range of possible representations of thought – for example, Berlin & Kay's classic study of color naming across languages. Research in pragmatics describes the relationship between our internal semantic organization and what we actually communicate to one another, a relationship that can in turn lead to language evolution (see e.g., Box 4 of a review I wrote with Noah Goodman). And work on children's conceptual development can reveal effects of language on the emergence of concepts (e.g., as in classic work by Bowerman & Choi on learning to describe motion events in Korean vs. English).

All of these literatures provide their own take on the issue of language and thought, and the issue is further complicated by the many different semantic domains under investigation. Language and thought research has taken color as a central case study for the past fifty years, and there is also an extensive tradition of research on spatial cognition and navigation. But there are also more recent investigations of object categorization, number, theory of mind, kinship terms, and a whole host of other domains. And different domains provide more or less support to different hypothesized relationships. Color categorization seems to suggest a simple model in which color words speed categorization by aiding encoding and memory. In contrast, exact number may require much more in the way of conceptual induction, where children bootstrap wholly new concepts.

The Optimal Semantic Expressivity Hypothesis. Recently, a synthesis has begun to emerge that cuts across a number of these fields. Lots of people have contributed to this synthesis, but I associate it most with work by Terry Regier and collaborators (including Alex!), Dedre Gentner, and to a certain extent the tradition of language evolution research from Kenny Smith and Simon Kirby (also with a great and under-cited paper by Baddeley and Attewell).* This synthesis posits that languages have evolved over historical time to provide relatively optimal, discrete representations of particular semantic domains like color, number, or kinship. Let's call this the optimal semantic expressivity (OSE) hypothesis.** 

What does it mean to say that linguistic representations are optimal? Language users have non-linguistic representations in a particular domain, say color space. Languages map these non-linguistic representations to discrete linguistic expressions that can be used to transmit speakers' representations of objects, events, and relations to listeners. A particular representation is more informative to the extent that it conveys a more precise estimate of the speaker's intended representation. The average informativeness of a particular set of linguistic expressions is roughly the informativeness of each term, weighted by the speakers' distribution of communicative needs. This need distribution governs how frequently particular non-linguistic representations are invoked and how precise the communication must be – for example, how often you need to express particular color distinctions, or particular patterns of kinship relationships between individuals. Finally, an optimal set of linguistic representations for a particular domain should be learnable – fewer terms are good, and terms with less complex meanings are also good.

In sum, an optimal language balances factor 1 (informativeness relative to communicative need) and factor 2 (learnability/complexity). I take OSE to be roughly the view expressed by this chapter by Regier, Kemp, & Kay. This synthesis leads to a number of important predictions, several of which have their own names in the theoretical landscape across fields. The contribution of this post is to help me get these predictions straight, since I think they've been under-explored in previous work.
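To make the tradeoff concrete, here is a toy sketch in Python – my own illustration, not the actual Regier, Kemp, & Kay model. We score candidate partitions of a small discrete "color" domain by need-weighted informativeness minus a complexity penalty; the need distribution and tradeoff weight are made-up numbers chosen only to show the shape of the calculation.

```python
import math

# Toy sketch of the OSE tradeoff (an illustration, not the actual
# Regier/Kemp/Kay model). The domain is 6 discrete "color" chips;
# a language partitions them into labeled categories.

# Hypothetical communicative-need distribution over chips (sums to 1):
# how often speakers want to pick out each chip.
need = [0.30, 0.25, 0.15, 0.15, 0.10, 0.05]

def informativeness(partition):
    """Need-weighted log-precision: hearing a term that covers k chips
    narrows the listener to a uniform 1/k guess over those chips."""
    return sum(
        need[chip] * math.log(1.0 / len(category))
        for category in partition
        for chip in category
    )  # 0 is perfect; more negative = vaguer terms

def complexity(partition):
    """Crude learnability cost: one unit per term in the lexicon."""
    return len(partition)

def ose_score(partition, tradeoff=0.5):
    """Higher is better: informative, but not at any cost in vocabulary size."""
    return informativeness(partition) - tradeoff * complexity(partition)

candidates = {
    "one-term (degenerate)": [[0, 1, 2, 3, 4, 5]],
    "two-term":              [[0, 1, 2], [3, 4, 5]],
    "six-term (maximal)":    [[0], [1], [2], [3], [4], [5]],
}
for name, partition in candidates.items():
    print(f"{name}: {ose_score(partition):.2f}")
```

With these illustrative numbers, the two-term partition beats both the degenerate one-term language and the maximal six-term language – the middle-of-the-frontier outcome the hypothesis predicts.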

A brief digression first though – one that will become relevant later. For the OSE to make most of its currently testable predictions, non-linguistic representations must be shared across speakers. I'll call this the "semantic stability" assumption. That's for reasons that affect both factors in the OSE calculus. First, if speakers of English and Tagalog don't share the same underlying color space, then what's informative for English speakers (in the technical definition of informativeness) is not the same as what's informative for Tagalog speakers. So then we would need to measure non-linguistic semantic spaces in every language before we could tell what OSE predicted. And then, our predictions about cross-linguistic diversity would be based on cross-linguistic diversity – making the whole thing circular. Second, if non-linguistic representations in a semantic domain are substantially different across cultures (beyond, e.g., the distribution of communicative need, which is clearly going to be different), then we can't make the complexity/learnability predictions that OSE needs, at least not without introducing the same circularity as above. So hold onto this idea as a critical underpinning of the hypothesis.

Now on to predictions of OSE.

OSE in typology. In typology, the prediction is that languages should reflect optima in the broader space of communicative systems defined as above. There are many possible representations of particular semantic systems, but only a small number are realized in human languages. Those that we observe are very close to the optimal boundary between the two factors, informativity and learnability. This prediction has been tested repeatedly by Regier and colleagues, most notably in the domains of color and kinship but extending to others as well, using a variety of increasingly sophisticated operationalizations of this boundary. But semantic typology is hard – it requires work like the World Color Survey or EOSS – so experimental language evolution can be a great tool to test predictions another way.

OSE in language evolution. In a slightly different form, OSE emerges in experimental work on language evolution as well. (It's this convergence between the two literatures – noticed by participants in both, I believe – that piqued my recent interest in the topic). In iterated learning experiments, participants communicate about a particular semantic domain – often a very simple one, like objects with different shapes – repeatedly across "generations," such that the language created by one set of participants is then used to train another. Much of the work in this tradition has focused on how iterated learning can reveal underlying non-linguistic representations. But an important recent paper by Kirby et al. formalizes the idea that linguistic structures (abstractions beyond simple word-object mappings) appear to emerge from the tradeoff again between informativeness and complexity. Without this trade-off you either get degenerate languages that are easy to learn but uninformative, or very informative languages that are not easily learnable. Compositional structure only emerges when these two factors compete. 
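The degenerate/holistic/compositional contrast can be sketched in a few lines of Python – again my own toy illustration with made-up signals and a made-up tradeoff weight, not Kirby et al.'s actual model:

```python
import itertools

# Objects vary on two binary features; a language maps each object to a signal.
objects = list(itertools.product(["red", "blue"], ["circle", "square"]))

degenerate = {o: "da" for o in objects}                    # one word for everything
holistic = {o: w for o, w in zip(objects, ["da", "ki", "mu", "po"])}
compositional = {(color, shape): color[0] + shape[0]       # e.g. "rc", "bs"
                 for color, shape in objects}

def informativeness(lang):
    """Fraction of objects a listener can recover uniquely from the signal."""
    signals = list(lang.values())
    return sum(signals.count(s) == 1 for s in signals) / len(objects)

def complexity(lang):
    """Crude coding cost: distinct signal characters the learner must master."""
    return len({ch for signal in lang.values() for ch in signal})

def score(lang, tradeoff=0.1):
    return informativeness(lang) - tradeoff * complexity(lang)

for name, lang in [("degenerate", degenerate), ("holistic", holistic),
                   ("compositional", compositional)]:
    print(f"{name}: {score(lang):.2f}")
```

Under this scoring the compositional language beats the holistic one (same expressivity, fewer elements to learn) and the degenerate one (cheap but uninformative) – the qualitative pattern that Kirby et al. report.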

Kirby et al. suggest that compositionality (a core feature of language) emerges from the competition of the OSE factors. So that's a big win for OSE. But it is difficult to use this set of tools to consider particular semantic domains. Typically iterated learning experiments focus on novel domains (e.g., novel objects or abstract geometric features) so as to avoid adult participants' knowledge of language biasing the process. Because of this choice, it can be hard to study whether OSE applies in any particular cognitive domain (with the notable exception of color). For that reason, people often turn to developmental research to look at how language interacts with conceptual development.

OSE in development. In development, the OSE hypothesis connects to a number of other important theoretical positions – and these links are (to my mind) underexplored. The first link is to the “core knowledge hypothesis,” the thesis that prelinguistic infants are born with evolved, domain-specific mechanisms for understanding specific content domains. While some claims of the core knowledge hypothesis are controversial, I take the general idea that we have innate domain structure for some semantic domains to be a prerequisite for OSE. For example, the claim that particular color partitions are more optimal relies on the fact that our perceptual color space is not symmetric. One point of tension here, however, is how domain-specific the constraints on representation need to be. For example, maybe the relational terms in kinship language are restricted by general constraints on compositional semantic representation rather than domain-specific machinery. But in other domains the domain-specificity claim is likely less controversial; for example, languages likely provide grammatical marking of small exact numbers (e.g., singular/plural or singular/dual/plural systems) due to our ability to perceive the exact quantity of small sets of objects but only the approximate quantity of larger sets (review).

A second connection in development is to the “typological prevalence hypothesis” of Gentner and Bowerman. This hypothesis is that the developmental ordering of words/partition labels in language learning should reflect their cross-linguistic prevalence. They state this (very clearly) as follows: "all else being equal, within a given domain, the more frequently a given way of categorizing is found in the languages of the world, the more natural it is for human cognizers, hence the easier it will be for children to learn." Gentner and Bowerman provide some evidence for this hypothesis in the domain of spatial prepositions, and there is also some intriguing positive evidence for color overgeneralizations.

At first glance, the typological prevalence hypothesis fits perfectly with OSE, but there is one issue. On OSE, typological distribution is the product of both learnability and informativity given need, whereas it seems to me that acquisition ordering might plausibly reflect something about learnability and the communicative need distribution, but not so much informativity. That's because there may be some things that are useful to communicate about but also conceptually hard. A hokey example of this is that the concept "ten" is quite informative and quite useful but not very learnable (you have to be pretty good at numbers to reason about ten-ness). In contrast, children can know "three" without yet knowing the meanings of the rest of the count list (they're called "three-knowers"). Yet in adult language, "ten" gets used as much as or more than "three." So it is a bit problematic to map OSE directly to acquisition, even though components of the OSE tradeoff should be related to development.

A third connection, and one that fascinates me personally, is what I'll call the “pragmatic overextension hypothesis.” The idea here is that children's use of words in individual semantic domains is related to what competitors they have in those domains. For example, if you have a word for blue but no word for green, you might be more likely to overextend "blue" based on the absence of a competitor. This hypothesis is described glancingly in a wonderful chapter by Wagner, Tillman, and Barner in which they discuss the relationship between core knowledge and the acquisition of language for complex conceptual domains like number, time, and color. The evidence for pragmatic overextension is probably strongest in the domain of color, where Wagner's work showed pragmatic overextension of meanings before other competing words are learned. There is a lot left to do to test this hypothesis, but if it is true in other domains as well, it suggests that data on children's use of words describing semantic domains is a function not just of their semantics but also their pragmatics (like Eve Clark's classic work on referential overextension) – further complicating the typological prevalence hypothesis. 

Finally, here's a potential challenge to OSE from development. To the extent that there is "Quinian Bootstrapping" in a domain – a representational discontinuity in which a new representational system is created – this domain may violate a core tenet of what makes OSE work. Remember above, when I said that non-linguistic representations needed to be constant for OSE to make testable predictions? This is precisely the assumption that might be violated by bootstrapping. If by bootstrapping all we mean is creating new linguistic representations, then that's fine. But if there are new non-linguistic representations, then all of the worries above about violating semantic stability apply. If you can create a new non-linguistic representation in a particular language community, then what's informative is different in that community, and so OSE should make different predictions about what words that community needs, etc. The primary example of Quinian Bootstrapping developed by Susan Carey in her book is number knowledge, and this is linguistic bootstrapping (at least that's my argument; some people deny that it's bootstrapping entirely), so that kind doesn't violate semantic stability. But any examples of non-linguistic Quinian bootstrapping – if those are possible! – would be important problems for OSE.

OSE in cross-linguistic cognition. I reviewed the theoretical landscape on cross-linguistic variation and Whorfian effects in my previous post. The "thinking for speaking" idea – that cross-linguistic differences are more apparent in more linguistically- and communicatively-demanding tasks – is quite consistent with OSE. The idea is simply that the "core knowledge" (shorthand for whatever underlying non-linguistic semantic representation we have) is best revealed by non-linguistic tasks, while we tap our variable linguistic representations in more communicative tasks.

In contrast, a stronger and more permanent Whorfian hypothesis – the most viable version of which I called the "habits of mind" hypothesis in my previous post – challenges the OSE quite directly, for the same reasons as non-linguistic Quinian bootstrapping above. I take "habits of mind" to be the claim that continuous use of particular linguistic coding may lead to changes in non-linguistic representations – so, the more you talk using cardinal directions, the better you get at tracking them in general, even in non-linguistic tasks. This pattern doesn't fit OSE. If the underlying non-linguistic representation is altered by practice with language, then again, semantic stability is violated.

Maybe this connection is obvious, but I don't think so. In fact, I missed it almost entirely in my previous post, where I sloppily drew an arc from "habits of mind" to functionalist explanations like OSE. So, let me say it again: If strong Whorfian conclusions are right for a semantic domain, then the predictions of OSE for that domain are not the same as those assumed by the standard typological tests. To the extent that we think that navigation practice really does change people's spatial representations beyond language, we should assume that the typological distribution of large-scale spatial language will be odd from a received OSE perspective. That's because if you practice cardinal directions (which many people find difficult, and hence not easily learnable) they should actually become both more informative and more learnable. And that fact should imply that there is a stable equilibrium of cardinal direction languages that is not predicted by the pattern of data from speakers of non-cardinal direction languages. E.g., on a Kemp and Regier-style approach, cardinal direction languages should be over-represented. Clearly this line of thinking is just speculation, but it seems like an interesting direction for the future.***

Conclusions. OSE is one of the most exciting big ideas in Cognitive Science (and let me reiterate again that I'm not taking any credit for it myself!). But as it's emerged, I personally haven't understood how it interacted with other important parts of the literature on language and thought – for example, its critical relationship with the semantic stability assumption. In development, in particular, more work is needed to understand whether OSE contributes meaningfully to describing developmental change.

* One possible source for this hypothesis, given by Regier, Kemp, & Kay is a chapter by Rosch (1978/1999), where she writes that the "task of category systems is to provide maximum information with the least cognitive effort." I love this Rosch chapter, but actually wonder if this is a little too generous a citation – the form of the claim is right but the content is a little different. In my reading, Rosch is talking about cognitive efficiency, not communication and communicative efficiency; the link to communication here is critical for understanding the mechanism.

** Two notes on OSE. First, I’ve put the word “optimal” in the name of OSE, which some might not agree with. Optimal here for me is a shorthand for “reflecting an approximation to the normative distribution.” Just as in claims about optimality in cognition, where every subject need not be optimal on every trial, OSE doesn’t need to claim that every language is optimal, only that the distribution of languages approximates the optimal one. Second, OSE is a functionalist hypothesis, in the sense that it proposes that language emerges from (among other things) its efficiency for communication. It's maybe a little weird to call it functionalist in that the semantic variant of OSE that I'm writing about is mostly concerned with lexical systems (color words, spatial prepositions, etc.) and not specific syntactic rules or constructions. Many people working in this tradition seem to assume in some moments that the same arguments go through for syntax, but that's mostly an article of faith right now, I think.

*** The reverse inference here is also kind of interesting, which is that if you can't predict typological distribution from non-linguistic informativity and learnability, then you should assume a strong Whorfian claim. That's a pretty cool prediction, but in practice I think you'd be on thin ice making it...


  1. Thank you for this useful summary Michael.

    I agree with you that (what you call) the OSE is insightful and explanatorily important, and the results you summarise do suggest that it is true. I am certainly of that view myself.

    Let me make two further points. Both broaden the issue, one theoretically, the other empirically.

    Theoretically, the OSE is / can be read as a corollary of the cognitive principle of relevance, as described by Sperber & Wilson in Relevance: Communication & Cognition (1986/1995). The cognitive principle of relevance states that the human mind is geared towards the maximisation of relevance, where relevance is defined as the trade-off between cognitive effects on the one hand, and cognitive effort on the other. The OSE is effectively a statement that, in the domain of semantic representation, this maximisation scales up from the individual to the group/cultural level.

    I realise, of course, that Relevance Theory (RT) is not the only framework compatible with the OSE, but nevertheless the link between RT and the OSE seems worth mentioning, partly because it isn't widely recognised/appreciated, and partly because it is particularly tight and clear.

    On the empirical side, I don't see any reason to limit the hypothesis to semantics. I don't see why it wouldn't also apply to morphosyntax, phonology, and other grains/levels of linguistic analysis. I'd be minded to agree that semantics is the domain for which we have the strongest/best evidence (which you summarise), but there is evidence in these other domains too. The last Evolang conference had *lots* of presentations of this sort.

  2. Thanks for the great post Mike!

    Some thoughts.

    First, merely raising the issues you raise is already an improvement over the run-of-the-mill does-language-shape-thought kinds of questions.

    Here’s my own take on the language-and-thought connection and a few reactions to the semantic expressivity hypothesis.

    Learning a language is a form of experience. Many experiences have wide-ranging effects beyond a specific domain. Especially when dealing with an experience as ubiquitous as language, a question arises: what sorts of effects does learning a language have on X where X is something outside of language (though what should count as X creates some interesting circularities; how do we know whether something is “linguistic” or not?).

    Let’s take color as a test-case. As an aside, papers on color and language frequently talk about supporting or rejecting the Whorfian hypothesis, including in the very title, even though—as far as I know—Whorf never wrote about color or even what we would now call visual perception.

    Here’s how I think of the situation.

    The human visual system is capable of distinguishing colors. That fact is independent of language or communication, though the role of visual diet in color discrimination remains an open question. I think this apparent independence between color vision and language is why there is confusion in the color psychophysics community regarding what the language people are talking about when they talk about the relationships between language and color perception.

    In the course of learning some languages, children learn to name colors. Being a fluent English speaker means knowing how to name at least the basic colors. And so we can ask: what effect is there of learning and using color names on putatively non-linguistic tasks: color perception/discrimination/memory/etc.

    We can also ask the question a slightly different way: what effect does language have on non-linguistic color representations? But for reasons that will become clear in a moment, this way of asking the question carries some strong assumptions that may be unfounded.

    As a first step toward answering the question, I find it helpful to think about how the treatment of color by languages is different from the treatment of color by the visual system. The immediate answer is that the visual system appears to code color in a continuous space (though nonlinear and nonmonotonic). In contrast, languages code colors categorically. And so one hypothesis is that in learning a color naming system, people are learning a means of activating color representations in a (more) categorical way than they would in the absence of such a system. This is not to say that language is the only route to categorical color, but it is the one that people using languages with color names routinely take. More than simply “carving up” the color space (the focus of nearly all the work on the subject), color names alert the learner to the idea that the color dimension itself is a nameable entity. The idea that the color of something can be named independently of the object is an interesting cultural innovation. It is analogous in some respects to the idea that cardinality can be named independently of the objects being enumerated. You mention in the post the often-raised question of how much participants in Kirby et al’s iterated-learning tasks depend on prior knowledge of language. The use of hard-to-name shapes doesn’t take into account that the participants have all learned to treat the typical *dimensions* of variation (e.g., color, shape, number) as nameable entities. The question of the role that language played in that achievement is an interesting one.

    [running into max comment length, so splitting into 2 comments]

  3. Now to relate these ideas to the semantic expressivity hypothesis (OSE).

    Sticking with the domain of color, I take Regier et al’s work (also work by Baronchelli et al., and earlier work by Batali) as all showing that the shape of lexicalized color categories is not arbitrary but rather reflects an optimal or near-optimal solution for some task. I like this line of thinking, but the question that immediately arises is what’s the task. Is the task to communicate about specific color hues? What determines the alternatives being discriminated? What determines the base frequencies of the alternatives? In any case, I am fully on board with the idea that the geometry of color categories (as well as shape categories, spatial categories, etc.) is predictable and constrained by, for example, the need for the categories to be learnable and expressive. Though the latter metric is difficult to assess for the reasons already mentioned: express what exactly, to whom, and under what conditions? That caveat aside, when it comes to explaining why words generalize in the ways they do (i.e., the shape of word meanings), a focus on optimizing learnability and expressivity strikes me as correct.

    But now back to thinking about the effects of language on color cognition/perception. The questions addressed by the OSE hypothesis seem orthogonal to two central questions of most interest to people like me. First, even if one could entirely predict the geometry of a 5-color category naming system, the question still remains why languages differ in color names as much as they do. What exactly drives a language to develop color words in the first place? Second, we return to the question I started with. What is the consequence of learning color words? A child learning English is inheriting an already developed system—the English color vocabulary. What (if any) is the consequence of learning and using this system on tasks like color memory, color discrimination, color-based attention, just-noticeable-differences, etc.?

    Lastly, a quick thought on this bit:

    “For the OSE to make most of its currently testable predictions, non-linguistic representations must be shared across speakers.”

    I agree, but it’s really tough to test this if we allow the possibility that language warps “non-linguistic” representations. If using color-labels makes color representations more categorical, then in what sense can one claim that color representations of speakers of languages with different color vocabularies are shared? One could point to some task that both populations perform identically, e.g., that they have identical just-noticeable differences. But what is to say that this task is a better measure of “true” non-linguistic representations than another task on which their performance differs? An alternative is to drop the distinction between linguistic and non-linguistic representations entirely and just talk about “representations” – which may or may not be affected by experience with language and/or in-the-moment use of language.

    Thanks again for a great post!