Babies Learning Language: August 2013

Tuesday, August 27, 2013

Valence/arousal in babies

The valence/arousal model, from Russell (1980). Higher up is more aroused, further right is more positive valence.

Watching my daughter M, who is now six weeks old, makes me think that valence and arousal are much more tightly coupled in young infants than in older children and adults. Let me explain what I mean.

The valence/arousal model is a simple, powerful way of thinking about the spectrum of human emotions. Valence describes whether the emotion is positive or negative, while arousal describes the level of alertness or energy involved in the emotion. The original scaling of emotion words is shown in the graphic above. For example, you can see that fear, anger, and distress are high arousal emotions with generally negative valence (upper left corner) while excitement and delight are generally high arousal emotions with positive valence (upper right corner).

In the last few weeks, M has been starting to smile and coo. This very cute behavior is happening mostly right after she wakes up. She will be awake and alert and very smiley, and I'll play with her, tickling her stomach or holding her hand. After a few minutes of this, though, she can very quickly cross the line into overstimulated and fussy. Her smile will turn into a frown almost instantaneously. This flip happens in the other direction as well: she can be starting to wail from hunger, and if I bounce her and talk to her for a minute she will suddenly smile at me (before remembering that she is hungry and scrunching up her face again). In other words, when she's aroused, she switches between positive and negative valence very quickly.

So perhaps M's dimensions of valence and arousal just aren't as well-differentiated. I tried to make my own version of Russell's diagram (below):

There's essentially just a single dimension of variance, which is mostly based on arousal: asleep being lowest and wailing being highest. Between these there is some variation in whether the middle states are generally positive (e.g. happy satiated look after eating) or negative (squirming and slightly uncomfortable), but they can flip between one another very quickly.

I've been trying to think about how to test this model, but I don't have any good ideas yet. Nevertheless, it certainly seems like it captures my intuitions about M's emotional states so far...

Thursday, August 22, 2013

Seeing in the first month, part 1

(Home-made face perception stimuli.)

M just had her one-month birthday. As a reader of papers on early cognitive development, I'm used to thinking that babies like M are all "newborns" - an unexamined category that includes at least the first month and perhaps beyond. But it turns out a tremendous amount has changed in that short month... in these two posts, I'll talk about a few things about M's visual perception that have been changing.

Visual acuity

Parents of newborns are routinely told that their baby can only focus on objects 8 - 15 inches away. But this turns out not to be true. Research by Tony Norcia and colleagues (nicely summarized here) suggests that infants' visual acuity is considerably sharper than initially thought. Early research by Fantz and others used infants' visual preference for contrast to test their acuity: if they would attend to stripes with spatial frequency X then they must be able to perceive spatial frequency X.

This work was incredibly clever, but it required infants not only to be able to see the stimulus but also to have a preference for it and to direct their eyes to it consistently. Neither consistent preference nor sustained attention is newborns' strong suit, however. As a consequence, this work significantly underestimated their acuity. In fact, Norcia and colleagues estimate newborns' acuity to be closer to 20/120 in the first month, using passive brain measures that only require perception, not preference or attention. So newborns' acuity isn't great, but at least they aren't legally blind.
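To put the 20/120 figure in the same units as the striped stimuli, here is a rough back-of-the-envelope conversion from a Snellen fraction to grating acuity. It is only a sketch. The function name is made up for illustration, and it assumes the common convention that 20/20 vision corresponds to resolving roughly 30 cycles per degree. The 20/200 comparison is the usual US threshold for legal blindness.

```python
def snellen_to_cpd(denominator, numerator=20.0):
    """Approximate grating acuity in cycles per degree for a Snellen fraction.

    Assumption: 20/20 is treated as about 30 cycles per degree. This is a
    rule of thumb for illustration, not a clinical formula.
    """
    return 30.0 * numerator / denominator


newborn_cpd = snellen_to_cpd(120)       # the 20/120 estimate from the text
legal_blind_cpd = snellen_to_cpd(200)   # 20/200, the usual US threshold

print(f"20/120 is roughly {newborn_cpd:.1f} cycles per degree")
print(f"20/200 is roughly {legal_blind_cpd:.1f} cycles per degree")
```

Under that assumption, a 20/120 newborn should resolve stripes up to about 5 cycles per degree, a little finer than the 3 or so cycles per degree implied by 20/200.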

Eye-movements

Adults' eyes are constantly in motion, making saccades directly from one location to another around two to four times per second. Infants' ability to make fast saccades matures rapidly, although early researchers observed that very young infants sometimes make what look like tiny "microsaccades" along the way to a target, rather than jumping there directly.

My little experiment with M has been to hold her up at arm's length facing me. Once she's there I get her attention and move her about 20 or 30 degrees to the right, and then to the left, observing whether her eyes follow my face. In the first week or so I saw a lot of these little tracking microsaccades. But by around 14 days she would make a single tracking saccade to fixate my face again after I moved her. These saccades took quite a while to plan and execute: I didn't time them but it seemed like it took a solid 500 - 1000 ms to look at me. This was really striking: Move. Pause. Look.

Now at a month old, I can do the same trick when M is in her chair and I'm three or four feet away from her. And she already seems quite a bit faster - though still far from the instantaneous adjustment I'd expect even a few months down the road.

A preference for contrast

Even in the first week we noticed that M seemed to love the venetian blinds above her changing table. Her preference is likely driven by a preference for high contrast. The blinds are spaced pretty far apart, so the spatial frequency is low and the contrast is high because of the light behind them. I had talked about this phenomenon in a paper on babies' visual preferences, but it's amazing to see in person. By 4 weeks, when she's fussy, we can calm her down by holding her a couple of feet from the blinds in our dining room.

Although my saccade tracking exercise above happens to involve a saccade to a face, it doesn't necessarily say anything about face preferences per se. In that situation, my face is also a high-contrast target on a boring background (usually the ceiling). At one point I tried to show M's new trick to my wife and inadvertently set M up in a situation where there was a lightbulb behind me. I moved her to the left, and she stared at the lightbulb. I moved her to the right and she made a saccade to keep looking at the lightbulb (completely missing me). Maybe I should paint stripes on my face.

A preference for faces

If M would rather look at a lightbulb than her dad, does she really have a preference for faces? It turns out she does. Although their face preference may not trump their love of blinds, infants still prefer to look at things that look more like faces than contrast-matched things that don't. To test this finding with M, I used an adapted version of the procedure from the classic Johnson & Morton (1991) paper on newborn face preferences.

I constructed a pair of ping pong paddles, as in the picture above, with a set of three dots comprising either a pyramid (a schematic face) or an upside down pyramid (which doesn't look like a face). For each trial, I got M's attention, crouched behind her bassinet, and held a paddle about a foot away from her face. I timed how long she looked at the paddle before looking away for more than 2 seconds. (All timing was inexact because I was using an iPhone timer. Trials shorter than 5s I called a false start and restarted.)

I ran this procedure twice, when M was 13 and 15 days old. The first time I showed a non-face and then a face; the second time I showed a face and then a non-face. I tried to remain blind to condition by randomly numbering the backs of the paddles before each session, but I didn't completely succeed - so I take the results with a grain of salt. Nevertheless, M's data were striking. In the first session she looked at the non-face for 28 seconds and the face for 78 seconds; in the second she looked at the face for 46 seconds and then the non-face for 25 seconds. These results look almost identical to the visual preference reported by Farroni et al. (2005) using the same stimulus. Cool!
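One simple way to summarize looking times like these is the proportion of total looking that went to the face paddle in each session. The sketch below does just that with the four numbers reported above. The proportion score is a common summary in looking-time work, but it is an assumption here, not necessarily what Farroni et al. computed, and two sessions from one baby don't support any statistical inference.

```python
# Looking times (seconds) as reported in the text; everything else is illustrative.
sessions = [
    {"age_days": 13, "face_s": 78, "nonface_s": 28},
    {"age_days": 15, "face_s": 46, "nonface_s": 25},
]


def face_preference(face_s, nonface_s):
    """Proportion of total looking time spent on the face-like paddle."""
    return face_s / (face_s + nonface_s)


scores = [face_preference(s["face_s"], s["nonface_s"]) for s in sessions]

for s, score in zip(sessions, scores):
    # 0.5 would mean no preference between the two paddles
    print(f"day {s['age_days']}: {score:.2f} of looking time on the face")
```

Both sessions come out above 0.5 (about 0.74 and 0.65), and the preference holds in both presentation orders.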

Interim Summary

A lot has changed over the past month. It's been remarkable to see M becoming a more awake, aware, perceptive baby. In the second post in this series, I'll talk a bit about her growing ability to use her arms, as well as the way she scans within faces.

Thursday, August 15, 2013

On publication lag

(Ulysses, tied to the mast)

One potential negative of being an academic - especially for someone as impatient as I am - is the time lag of publication. It can easily take two years from the first submission of a manuscript to when that manuscript appears in print (in some high-profile journals and faster-paced fields the whole process can be shorter, but I'll focus on psychology here). What I want to argue is that, although publication lag has a whole host of negative consequences, it can nevertheless be a pathway to better work.

As a way to shortcut the long journal submission process, I've made a habit of submitting much of my research to the Cognitive Science Society. This is a great way to get findings out quickly: short papers are due in February, reviewed by April, and presented in July. The downside is that publications are not archival. Unlike in Computer Science - where conference publications are the standard - for purposes of jobs, promotion, etc., psychology papers must be published in a peer-reviewed journal. So I often write up some more complete version of my CogSci papers as substantially longer journal articles.

I recently tried to estimate the lag in this process. In 2007, 2008, and 2009 I submitted a number of papers to CogSci:

A 2007 piece on segmentation was published with three more experiments in 2010,
A 2008 piece on numbers and verbal interference was finally published in 2012,
A 2008 piece on rule learning was published with more simulations in 2011,
A 2009 piece on the Human Speechome Project is just now getting written up,
A 2009 piece on discourse continuity evolved into a 2013 paper, and
A 2009 piece on pragmatics became both a 2012 paper and one that is now under review.

In other words, the lag was typically 3 - 5 years between the conference and the journal publication date. I feel exhausted even thinking about a lag of this magnitude: Ideas I was excited about last February will most likely see print in 2018.

By slowing down the broader dissemination of ideas, this lag clearly has negative consequences. Although conference proceedings are citable, journal papers stand a much better chance of having a long-term impact. A nicely typeset article in a good journal suggests solid research; when the article is published, there are often press-releases and content alerts that go out; and journal papers are indexed in PubMed and other catalogs as part of the archival scholarly literature. Yet that journal article is often several years out of date before we read it.

If I look at the list of articles above, none of them were published, or even submitted, with only minor changes from their CogSci versions. Instead, I made substantial revisions and additions prior to submitting them for the first time. Sometimes I replaced entire experiments - in some cases all of the experiments - because I had learned how to design them better or had created a better stimulus set. I was and still am happy with the initial CogSci papers. But taking the time to write them up, get reviews, prepare a talk, and present gave me space to become dissatisfied. It gave me time to up my standards and to think that I could do better.

Peer review plays a part in this process, but the feedback I receive is not always critical in my revisions. In some cases I make changes to please reviewers. But my most successful revisions are the ones in which I find a shared concern with the reviewers: a flaw that I recognize and that I am unsatisfied with. Then when I fix this flaw to my own satisfaction, reviewers are also satisfied. It takes time for me to get this kind of perspective.

That's why I think the lag itself is valuable, even independent of feedback from the review process. The slow speed of scientific publication may actually be a form of being tied to the mast. As frustrating as it is to wait 2 - 3 months for reviews, the process actually enforces the dictum of setting a draft aside, a practice that is endorsed by writing coaches from Stephen King to the Harvard Writing Center.* And without those enforced breaks, I doubt that I would have the discipline to keep from pressing "publish" and sending my (perhaps interesting but often half-baked) work out into the world.

A lot has been written about changing publication standards for psychology and for science more generally. I especially like the Scientific Utopia pieces of Nosek and colleagues that describe ways that digital communication can help with disseminating scientific knowledge. But as much as I hate to say it, I wonder whether the molasses-slow timeline of scientific publication doesn't sometimes lead to better thought-out, higher-quality papers...

---
* Of course, there are some parts of the publication process that don't provide a benefit, e.g. the lag from proofs until the journal actually decides to print the darn thing. This lag could be eliminated if we gave up on paper journals. While many journals now use e-pub before print, my experience is that this practice leads to huge messes in Google Scholar and elsewhere when the same paper is cited to two different years.

Monday, August 12, 2013

Simpson's paradox and age-of-acquisition for words

For a child, what makes a word easy to learn? One of the key things is that the word be frequent. A word that is heard frequently gives the child many more chances to remember its form and infer its meaning. This relationship between frequency and age of acquisition (AoA) is probably the most robust finding in predicting when words will be learned (although this is a dubious distinction because there aren't too many other robust predictors that have been studied across multiple contexts). But does this relationship hold up for all types of words? Or is it only (or primarily) true for nouns? In this post I'll discuss this relationship in the context of a statistical effect called Simpson's Paradox (illustrated above).

The first comprehensive study of the frequency/AoA relationship that I know of is by Huttenlocher et al. (1991). (Previous work by Brown and others had looked at these data for individual word classes). Huttenlocher and colleagues examined the correlation between the frequency of word production in mothers' speech and the age at which children produced the same word, finding a striking -.65 correlation between log frequency and AoA (both estimated from a detailed set of transcripts of parent-child talk at sessions from 16 - 26 months). Note that a negative correlation means that if you hear the word more, you learn it earlier. Even more striking, within individual caregivers, correlations were usually up in the range of .8! But in order to get this correlation, Huttenlocher et al. only looked at "content words" - excluding articles and other closed class words as well as words that were heard infrequently in their sample. More recent studies haven't found correlations nearly this high - why is that?

For example, a study by Goodman, Dale, and Li (2008) looked at correlations between corpus frequency in CHILDES and population production and comprehension norms from the MacArthur-Bates Communicative Development Inventory (CDI, a parent report measure). This approach averages across many different children, with the hope that the better measurement afforded by having more data cancels out the idiosyncratic differences you would normally find between caregiver-child dyads (e.g. some moms talk about cars, others talk about tea sets). Perhaps because of this move, Goodman et al. find fairly mixed results, with a positive relationship between the average frequency of words in different syntactic categories and their AoA (meaning more frequent words are learned later):

What's going on here? Here's one thought: Simpson's paradox is a statistical phenomenon that arises when you are interested in quantifying the relationship between two variables, and there is some grouping variable that mediates that effect. The trend you're looking for may actually be present within each subgroup, yet weaken, vanish, or even reverse when the groups are pooled. In the illustration at the top of this post, the relationship between x and y is overall negative, but positive within each group.

In the context of word learning, the hypothesis is this: AoA is negatively predicted by frequency within syntactic categories but the relationship is weaker or even positive across categories. That's because the most frequent words, closed class words like "the" and "of," are hard to learn and tend to be dropped from children's early telegraphic speech. To examine this feature of the frequency/AoA relationship, Brandon Roy, Deb Roy, and I have been using the Human Speechome Project (HSP), an ultra-dense set of videos and transcripts of the life of one child (Deb Roy's informative TED talk here). (NB: Brandon and Deb are not related.)
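The hypothesis above can be illustrated with a toy example. The numbers below are invented purely for illustration (they are not from CHILDES, the CDI, or the HSP), and the two "categories" are stand-ins: one low-frequency group that is learned early and one high-frequency group that is learned late. Within each group, more frequent words are learned earlier, but the pooled correlation has the opposite sign.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy data: (log frequency, age of acquisition in months)
toy = {
    "nouns": ([1, 2, 3, 4], [20, 18, 19, 17]),          # rarer, learned early
    "closed_class": ([5, 6, 7, 8], [26, 24, 25, 23]),   # frequent, learned late
}

within = {name: pearson_r(freq, aoa) for name, (freq, aoa) in toy.items()}

pooled_freq = [f for freq, _ in toy.values() for f in freq]
pooled_aoa = [a for _, aoa in toy.values() for a in aoa]
pooled = pearson_r(pooled_freq, pooled_aoa)

for name, r in within.items():
    print(f"{name}: r = {r:.2f}")       # negative within each category
print(f"pooled: r = {pooled:.2f}")      # positive once categories are mixed
```

In this toy set the within-category correlations are both -0.80 while the pooled correlation is positive, because the category with the higher frequencies also has the later AoAs.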

In our 2009 paper, we found a similar pattern to the previous reports: the frequency with which the child in HSP heard words was a predictor of AoA, both within and (more weakly) across syntactic categories. Here's a replot of the data from that paper, so that the axes are the same as the one above from the Goodman et al. paper:

You can already get a flavor for the fact that there's a Simpson's paradox effect happening here. For example, the closed class words show the same frequency/AoA relationship (as do all groups except maybe the verbs) but the closed class words are overall much more frequent than the nouns (they are shifted right). Now look at the means alone, where we see the same positive relationship as Goodman et al. do, in contrast to the negative relationship in each of the syntactic categories alone:

The spread of AoAs is smaller than for Goodman et al., but the pattern is nearly identical. Why is the spread smaller? One simple reason is that our sample includes all kinds of idiosyncratic nouns like "cymbal" that aren't in the CDI and are lower frequency than the CDI words. So this overall moves the AoA for nouns closer to that of closed class words. (If we included words like "fluorocarbon" in the list, nouns would probably be learned on average after closed class words). The HSP child also learns words a bit faster than the standard CDI children, so all the AoAs are shifted a bit younger than in the Goodman et al. study.

To summarize: This analysis suggests that the Goodman et al. study as well as our 2009 paper both observed a pretty clear Simpson's paradox effect. The strength of the effect is modulated by which words are chosen for the analysis as well as how they are distributed across categories. So, in the initial Huttenlocher et al. study, the magnitude of the correlation was much larger than in later studies because they dropped closed class words (the magnitude was probably further inflated by the dropping of low frequency words and the use of data from a single conversation session).

The upshot of this analysis is that there is likely a very strong relationship between frequency and AoA in child language acquisition, and that effect appears to hold across word classes. The more words are heard, the earlier they are learned. Nevertheless, the precise magnitude of this correlation across the vocabulary depends strongly on the particular sample of words that are analyzed (and their syntactic category). Stay tuned for more updates from the Human Speechome Project (e.g. this) on the microstructure of word learning.

(HT: Florian Jaeger, who used Simpson's Paradox in a recent article on linguistic typology.)

Tuesday, August 6, 2013

Fixing the axis labels

I've waited a long time for an experiment to finish and I eagerly sit down to begin my analysis. I've got a graph in mind that shows the relationship between the measure of interest and the key manipulations. I code frantically until I get the first look at the plot and then slump back onto the couch in disgust, because it doesn't look like what I imagined.

This scenario has happened to me more times than I can count, and it's one of the formative experiences of doing experimental science. If you always see the pattern of data you expect, you're doing something wrong. The question is what to do next.

When I teach data analysis, I focus on visualization and exploration rather than statistics. You have to become familiar enough with whatever graphics package you use (R, MATLAB, Excel, etc.) that you can quickly make and alter many different visualizations of your data. But once you gain that kind of facility it becomes tempting to throw off dozens of different graphs. After seeing that first failure, the next obvious move is to try all kinds of fancy new analyses. Add a factor to the graph. Break it up by items or by subjects. Try difference scores. Make it 3D!

These are all important things to do - and even more important things to be able to do. (Well, maybe not the 3D part.) If, for example, you can't quickly do a visual subject or item analysis, then you may miss crucial issues in your design. But I also want to argue that the immediate move to more complex or more in-depth analyses may lead to post-hoc analyses, where you find results that you didn't predict. From there, it's very tempting to interpret these findings as though they are the planned analysis. Problems of this sort have been discussed intensely in recent years.

That first graph you made is important (or it should be, if you've chosen the correct starting place). That graph should be the planned analysis - the one that you wanted to do to test your initial hypothesis. So instead of poking around the dataset to see if something else turns up, what I try to do is something I call fixing the axis labels. (I use this as a shorthand for "cleaning up all the seemingly unimportant details"). Fixing the axis labels is an important way to get as much information as possible from your planned analysis.

When fixing the axis labels, take the time to walk through the graph slowly:

Label the axes appropriately, with descriptive names,
Make sure the scale makes sense, adding units wherever possible,
Correct the ticks on the axes so that they are sensible in terms of placement and precision,
Fix the aspect ratio of the graph so that the measures are scaled appropriately,
Make sure there are appropriate measures of variability, ideally 95% confidence intervals so you can do inference by eye, and
Make sure that the appropriate reference lines are visible, e.g. a line indicating chance performance or a baseline from a control task.
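The last two items in the list above (variability and reference lines) can be sketched numerically before any plotting happens. The snippet below computes a mean and an approximate 95% confidence interval for one condition and checks it against a chance baseline. The data and the chance level of 0.5 are invented for illustration, and the interval uses the normal approximation (mean ± 1.96 standard errors). With a sample this small, a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% CI."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values))   # standard error of the mean
    half_width = 1.96 * sem
    return m, m - half_width, m + half_width


# Invented per-participant proportions correct for a single condition
proportion_correct = [0.55, 0.60, 0.70, 0.65, 0.50, 0.75, 0.60, 0.65]
chance = 0.5   # hypothetical baseline for a two-alternative task

m, lower, upper = mean_ci95(proportion_correct)
excludes_chance = not (lower <= chance <= upper)

print(f"mean = {m:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
print("interval excludes chance" if excludes_chance else "interval includes chance")
```

These are the numbers that the error bars and the chance line on the graph would carry, which is what makes "inference by eye" possible.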

You can see an example of this cleanup in the frontispiece image above, a simplified plot from a project I worked on a couple of years ago. Although I try to avoid bar graphs in most of my work, I've chosen one here because even in this very simple visualization it's possible to add a lot of extra important detail that helps any viewer (including me!) interpret the data.

Some or all of this may seem obvious. Yet it is astonishing to me how many students and collaborators move on from that first graph before perfecting it. (This move often stems from anxiety that the experiment has failed - prompting a search for new analyses that "worked," often exactly the wrong move.) And while sometimes fixing the axis labels simply makes a clear failure clearer, other times it can reveal important insights about the data:

Axes. Does the relationship being plotted really make sense with respect to the design of the experiment? Can you describe the axes in terms of what was manipulated (typically horizontal) and what was measured (typically vertical)? If not, then you need a different plot.
Scale and aspect ratio. Is the measurement magnitude appropriate and sensible? You can only see this if the scale is right and the aspect ratio is appropriate. But this simple check of magnitudes is an important way to catch errors. (I made this error in one of my first papers, where I plotted minima rather than means and failed to notice that infant looking times were a full order of magnitude smaller than they should have been. Not my finest hour.)
Ticks and reference marks. Does the approximate level of participants' performance make sense? How does it compare to known baselines like chance or performance in earlier studies?
Variability. Is the variability sensible? Does it vary across conditions or populations? Is the precision of the measurements sufficient to allow for the inferences you want? Often the appropriate variability measure is all you need to make a strong argument from the statistical data on a graph. The later statistical analyses may confirm and quantify the impression of reliability from the visualization, but they become much less critical.

Of course it's important to do many different exploratory visualizations. But before moving on to these secondary hypotheses, take the time to make the best representation you can of the primary, planned analysis. Make the graph as clean and clear as possible, so you can walk a friend or collaborator through it and see the nature of the measurements you've made.

(HT: lots of these recommendations are inspired by the work of Andrew Gelman and colleagues, e.g. this.)

Sunday, August 4, 2013

Crying as a feedback loop

As a new parent, I'm very interested in ways of soothing my baby daughter when she is upset. Across the board, the most popular recommendation we've had is The Happiest Baby on the Block, by Harvey Karp. HBB is a method for soothing crying infants that relies on the "5 Ss" method - swaddling the baby, giving her something to suck on, swinging her, shushing her, and putting her on her side. Karp's claim is that these five steps activate a "calming reflex" (though he is clear on the website that this is more a metaphorical description than the same kind of reflex as e.g. moro or stepping) and create a womb-like atmosphere for the child. I'm not sure about either of these claims, but on the other hand, it's pretty clear that the HBB method works well for many babies.

Seeing M cry makes me think about crying as akin to an audio feedback loop. Think of the ear-piercing squeal you get from an open microphone that is too close to a speaker: a single resonant frequency gradually gets picked up more by the microphone and in turn broadcast more by the speaker, leading to that tone escalating in volume until it's unbearable.

On this account, M has some overall level of arousal that she is trying to regulate (in the analogy, the sound signal being processed by the system). Some arousal is generated internally, e.g. gas, hunger, etc., while other arousal is external, e.g. seeing faces, bright lights, noise, us moving her or her limbs. Typically she damps down both internal and external inputs after they are processed in order to maintain her state. Noise makes M startle but she usually returns to sleep or equilibrium; gas makes her frown and squirm but once it passes she calms down. This is like a standard microphone system - put in sound and it gets amplified, then it dies away.

But at some points, M can't damp her inputs effectively, and you can see her face and posture change as the arousal builds up to a cry or a scream. The build-up doesn't take that long, but I think every parent gets to be familiar with the signs: the wriggle, the scrunched up face, the silent mouth opening. What's interesting about this build-up, though, is that (at least with M) it's been pretty easy to short circuit. Sometimes it's enough to change her position from lying down to sitting up; sometimes I can just say a word or two to her or bounce her a little in my arms.
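The difference between the damped case and the runaway case can be sketched with a toy model. This is a deliberately crude linear loop, not a claim about infant physiology: the `gain` parameter and the update rule are assumptions made up for illustration. Each step, the current arousal level is multiplied by the loop gain and any new input is added. A gain below 1 means a startle dies away; a gain above 1 means the same startle escalates.

```python
def simulate_arousal(gain, inputs, start=0.0):
    """Toy feedback loop: next level = gain * current level + new input.

    `gain` is a hypothetical loop gain; values below 1 damp the signal,
    values above 1 amplify it on every pass through the loop.
    """
    level = start
    trace = []
    for stimulus in inputs:
        level = gain * level + stimulus
        trace.append(level)
    return trace


# A single startle (one unit of input) followed by nine quiet steps
startle = [1.0] + [0.0] * 9

damped = simulate_arousal(0.5, startle)    # inputs are damped after processing
runaway = simulate_arousal(1.5, startle)   # damping has failed

print("damped: ", [round(x, 3) for x in damped])
print("runaway:", [round(x, 3) for x in runaway])
```

In the damped run the startle shrinks by half at every step, while in the runaway run the same single input keeps growing with no further stimulation, which is the microphone squeal in miniature.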

If she has already started crying, though, it takes quite a bit more stimulation to get her attention. Thankfully, a change of position, eye-contact, bouncing, and a few words (maybe spoken a bit louder) will usually cut through, at least temporarily. In an audio feedback system, when you have some kind of amplification, putting noise into the system can override the feedback loop and reset the system. The more feedback there is in the system, the louder the external input needs to be.

Back to HBB. From the feedback loop perspective, 3 of the 5 Ss are ways of changing the dynamics of the baby's perceptual and input noise. Swaddling cuts down movement-related stimulation; shushing is like injecting white noise directly into the auditory system; and swinging provides a lot of visual and vestibular inputs. The fourth, sucking, may be a bit different but it certainly activates a number of reflexive behaviors, which could refocus some of her internal processes. (I don't have a good account for why being in a side position matters - but this is also the one that seems to matter least for M.)

Although there's a tremendous amount of work on crying - especially focusing on how to reduce it - I haven't been able to find anything that I can really compare to this perspective. There are some hints about this general set of ideas in a classic 1962 paper by Brazelton, but the only quantitative work that I know of on this is a paper by Thomas and Martin (1976) on dynamic feedback loops between mothers and infants. Does anyone know about other relevant references on infant homeostasis, fussiness, or crying? It seems like this feedback loop framework makes many testable predictions...