Babies Learning Language

Wednesday, February 18, 2015

Stories from the mind of a toddler

Right around the holidays, M started doing something new and charming. Although she had been speaking individual words all throughout the fall, she began to use single word utterances and gestures together to tell stories about events that had happened to her. Even young infants have long-lasting memories that seem tied to individual events, but there's something different about being able to share your memories using language. For us, it meant the first opportunity to have multi-turn conversations with M. She can now share something she's thinking about and we can reminisce about it together.

The first narrative I saw was very simple. Perhaps it was less clearly evocative of a particular event than a class of events, but still different from what came before. M would say "meow" (or her equivalent, which has a lengthened second syllable only, so /aw/) and then make the gesture/sign for come here, palm upright with fingers pulling towards her. This was how she recalled our nightly walks, which tended to go to a street near our house where a friendly cat could occasionally be seen. We'd say "Are you thinking about the kitty cats? ["yah!" replies M] Did you see a kitty last night? [M - /aw/] What did you say to it? [M makes "come here" gesture] That's right - come here, kitty kitty."

So you can see how this little narrative becomes a linguistic routine, something that reinforces the memory it refers to. But this example is also a little weak; our scaffolding of the memory was probably critical to making it more than just generalized, undirected longing for cats (something that M has quite a bit of).

The second example – the one that really convinced me – came a few weeks later when we took M to a local farm. She had been reading animal books with us for many months, so we thought she'd enjoy the experience. We were right. She was totally transfixed by watching a cow be milked and fed. After we got home, she wanted to talk about the cow a lot. Every fifteen minutes or so, out of nowhere she would moo, but we were confused by what she was trying to express. After mooing, she'd say "uh oh" and point to her tongue.

It took us several repetitions to figure out what M was telling us. Here's the story (in all its toddler glory): when the cow was eating, a lot of its feed fell out of its mouth (hence "uh oh"). But then once we figured out the story, the gesture of food coming out of the mouth got conventionalized into something like the reverse of an "eat, eat" gesture – fingers pressed together, pulling from the mouth. And then we had to do the whole routine over and over again. M: "moo" - parent: "are you thinking about the cow?" M - "uh oh!" and then [falling food gesture]. Ad nauseam. Two months later, she will still tell this story.

When M was very tiny and I had some free nap times, I read a fair amount about baby sign – we never made a decision not to teach signs to M, but life seemed too short. But the most interesting thing I learned was that there was a whole literature on children's spontaneous signs (one paper by the baby sign folks; another by an independent group). The baby sign folks in particular had documented idiosyncratic signs that got built up in structured conversations between parents and kids.

This is exactly the kind of thing M has been doing – she now has a repertoire of such signs that she uses in her narratives, including the "come here," the "falling food" gesture, as well as signs for rain, traffic, and getting splashed in the face by water. Several of them seem quite bound to particular stories, but she has also generalized. For example, her "traffic" sign is pointing and then rapidly swinging her arm back and forth. She spontaneously used this to tell a story about a horse that she had seen, who had both done some messy eating and also run around: "neigh" - [food falling sign] - [running sign]. So these signs do seem at least somewhat flexible and word-like.

Children famously suffer from "childhood amnesia" – the relative inability to recall specific episodic memories from childhood. There are many explanations for this puzzling phenomenon, but one I'm especially fond of (without too much support) is that language plays a role. Episodic memories are not always stored in language, but language provides a medium for encoding and rehearsal. Of course, if language is changing dramatically from the time of encoding to retrieval, that could cause problems for retrieval – hence possible "amnesia."

So I think M's narratives might be a first attempt to express and re-encode episodic memories. The degree to which she retains them will be variable. The relative distinctiveness of the memories will play a role, but the way she encodes them may also matter. The use of signs – which will probably vanish from her vocabulary once she can produce the appropriate words – may help in telling the stories now. But maybe – as the signs fade – she'll also forget the stories more easily?

Monday, February 9, 2015

Could conference submission be preregistration?

If we care about the answer to a particular question, preregistration – registering hypotheses and analyses ahead of time so that they are not data-dependent – is an important strategy for improving the strength of the evidence from studies bearing on that question. Of course, preregistration has some pros and cons. In my mind, the most notable of these is that prereg is more appropriate for large, expensive, confirmatory studies than small, cheap, exploratory studies that are easily replicated (see my post about this topic).

In brief: My worry about pre-registered journal papers is that they can be very expensive in terms of research effort. If no one really cares about a hypothesis, then it's not a big deal not to publish on it. But if you preregister your crazy, speculative claim, then you may be stuck writing a paper telling everyone something they already expected: that your crazy idea, which would have been cool if true, is actually false. And writing papers is hard work: it takes a long time, and has severe opportunity costs. You could be doing new research during the time you are writing a careful, clear, and comprehensive paper on a thing that no one cares about because it wasn't likely to be true and indeed isn't.*

Nevertheless, there's no denying that it's good to be able to see an unbiased sample of experimental hypotheses. So here's a thought.

Something I always tell students NOT to do is to submit to conferences before they are done collecting data. This practice means that you have to impose your own biases on your preliminary data, and it can put you in an awkward position if you write a strongly hypothesis-driven abstract about data that don't end up supporting your spin on them.

But what about if we exploited this issue? We could create a track at conferences where you would submit an abstract on what you were going to do but hadn't yet done – essentially a prereg track. Then we'd have a particular poster session for seeing the results. All we'd need to do is to make sure that the conference abstracts themselves were indexed appropriately, and perhaps require an updated, post data-collection addendum. The upsides would be A) a chance for folks (especially undergrad and early grad students) to get an opportunity to present their work, and B) a low-cost preregistration mechanism.

The Cognitive Science Society already has a track for lightly-reviewed "member abstracts" – essentially posters on work that isn't done enough to merit a six-page paper. Why not "pre data-collection abstracts" too?

---
* Let me emphasize here that I'm not talking about hypotheses where a null result is important and informative, e.g. as in intervention work, or tests of theoretically-central claims. I'm talking about the kind of exploratory work – trying to play around with novel theoretical ideas – that characterizes a lot of research in cognitive science.

Tuesday, January 6, 2015

On "training" your children

tl;dr: Analysis of rhetoric from a parenting column. Why, brain, why?

A recent piece in the Washington Post's parenting advice column got me a bit bothered. The question was about how to get a two-year-old to sleep in her own bed if she doesn't want to, and the advice given, essentially, was don't. The columnist, Meghan Leahy (a parenting coach), advocates letting the child get in bed with the parents, which is after all what the child wants in the first place.

Although we sleep trained with M, I don't think that the column's advice is necessarily wrong; co-sleeping is a reasonable solution, if it works for the child and the parents. Co-sleeping is the norm in many cultures, and the research suggesting that co-sleeping is hazardous is typically focused on kids who are much younger and at risk for SIDS for other reasons. Of course, probably the reason the parents are asking is because they don't like being elbowed in the face by a toddler. Regardless, what bothered me in the column wasn't the advice, per se.

Instead, it was the rhetorical discussion of why the parents shouldn't sleep train. Here's the argument:

Children are born to attach to a caregiver. They are reliant on that caregiver for years and years — far longer than the young of almost any species on Earth. (Just ask your neighbors about that basement apartment occupied by their 20-somethings.) Without a responsible caregiver, they wouldn’t last a day, let alone a lifetime. Our children need us, and their brains are wired to make sure they stay close to us.

So, when a 2-year-old has faced separation all day when she goes to day care and then experiences separation again at bedtime, her young brain goes into panic mode. And that young brain is built to take her to the parent, over and over and over.

And so when the parent places a gate at the door, her brain lights up with fear and panic, and it is experienced as a physical problem. Vomiting, breathing problems: This is a systemwide panic meltdown. It is too much for her to process “Why is Mom leaving me?!” and her body starts to compensate for what her brain cannot handle.

Note that this whole passage doesn't have any evidence in it at all, nor any talk about the behavioral history – what the child has experienced prior to the current situation – or even consequences of the child's actions. It also doesn't include any talk about the parents' quality of life. Instead, what replaces these is a set of explanations and expansions of the questioner's original statement that her child cried and threw up when she was left alone in her room at night.

Most of these explanations aren't necessarily wrong – in current form they're really too vague to be judged on the basis of scientific evidence – but they do a lot of negative rhetorical work by invoking -isms that keep people from thinking sensibly about parenting.

Nativism. Nativism in cognitive development is an interesting and important theoretical position, and it has its place. But all of this talk of being "wired" for X or "born to" Y here is just a way of stopping argument about whether a particular behavior is something we want to encourage. Many people would argue that we are born to discriminate against members of other groups, but we should still teach our children values of tolerance and openness. Maybe kids are "born to sleep next to their caregivers" but if that means the caregivers can't get a good night of rest, then perhaps teaching (even "training") can be a reasonable option to consider.

Dualism. When her "body starts to compensate for what her brain cannot handle" this situation sounds pretty dire. But what does that mean? Brain and body (actually, mind and body) are connected. Psychological stress can lead to health problems, etc. etc. So what do we get from separating mind and body here other than an implication that sleep-training related separation anxiety can lead to bad health outcomes, a claim that is not supported by any evidence?

Brain-o-centrism. Why is it the toddler's brain that goes into overload? And why is it her brain lighting up with fear? Again, this seems more like a rhetorical strategy than any kind of evidence. Empty brain statements like these serve as proxies for explanation without any explanatory content.

Anti-daycare-ism. Finally, why does day care have anything to do with this? This little thrown-off clause ("has faced separation all day when she goes to day care and then") really annoyed me – anti-daycare bias is deeply discriminatory against working parents (e.g. like M's mom and me). I know of no evidence that sleep training or sleep problems more generally interact with whether a child is in daycare. There is an interesting and complex body of research on the behavioral consequences of day care, but that's not what is being referenced here. So I find the offhand implication here that day care separation anxiety can cause sleep disturbance to be deeply problematic.

Why am I picking on this one column, whose advice I don't even necessarily dispute? All throughout parenthood I've been confronted with cases where people have very strong opinions about parenting that seem as though they are grounded in the research that I supposedly pursue for a living (trying to understand children's cognitive development). And yet when I look into them more deeply, they make no sense at all. This column, posted on Facebook by an acquaintance, was the last straw.

That's it for now. I'm off to pick up M at daycare.

Friday, December 19, 2014

Why can't toddlers play with one another? An alternative account of parallel play

Whenever I go to daycare, or interact with other parents of toddlers, I hear about how M and other kids her age – 17 months now – are engaged in parallel play. The basic idea is that, even though young toddlers like to be near other kids their age, they don't play together: they engage in the same sorts of activities in close proximity, but without any sort of reciprocal interaction. I'll argue here that this label is at best a descriptive convenience – it doesn't reflect any inability to engage in reciprocal play – and masks an interesting developmental story.

The idea of parallel play dates back to Parten (1932), who noted the prevalence of this kind of behavior in young preschoolers. For fun, here's the key figure from her study:

The data are pretty clear – and the graph surprisingly modern! In fact, you can see this sort of thing happening in any daycare classroom, and even more so for 1 - 2 year-olds than the preschoolers in Parten's study. But the question is what to make of this descriptive observation (Parten herself doesn't give much of any interpretation, at least in that paper).

So we turn to the internet. Of course, whattoexpect.com has an interpretation of why parallel play occurs:

[Parallel play is] par for the developmental course for babies and toddlers. Why? Because a child this age is still busy figuring out so much about the world and doesn't yet realize that people his own size are indeed people (who might actually be fun to do stuff with). He's too young to make friends, but companionable side-by-side play is a good start.

You hear this echoed across many other sources of information for parents, including the teachers at M's daycare. These sorts of stage labels are endemic in developmental psych of the popular variety, and they often imply that there is a cognitive change that accompanies the behavioral stage shift. I think this developmental story is deeply wrong.

Over the last 15 - 20 years, a large body of evidence has accumulated that suggests that young children have very robust expectations for the social world by their second year. Babies can build social expectations for almost anything – even for eyeless blobs – so they definitely should have such expectations for other toddlers. Other work suggests that very small cues like reaching, looking, and movement towards a target can effectively cue inferences about an agent's goals and desires. So toddlers almost certainly understand that their peers have goals and desires, perhaps desires that even differ from the toddler's own. In addition, toddlers have no trouble engaging in reciprocal interactions with older children and adults (e.g., giving games, simple games of catch).

In fact, in a recent paper by Cortes Barragan and Dweck, having adults engage in parallel play – rather than reciprocal play – with toddlers made them less likely to help that adult achieve a goal later on. So that's a nice piece of evidence for two things. First, parallel play is far from being the only way that toddlers can interact. Second, they actually think it's negative in some way when an adult doesn't play with them reciprocally, so they are forming strong expectations both about and from the type of play they engage in with different partners.

Why do toddlers exhibit so little reciprocal play with their peers, then? I think what's going wrong is that the appropriate social cognitive abilities are very much present in kids of this age, but they are hard to exercise, and critically, social computations are slow. Reciprocal interaction with a peer requires fast online recognition of goals and action planning with respect to those goals. You need to know what your play partner wants you to do, and you need to figure that out before she loses interest and gets distracted. That's pretty easy for adults to do; they create structured play opportunities for toddlers all the time. (For example, last night I set up a tea party for M and helped her serve tea to a wide variety of different stuffed animals).

But when you get two toddlers together, they strike out so often that it might be adaptive to avoid trying to engage! In a recent episode I watched, M saw that another little girl Y wanted a toy car. But by the time she figured out that Y wanted the car, Y had already moved on to other things. The result was that M walked up to Y at a totally inappropriate time and thrust a car in her face for seemingly no reason. Nice idea, but poor execution. Maybe if you are a toddler, you learn not to try out this kind of gambit until you're more confident you will succeed.

This explanation – that parallel play is an adaptive consequence of toddlers' poor speed of processing – is a product of something that I've been exploring a lot on this blog: that babies and toddlers are surprisingly knowledgeable about the world, but their ability to use this knowledge is sharply limited. The limitation here is that social computations are very slow, so that by the time the computations are done, their output is less likely to be relevant. In other words, "parallel play" as a description is correct, but the shift to a more reciprocal style of play may not have anything to do with a cognitive shift. Instead it may emerge from more gradual changes in children's speed of social processing.

Cortes Barragan R, & Dweck CS (2014). Rethinking natural altruism: Simple reciprocal interactions trigger children's benevolence. Proceedings of the National Academy of Sciences of the United States of America, 111 (48), 17071-4 PMID: 25404334

Monday, November 24, 2014

The piecemeal emergence of language

It's been a while since I last wrote about M. She's now 16 months, and it's remarkable to see the trajectory of her early language. On the one hand, she still produces relatively few distinct words that I can recognize; on the other, her vocabulary in comprehension is quite large and she clearly understands a number of different speech acts (declaratives, imperatives, questions) and their corresponding constructions.

Some observations on production:

She still doesn't say "mama." She does say "mamamamamama" to express need, a pattern that Clark (1973) noted is common. She definitely knows what "mama" means, and even does funny things like pointing to me and saying "dada" then pointing to her mother and opening her mouth.
I have nevertheless heard her make un-cued productions of "scissors," "bulldozer," and "motorcycle" (though not with great reliability). Motorcycle translated to something like "dodo SY-ku" – a kind of indistinct prosodic foot and then a second heavily stressed foot. Her production vocabulary is extremely idiosyncratic compared with her comprehension, precisely the pattern identified by Mayor & Plunkett (2014) in a very cool recent paper.
"BA ba" (repeated over and over again) seems to mean "let's sing a song" – or especially, let's watch inane internet children's song videos. We don't do this last all that often, but it has made an outsize impression on her, perhaps because she's seen so little TV in her short life. This is also the first time that she's taken to repeating a single word / label over and over again, so as to emphasize the point.

And on comprehension:

Our life got vastly better when M learned how to say "yes" to yes/no questions. For about a month now, we've been able to say things like "would you like to go outside?" and she will reply "da!" (she is Russian, apparently). "Da" has very recently morphed into "yah" but it's very clearly a strong affirmative. M will occasionally turn her head away and wrinkle her nose if she doesn't like the suggestion. This response feels a lot like a generalization of her I don't want to eat that bite face.
Other types of questions have been slower. Maybe unsurprisingly, "or" is still not a success – she either stays silent or responds to the second option, even if she knows how to produce a word for one or both options. "Where" questions have been emerging in the last week or so. This morning, M was very clear in directing me when I asked her "where should we go?" "What's this" is uneven – occasionally I'll get a "ba" or "da" (ball/dog) type production. And "what do you want" has only gotten a successful production once or twice (bottle, I think).
M understands and responds to simple imperatives just fine: "take the cup to baby" gets a positive response, though her accuracy on less plausible sentences is low.
Explanations seem to hold a lot of water with her. I don't think she understands the explanation at all, but if we need to give something to someone, or leave something behind that she's holding, we ask her and then explain. For example, telling her why we can't bring her favorite highlighter pen in the car with us seems to convince her to put it down. What's going through her mind here? Maybe just our seriousness about the idea – something like wow, they used a lot of words, they must really mean it?
She is remarkably good at negation (at least when she wants to be). A few days ago we were headed out the door to the playground, and M tried to drag a big stroller blanket out the door. I said "We're not going to bring our blanket outside." She headed back over to the stroller, and dropped the blanket. Of course, then she headed back towards the door, turned back, and grabbed a smaller blanket. There was a lot of contextual support to this sequence, but understanding my sentence still took some substantial sophistication. The negation "we're not" is embedded in the sentence, and wasn't supported by too much in the way of prosody. This success was very striking to me, given the failures of much older toddlers to understand more decontextualized negations in some research that Ann Nordmeyer and I have been doing.

Overall, I am still struck by how hard production is for M, compared with comprehension. A new word, say "playground" might start as something resembling "PAI-go" but merge back into "BA-ba" by the end of a few repetitions. M has never been a big babbler, and so I suspect that she is slow to produce language because the skills of production are simply not as well-practiced. There are some kids who babble up a storm, and I imagine all of the motor routines are much easier for them. In contrast, M just doesn't have the sounds of language in her mouth yet.

Wednesday, November 19, 2014

Musings on the "file drawer" effect

tl;dr: Even if you love science, you don't have to publish every experiment you conduct.

I was talking with a collaborator a few days ago and discussing which of a series of experiments we should include in our writeup. In the course of this conversation, he expressed uncertainty about whether we were courting ethical violation by choosing to exclude from a potential publication a set of generally consistent but somewhat more poorly executed studies. Publication bias is a major problem in the social sciences (and elsewhere).* Could we be contributing to the so-called "file drawer problem," in which meta-analytic estimates of effects are inflated by the failure to publish negative findings?

I'm pretty sure the answer is "no."

Some time during my first year of graduate school, I had run a few studies that produced positive findings (e.g., statistically significant differences between groups). I went to my advisor and started saying all kinds of big things about how I would publish them and they'd be in this paper and that paper and the other; probably it came off as quite grandiose. After listening for a while, he said, "we don't publish every study we run."

His point was that a publishable study – or set of studies – is not one that produces a "significant" result. A publishable study is one that advances our knowledge, whether the result is statistically significant or not. If a study is uninteresting, it may not be worth publishing. Of course, the devil is in the details of what "worth publishing" means, so I've been thinking about how you might assess this. Here's my proposal:

It is unethical to avoid publishing a result if a knowledgeable and adversarial reviewer could make a reasonable case that your publication decision was due to a theoretical commitment to one outcome over another.

I'll walk through both sides of this proposal below. If you have feedback, or counterexamples, I'd be eager to hear them.

When it's fine not to publish. First, not everyone has an obligation to publish scientific research. For example, I've supervised some undergraduate honors theses that were quite good, but the students weren't interested in a career in science. I regret that they didn't do the work to write up their data for publication, but I don't think they were being unethical, at least from the perspective of publication bias (if they had discovered a lifesaving drug, the analysis might be different).

Second, publication has a cost. The cost is mostly in terms of time, but time is translatable directly into money (whether from salary or from research opportunity cost). Under the current publication system, publishing a peer-reviewed paper is extremely slow. In addition to the authors' writing time, a paper takes hours of time from editors and reviewers, and much thought and effort in responding to reviews. A discussion of the merits of peer review is a topic for another post (spoiler: I'm in favor of it).** But even the most radical alternatives – think generalized arXiv – do not eliminate the cost of writing a clear, useful manuscript.

So on a cost-benefit analysis, there is a lot of work that shouldn't be written up. For example, cases of experimenter error are pretty clear cut. If I screw up my stimuli and Group A's treatment was contaminated with items that Group B should have seen, then what do we learn? The generalizable knowledge from that kind of experiment is pretty thin. It seems uncontroversial that this sort of result isn't worth publishing.

What about correct but boring experiments? What if I show that the Stroop effect is unaffected by font choice – or perhaps I show a tiny, statistically significant but not meaningful, effect of serif fonts on the Stroop effect?*** For either of these experiments, I imagine I could find someone to publish them. In principle, if they were well-executed, PLoS ONE would be a viable venue, since they do not referee for impact. But I am not sure why anyone would be particularly interested, and I don't think it'd be unethical not to publish them.

When it's NOT fine not to publish. First, when a finding is "null" – meaning, not statistically significant despite your expectation that it would be. Someone who held an alternative position (e.g. that the finding would not be predicted to yield a significant result) could say that you were biasing the literature due to your theoretical commitment. This is probably the most common case of publication bias.

Second, if your finding is inconsistent with a particular theory, this fact also should not be used in the decision about publication. Obviously, an adversarial critic could argue – rightly – that you suppressed the finding, which in turn leads to an exaggeration in the degree of published evidence for your preferred theory.

Third, when a finding (finding #1) is contradictory to another finding (finding #2) that you do intend to publish. Here, just think about if your reviewer knew about #1 as well. Could you justify on independent, a priori grounds that you should not publish #1, independent of the theory? In my experience, the only time that is possible is if #1 is clearly a flawed experiment and does not have any evidential value for the question you're interested in.****
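To make the first case concrete, here is a minimal simulation sketch of how shelving non-significant results inflates the average published effect. Everything in it is an illustrative assumption rather than anything from a real literature: the true effect size (d = 0.2), the sample size (20 per group), the number of studies, and the rule that a study is "published" only when it is positive and significant (using 1.96 as a normal approximation to the critical t value). The function names are hypothetical.

```python
import random
import statistics
from math import sqrt


def simulate_study(true_d, n_per_group, rng):
    """Simulate one two-group study; return (observed Cohen's d, significant?)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    d = diff / pooled_sd
    # For equal group sizes, t = d * sqrt(n / 2).
    t = d * sqrt(n_per_group / 2)
    # Assumption: 1.96 approximates the critical value; only positive results count.
    return d, t > 1.96


def file_drawer_demo(true_d=0.2, n_per_group=20, n_studies=2000, seed=1):
    """Compare the mean effect across all studies with the mean of 'published' ones."""
    rng = random.Random(seed)
    results = [simulate_study(true_d, n_per_group, rng) for _ in range(n_studies)]
    all_ds = [d for d, _ in results]
    published = [d for d, significant in results if significant]
    return (
        statistics.mean(all_ds),        # what we'd estimate with no file drawer
        statistics.mean(published),     # what the "literature" would show
        len(published) / n_studies,     # fraction of studies that get published
    )


if __name__ == "__main__":
    mean_all, mean_published, frac = file_drawer_demo()
    print(f"mean d, all studies:       {mean_all:.2f}")
    print(f"mean d, published studies: {mean_published:.2f}")
    print(f"fraction published:        {frac:.2f}")
```

Under these assumptions, a study only clears the significance bar if its observed d is above roughly 0.62, so the published average has to sit well above the true value of 0.2. That is the sense in which the selection rule, not the underlying effect, drives the inflation.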

Conclusions. Publication bias is a significant issue, and we need to use a variety of tools to combat it. Funnel plots are a useful tool, and some new work by Simonsohn et al. uses p-curve analysis. But the solution is certainly not to assume that researchers should publish all their experiments – that solution might be as bad as the problem, in terms of the cost for scientific productivity. Instead, to determine if they are suppressing evidence due to their own biases, researchers should consider applying an ethical test like the one I proposed above.

----
(The footnotes here got a little out of control).

* A recent, high impact study used TESS (Time-Sharing Experiments in the Social Sciences, a resource for doing pre-peer reviewed experiments with large, representative samples) to estimate publication bias in the social sciences. I like this study a lot, but I am not sure how general the bias estimates are, because TESS is a special case. TESS is a limited resource, and experiments submitted to TESS undergo substantial additional scrutiny due to TESS's pre-data collection review. They are relatively more well-vetted for potential theoretical impact, and substantially less likely to have basic errors, compared with a one-off study using a convenience sample. I suspect – based on no data except my own experience – that relatively more data is left unpublished than the TESS study's estimate, but also that relatively less of it should be published.

** You could always say, hey, we should just put all our data online. We actually do something sort of like that. But you can't just go to github.com/langcog and easily find out whether we conducted an experiment on your theoretical topic of choice. Reporting experiments is not just about putting the data out there – you need description, links to relevant literature, etc.

*** Actually, someone has done Stroop for fonts, though that's a different and slightly more interesting experiment.

**** Here's a trickier one. If a finding is consistent with a theory, could this consistency be grounds to avoid publishing it? A Popperian falsificationist scientist should never publish data that are simply consistent with a particular theory, because those data have no value. But basically no one operates in this way – we all routinely make predictions from theory and are excited when they are satisfied. For a Bayesian scientist of this type, data consistent with a theory are important. But some data may be consistent with many theories and hence provide little evidential value. Other data may be consistent with a theory, but that theory is already so well-supported that the experiments make little change in our overall degree of belief – consider the case of experiments supportive of Newton's laws, or of further Stroop replications. These cases also potentially work under the adversarial reviewer test, but only if we include the cost-benefit analysis above, and the logic is dicier. A reviewer could accuse you of bias against the Stroop effect, but you might respond that you just didn't think the incremental evidence was worth the effort. Nevertheless, this balance seems less straightforward. Reflecting this complexity, perhaps the failure to publish confirmatory evidence actually does matter. In a talk I heard last spring, John Ioannidis made the point that there are basically no medical interventions out there with d (standardized effect size) > 3 or so (I forget the exact number). I think this is actually a case of publication bias against confirmation of obvious effects. For example, I can't find a clinical trial of the rabies vaccine anywhere after Pasteur – because the mortality rate without the vaccine is apparently around 99%, and with the vaccine most people survive. The effect size there is just enormous – so big that you should just treat people! So actually the literature does have systematic bias against really big effects.
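For a rough sense of scale on the rabies example, here is a back-of-the-envelope sketch that converts two mortality rates into a standardized effect size. The 99% untreated mortality comes from the footnote above; the 5% mortality with the vaccine is purely a placeholder assumption standing in for "most people survive," not a real estimate. The conversion d = ln(OR) × √3 / π is a commonly used approximation for turning an odds ratio into a d-like quantity, and the function names are hypothetical.

```python
from math import log, pi, sqrt


def odds_ratio(p_event_control, p_event_treated):
    """Odds of the event (here, death) without treatment relative to with treatment."""
    odds_control = p_event_control / (1 - p_event_control)
    odds_treated = p_event_treated / (1 - p_event_treated)
    return odds_control / odds_treated


def odds_ratio_to_d(ratio):
    """Approximate standardized effect size from an odds ratio: ln(OR) * sqrt(3) / pi."""
    return log(ratio) * sqrt(3) / pi


if __name__ == "__main__":
    # 0.99 is from the text; 0.05 is an illustrative assumption only.
    ratio = odds_ratio(p_event_control=0.99, p_event_treated=0.05)
    print(f"odds ratio: {ratio:.0f}")
    print(f"approximate d: {odds_ratio_to_d(ratio):.2f}")
```

With these inputs the approximate d comes out above 3, which is at least consistent with the idea that an effect this large would be missing from a literature with no published effects past that size. The exact value depends entirely on the assumed treated mortality.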

Monday, November 10, 2014

Comments on "reproducibility in developmental science"

A new article by Duncan et al. in the journal Developmental Psychology highlights best practices for reproducibility in developmental research. From the abstract:

Replications and robustness checks are key elements of the scientific method and a staple in many disciplines. However, leading journals in developmental psychology rarely include explicit replications of prior research conducted by different investigators, and few require authors to establish in their articles or online appendices that their key results are robust across estimation methods, data sets, and demographic subgroups. This article makes the case for prioritizing both explicit replications and, especially, within-study robustness checks in developmental psychology.

I'm very interested in this topic in general and think that the broader message is on target. Nevertheless, I was surprised by the specific emphasis in this article on what they call "robustness checking" practices. In particular, all three of the robustness practices they describe – multiple estimation techniques, multiple datasets, and subgroup analyses – seem to be most useful for non-experimental studies that involve large correlational datasets (e.g. from nationally representative studies).

Multiple estimation techniques refers to the use of several different statistical models (e.g. standard regression, propensity matching, instrumental variable regression) to estimate the same effect. This is not a bad practice, but it is much more important when there are many different ways of controlling for confounders (e.g. in a large observational dataset). In a two-condition experiment, the menu of options is more limited. Similarly, subgroup estimation – estimating models on smaller populations within the main sample – is typically only possible with a large, multi-site dataset. And the use of multiple datasets presupposes that there are many datasets that bear on the question of interest, something that is not usually true when you are making experimental tests of a new theoretical question.

So all this means that the primary empirical claim of the article – that developmental psych is behind other disciplines (like applied economics) in these practices – is a bit unfair. Here's the key table from the article:

The main point we're supposed to take away from this table is that the econ articles are doing many more robustness checks than the developmental psych articles. But I'd bet that most of the developmental psych journals are filled with novel empirical studies that don't afford comparison with large, pre-existing datasets; subgroup analyses; or use of multiple estimation techniques. And I'm not sure that's a bad thing – at the very least, causal inference is far more straightforward in randomized experiments than in large-scale observational studies.

I think I have the same goals as the authors: making developmental (and other) research more reproducible. But I would start with a different set of recommendations to the developmental psych community. Here are three simple ones:

Larger samples. It is still common in the literature on infancy and early childhood to have extremely small sample sizes. N=16 is still the accepted standard in infancy research, believe it or not. Given the evidence that looking time is a quantitative variable (e.g. here and here), we need to start measuring it with precision. Infants are expensive, but not as expensive as false positives. And preschoolers are cheap, so there's really no excuse for tiny cell sizes.
Internal replication. There are many papers – again especially in infant research but also in work with older children – where the primary effect is demonstrated in Study 1 and then the rest of the reported findings are negative controls. A good practice for these studies is to pair each control with a de novo replication. This facilitates statistical comparison (e.g., equating for small aspects of population or testing setup that may change between studies) and also ensures robustness of the effect.
Developmental comparison. This recommendation probably should go without saying. For developmental research – that is, work that tries to understand mechanisms of growth and change – it's critical to provide developmental comparisons and not just sample a single convenient age group. Developmental comparison groups also provide an important opportunity for internal replication. If 3-year-olds are above chance on your task and 4- and 5-year-olds aren't, then perhaps you've discovered an amazing phenomenon; but it's also possible you have a false positive. Our baseline hypotheses about development provide useful constraints on the pattern of results we expect, meaning that developmental comparison groups can provide both new data and a useful sanity check.
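
To give a rough sense of why N=16 worries me, here is a back-of-the-envelope power calculation in Python. It is only a sketch: it assumes a two-sided one-sample (or paired) test, uses a normal approximation rather than an exact t-test, and plugs in a hypothetical standardized effect size of d = 0.5 that does not come from any particular study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-sided one-sample (or paired) test.

    Uses the normal approximation, which is slightly optimistic
    compared with an exact t-test at small n.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = d * sqrt(n)  # expected test statistic under the alternative
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


ASSUMED_D = 0.5  # hypothetical "medium" effect, not an estimate from any study

for n in (16, 32, 64):
    print(f"N={n}: power is roughly {approx_power(ASSUMED_D, n):.2f}")
```

Under these assumptions, a study with 16 participants would detect the effect only about half the time, and the picture improves substantially as the sample grows. Smaller true effects would make the small-sample case look worse still.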

Perhaps this all just reflects my different orientation towards the field than Duncan et al.; but a quick flip through a recent issue of Child Development suggests that the modal article is not a large observational study but a much smaller-scale set of experiments. The recommendations Duncan et al. make are certainly reasonable, but we need to supplement them with guidelines for experimental research as well.

Duncan GJ, Engel M, Claessens A, & Dowsett CJ (2014). Replication and robustness in developmental research. Developmental Psychology, 50 (11), 2417-2425. PMID: 25243330

(HT: Dan Yurovsky)