Thursday, December 12, 2013

A belated git migration

It's coming up on conference paper season, specifically for the Cognitive Science Conference. I love how deadlines like CogSci push research forward, giving us an intermediate goal to shoot for. But when lots of folks in the lab are writing papers separately, keeping track of all the drafts can get unwieldy very fast. My resolution this year is that no one will send me any more zip files of a directory called "CogSciPaperFinal." File naming practices like this one have been caricatured before,* but they get even worse when I'm constantly trying to track something like 6 - 8 different papers going forward at the same time.

Towards that end, our last lab meeting of the quarter was on version control software. In a nutshell, version control packages allow individuals and collaborative groups to work together on a project (usually a software codebase) and provide tools for keeping track of and merging changes to the project. It's painfully clear that we're late to the party: virtually no one in industry works on a large project without version control, but, as is frequently noted, scientists are not good software engineers.

We are starting a lab-wide push to keep track of all of our writing and code using git and GitHub. This transition will mean a bit of discomfort – hopefully not pain – but it's a far better method for storing our work and sharing it with collaborators. If you haven't played with git, I recommend looking at this nice tutorial by NYU's John McDonnell. I also found it very useful to do the TryGit tutorial. The lab's (currently very empty) GitHub page is here. Hopefully in a couple of months it'll be substantially fuller...

---
* HT: Michael Waskom.

Friday, December 6, 2013

Computation under interference

(ENIAC, the first electronic general purpose computer; courtesy Wikipedia).

What if you have a very powerful computer, but it only works some of the time? Maybe it's made from vacuum tubes, and when they overheat or when some dust or a fly ends up in the works, one of those tubes burns out. Then the computer is down for weeks on end. But even if it's nonfunctional most of the time, when it's working, it's Turing complete.

I'm coming to think that babies are this sort of computer. Perhaps the biggest puzzle in cognitive development is the amazing things that babies can sometimes do in very controlled settings (think moral reasoning) and yet the tremendous amount they can't do the rest of the time (think anything except eat, sleep, poop, suck, swipe and occasionally give you a charming smile...). I wonder if one way to reconcile these two different conceptions of infants is by thinking about the challenges of regulating their arousal – in much the same way you need to regulate the temperature of the vacuum tubes to get optimal computing performance.

Sometimes M gives me an incredibly intelligent look and does something unexpected. In the past weeks, she's been trying to pick up her chair to see the bottom, cooing systematically in response to me, or pulling out and reinserting her pacifier in her mouth. But other times she is glassy-eyed because she's concentrating on eating, or wiggling because she has indigestion. Tiredness is the biggest cause of cognitive failures. When I'm tired, I get grouchy, and my reactions are slower. When M is tired, everything goes to pieces. For a while she would even forget how to swallow: Milk would come pouring out of her mouth because she was sucking it in but forgetting to put it away down her throat...

When I was starting to think about M's cognitive development, right after she was born, I described her crying as a feedback loop, where arousal leads to more and more arousal unless there is some internal regulation or external noise in the system. Having observed her for a few more months, I'm increasingly convinced that crying is only one small part of this process.

In fact, most of M's psychological world – perhaps ours as well, though it's well-hidden – seems like it's about regulation of attention (think temperature in the vacuum tube room). Part of this is learning to attend to what is interesting in the world (say, her father's face rather than the blinds). Another part is learning to suppress attention to all kinds of stimuli, including both visual stimuli and internal sensations (like gurgling in the stomach or wetness in the diaper). When she gets tired this all stops happening. Internal sensations get amplified, external ones don't get attended to. The vacuum tubes start burning out, and only a long, relaxing nap will help.

Tuesday, November 26, 2013

Confounds in developmental time

(Looking for developmental dissociations between processes can be a profitable research strategy, but such dissociations may be affected by external events like the transition to formal schooling.) 


As a developmental psychologist, I'm primarily interested in answering "how" questions: How do children figure out how objects work, learn the meanings of words, or recognize the beliefs or goals of others? Yet along the way, I can't help interacting with the (less interesting) more descriptive set of "when" questions: When do children show evidence of object permanence, learn their first word, or pass false belief tasks? And in studying any individual phenomenon, answers to "how" questions can be informed by estimates of when a particular behavior is first observed.

But here's an issue that has been bothering me for a while. Our "when" estimates – derived as they are from the behavior of middle-class kids in the US and Europe – are not independent from one another. They are instead highly correlated, because of external milestones in the lives of the children we are studying. Transitions to preschool or to kindergarten are major drivers of new behaviors. Worse, because teachers are active readers of developmental psychology, new school experiences likely involve explicit practice of exactly the kinds of skills we're interested in studying.

One possible example of this issue comes from a lovely talk I heard by Yuko Munakata at the Cognitive Development Society meeting. Munakata has a deep body of recent work on the development of children's executive function (roughly, the ability to shift flexibly between different sets of behaviors according to context or task; review here). She documents transitions in children's executive function, including the transition from reacting to a stimulus to proactive preparation – choosing the proper behavior for a particular situation ahead of time. To be clear, nothing in Munakata's work depends on the precise timing of these transitions. Yet suspiciously, many of the transitions she studies happen in the same age range (4 - 6 years) when children are transitioning to school, an environment where their executive functions are being challenged and perhaps even trained.

A second example (very far from my area of expertise) comes from a comment made by Kate McLean in a recent brownbag talk she gave at Stanford. McLean studies identity development in adolescents, and she noted a big uptick in the quality of narratives in later high school. When she probed more deeply, however, she uncovered an external driver: late high schoolers were all engaged in the same social ritual: college application essays.

The research in these examples is not necessarily compromised by the presence of external events. But nevertheless, these kinds of events are big factors that might affect study outcomes in ways we wouldn't otherwise predict. From my perspective, I wonder how much the cognitive constructs I am interested in – pragmatics, language learning, theory of mind reasoning – are affected by individual children's transition to preschool, since the period around age 3 - 4 is a time of tremendous development for all of these abilities.

Studies that dissociate age and school shouldn't be too complicated to do, for either executive control or for other constructs. And these sorts of studies might give us some insights into the ways that (pre-) school experiences support the development and refinement of cognition. I recently heard the term "academic redshirting": holding children back from starting school so that they are older and do better than their peers when they finally start. This is a fairly intense (and controversial) strategy for getting kids ahead, but it might create an interesting natural opportunity for studying cognitive development...

Tuesday, November 12, 2013

What can a four-month-old do?

If you read papers on babies, you get the sense that mostly they just stare at stuff. The vast majority of research on babies uses visual attention – usually time looking at a screen or puppet show – as its dependent measure. Some experiments use more exotic dependent variables, like operant conditioning of kicks, pacifier sucking rate, or even smiles. But since Fantz's work in the 1960s, habituation and related looking time paradigms have dominated the field. Although we're reminded occasionally that babies cry, fuss, poop, and sleep, developmentalists appear far and away most interested in looking (very nice review and critique of this idea by Dick Aslin here).

As a reader of that literature, it's been consistently amazing to me to see what M can do, even as a little baby. She is about to turn four months old next week, and the range of her behaviors is astonishing. Even more interesting is that some of this behavioral repertoire gives clear signals to underlying cognitive processes. Here's a quick list of some things I've noticed:

Eating. M takes a lot of her meals from a bottle. Early on, she showed no recognition of the bottle itself until it touched her cheek or lip, activating the rooting reflex. But around a month or six weeks ago she started showing signs of recognizing the bottle as an object, and responding to it before it reached her mouth. At first, the evidence seemed inconclusive to me – she was reaching (at that point mostly unsuccessfully) for many objects, so a reach for the bottle didn't seem diagnostic. But now she shows clear signs of recognition: When she is hungry and sees the bottle, she vocalizes and opens her mouth. Although I haven't tested this systematically, her recognition seems fairly viewpoint-invariant: she can recognize the bottle in many different orientations. This provides converging evidence for object categorization in 3 - 4 month-olds. It also seems like it could be a neat method for studying vision – think specially engineered bottles with different shapes and colors...

Smiling. Ever since around six weeks, M has been a very smiley baby. She greets people with a big smile, sometimes even smiling when she is otherwise quite fussy. It's kind of fun (in a slightly sad way) to watch smiles war with crying. If she is starting to fuss you can smile at her and see a reciprocal smile fight its way through her pouty face. But so far I have seen no evidence that her smiles reflect recognition of me or her mom: she gives them quite indiscriminately right now. (I know there is other evidence for recognizing and preferring mom, via her face or even her smell, very early on; I just find it surprising that she smiles roughly as much for others as she does for us).

Also, even if I hadn't tested M's face preference to schematic stimuli early on, her smiles would be a good indicator of her recognition of pictures. M will give a big smile at a picture of a baby's face. (Before I saw this, I never understood why people gave us board books filled with baby faces.) It doesn't seem surprising now that babies recognize pictures, but people used to argue that there were "primitive peoples" (presumably tribes somewhere) who didn't recognize photos. Hence picture perception – the ability to recognize the content of pictures – would be a learned cultural skill, and so babies wouldn't recognize pictures. A beautiful study by Hochberg and Brooks (1962), in which they denied their own child access to pictures and then tested his recognition, nicely disproved this idea.

Rolling over. M has rolled over a few times, from prone (tummy) position to her back. Each time, she was interested in an object on one side of her, and she turned her head and body that way (rotating herself onto her side), then began to kick her legs. When she kicked especially hard, her center of gravity tipped over her midpoint, and she flopped onto her back. This was clearly not something she was expecting, as evidenced by her look of complete and total surprise. The interesting thing is that she hasn't been able to reproduce this behavior in a week. This kind of motor exploration really looks like reinforcement learning, where the issue is assigning credit for the result: which of many different behaviors produced the rewarding outcome?

Vocalizations. M started cooing right around when she started smiling – a very adorable behavior. Now her vocalizations have differentiated a bit more: coos when she is in a good mood, squeals when she is starting to fuss. But the most interesting noise she makes is something we call her "cognition noise." There are several physiological measures of attention and cognition in infants, for example heart rate and pupil dilation. M presumably shows these, though we haven't measured. What we didn't bargain for is that she actually shows changes in respiration and vocalization when she is concentrating. When we give her a new toy, she stares at it, grunts, and breathes heavily. It's almost like the fan coming on for a MacBook Pro when the CPU is working hard. Adorable – and a nice external measure of attention.

Tuesday, October 22, 2013

How to make a babycam





(This is a guest post, written by Ally Kraus.)

After the recent Cognitive Development Society meeting, several people asked how we construct our head-mounted cameras (headcams or babycams for short), seen e.g. in this paper. Here are the ingredients - the camera, the mount, and the band - and how we put them together.

The Camera

Our recent headcams have used three types of camera. Each has pros and cons. We started out with the MD-80 camera, and then moved on to Veho cameras because they have a larger field of view and better image quality.

MD-80: You can find these cameras (and their knockoffs) on Amazon and eBay. The MD-80s are cheap and very lightweight, and also come with an accessory pack with a variety of mounts.

Veho Pro: The Pro is a more heavy-duty version of the MD-80. It has much clearer indicator lights, nearly double the battery life, and the camera also has a larger field of view. We have had some problems with the audio in the video files (either with it being quite noisy, or not syncing with the video) and also file corruption; different instances of the camera have had different issues. Also, the Pro does not come with the mount we need to attach it to the headband, so we have cannibalized the MD-80 mounts we had used previously. Amazon link for the camera here.

Veho Atom: Very similar to the Pro (same pros/cons), the Atom is smaller, and has about half the battery life. It does come with a headband mount. On Amazon here.

Fisheye Lens: The only modification we’ve made to the cameras themselves is to attach a fisheye lens to widen the field of view. We’ve used a simple magnetic smartphone version, like this one. The lens comes with a ring you can attach to a surface for the lens to adhere to. We attached ours with a ton of hot glue. (We’ve also substituted regular washers from the hardware store for the metal ring that’s included.) The lens can be knocked off by kids, so you can also glue the fisheye lens itself to the ring so it’s permanently on the camera.

Here is a comparison of the MD-80 and the Veho Pro, with and without the fisheye:



You can see that the field of view is dramatically different. The MD-80 without fisheye has a vertical field of view of about 22 degrees, while with fisheye it has a bit more than 40 degrees. The Veho is almost that good - around 40 even without the fisheye. It goes up to about 70 with the fisheye. The lenses on these cameras are not completely consistent, though, so we have found variance in our view measurements from camera to camera.

The Mount

Ideally, the camera lens would be situated in the center of the child's forehead just at the brow line, to give a semi-accurate idea of what the child can see. We wanted some ability to make adjustments - in particular, to angle the camera down if it sat too high - since some children find the camera distracting if it's too low on the forehead.

Both the MD-80 and the Atom come with an angle-adjustable mount that pivots at one end. It's not ideal for our purposes because the lens is on the opposite side from the pivot point (indicated by a circle). All my diagrams use the MD-80 mount, although the Atom’s is similar, just smaller:

We really want the lens end to be right above the pivot so it's low on the child's forehead. We remedied this by unscrewing the two screws, flipping the camera holder upside-down, and re-assembling it. It doesn't fit quite as well this way (note the slight gap) but it's fine and not going to budge:

You can buy a similar mount for the Pro in a separate accessory package – unfortunately it is not included with that camera. We ended up not buying the accessory kit, but simply modded some of our existing MD-80 mounts to fit the Pro.

The Band

We modified some Coast LED Lenser 7041 6 Chip LED headlamp bands for our headcamera. The best thing about this headlamp is that it comes with some plastic hooks that fit the mount perfectly. We disassembled the headlamp, keeping only the band, the top strap, and two of the hooks. The band is designed for adults and ended up being too large for some children; we fixed this by pulling apart the seam, trimming the elastic a few inches, and re-sewing it. The top strap was also too small with the battery pack removed, so we kept the buckle and replaced the adjustable part of the strap with a longer piece of 1" Nylon Woven Elastic purchased from http://www.rockywoods.com/.

The hooks connect the mount to the band. Slip the hooks into the bottom row of rectangular holes on the headcam mount and snap them into place:



It helps to hot glue the mount to the hook pieces, in order to stabilize the connection. You can then slip it on to the headband:



Our headcams have a top strap to keep the camera snug on the child's head and also to prevent it from sliding or being pulled down. The strap attaches at the front and at the back; we wanted to ensure that the back attachment especially would be comfortable against the child's head.

For the front, we used a pipe cleaner. (Easy to bend, and relatively soft/safe around children.) We threaded the pipe-cleaner through the loop on the top strap (1). Then we threaded the ends of the pipe-cleaner from the back to the front through the top rectangular holes, then down along the sides of the camera (2). We twisted them together at the bottom of the camera mount (3), and then threaded the ends back into the hinge so there is no danger of them poking the child:



For the back, we picked the seam on the back loop of the top strap, wrapped the end around the band, and sewed it in place:


Finally, we added a little padding to the inside-front of the strap so that the plastic hook pieces wouldn't rest against the child's forehead. You can use the extra elastic from when you shortened the band, and hot glue it to the plastic hook pieces:


Voila! The final headcam is as pictured at the top of the post. Please let us know if you find this useful or if you discover other good variants on our setup.




Thursday, October 10, 2013

Randomization on mechanical turk

Amazon Mechanical Turk is a fabulous way to do online psychology experiments. There are a bunch of good tutorial papers showing why (e.g. here, here, and here). One issue that comes up frequently, though, is how to do random assignment to condition. Turk is all about allowing workers to do many HIT (human intelligence task, Turk's name for a work assignment) types, one after another. In contrast, most experimental psychologists want to make each condition of their experiment a single HIT and to get participants to do only one condition.

If you are using the web interface to Turk, you are creating a single HTML template, populated with different values for each distinct HIT type. That means that each different condition is a different HIT. In this case, if you want random assignment to (a single) condition, all you can do is write prominently "please do only one of these HITs." The problem is that Amazon displays HITs from the same job one after another, so you have to make sure that every worker stops after doing just one. This strategy generally works until some worker does 7 or 30 conditions of your experiment - messing up your randomization and putting you in the awkward position of paying for data you (typically) can't use. Nevertheless, I and many other people used the "do this HIT once" method for years - it's easy and doesn't go wrong too much if the instructions are clear enough.

In the last couple of years, though, folks in my lab have moved to using "external HITs" where we use Turk's Command Line Tools to direct workers to a single HTML/JavaScript-based HIT that can do all kinds of more interesting stuff, including having multiple screens, lots of embedded media, and a more complex control flow. The HTML/js workflow is generally great for this, and there is quite a bit of code floating around the web that can be reused for this purpose. Now there is only one underlying HIT, so workers can only complete it once.

The easiest way to do random assignment to condition from within a JavaScript HIT is to have the js assign condition completely at random for each participant. This just involves writing some randomization in the code for the experiment and makes things very simple. With 2 conditions and many participants, this works pretty well (maybe you get 48 in one condition and 52 in another), but with many conditions and fewer participants, it fails quite badly. (Imagine trying to get 5 conditions with 10 participants each. You might get 6, 14, 8, 4, and 18 subjects, respectively, which would not be optimal from the perspective of having equally precise measures about each condition.)
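
To make the imbalance concrete, here is a minimal sketch of the naive approach (variable names are hypothetical, not from one of our experiments):

// Naive randomization: each participant draws a condition uniformly
// at random, independently of everyone else.
var numConditions = 5;
var cond = Math.floor(Math.random() * numConditions) + 1;

// Because the draws are independent, each condition's count is
// binomial: with 50 participants and 5 conditions, the expected count
// is 10 per condition, but the standard deviation is
// sqrt(50 * .2 * .8), about 2.8 - so lopsided splits are common.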

Our solution to this problem is as follows: We use a simple PHP script, the "maker getter," that is called with an experiment filename and a set of initial condition numbers (in the example below, it's "myexpt_v1" and conditions 1 and 2, each with 50 participants). The first time it's called, it sets up a file for that experiment and populates the conditions. Every subsequent time it's called, it returns a condition, sampled from those that still need participants. Then, if this is a true Turk worker (and not a test run), a separate script decrements the count for that condition. This gives us random assignment with (approximately) balanced condition counts.
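
Conceptually, the maker getter's bookkeeping looks something like the sketch below, written in JavaScript purely for illustration - the real scripts (linked at the bottom of this post) are PHP and persist the counts to a file on the server:

// Illustrative sketch only - not the actual maker-getter script.
// counts maps each experiment filename to its remaining condition slots.
var counts = {};

function makerGetter(filename, condString) {
    if (!(filename in counts)) {
        // First call: parse e.g. "1,50;2,50" into {"1": 50, "2": 50}
        counts[filename] = {};
        condString.split(";").forEach(function (pair) {
            var parts = pair.split(",");
            counts[filename][parts[0]] = parseInt(parts[1], 10);
        });
    }
    // Later calls: return a condition sampled from those with slots left
    var open = Object.keys(counts[filename]).filter(function (c) {
        return counts[filename][c] > 0;
    });
    return open[Math.floor(Math.random() * open.length)];
}

function decrementer(filename, cond) {
    // Called separately, and only for real Turk workers (not test runs)
    counts[filename][cond] -= 1;
}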

(Note: Todd Gureckis's PsiTurk is a more substantial, more general way to solve this same problem and several others, but requires a bit more in the way of setup and infrastructure.)

---- DETAILS AND CODE ----

The JavaScript block for setting up and getting conditions:

// Condition - call the maker getter to get the cond variable
var filename = "myexpt_v1";
var condCounts = "1,50;2,50";  // condition numbers and slots for each
var cond;
try {
    // Synchronous GET, so cond is assigned before the experiment starts
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.open("GET", "http://website.com/cgi-bin/maker_getter.php?conds=" +
        condCounts + "&filename=" + filename, false);
    xmlHttp.send(null);
    cond = xmlHttp.responseText;
} catch (e) {
    cond = 1;  // fall back to condition 1 if the request fails
}

The JavaScript block for decrementing conditions:

// Decrement only if this is an actual turk worker!
if (turk.workerId.length > 0) {
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.open("GET",
        "http://website.com/cgi-bin/decrementer.php?filename=" +
        filename + "&to_decrement=" + cond, false);
    xmlHttp.send(null);
}

maker_getter PHP script (courtesy of Stephan Meylan, now a grad student at Berkeley), which is running in the executable portion of your hosting space: maker_getter.php.

decrementer PHP script (also courtesy Stephan): decrementer.php.

Friday, October 4, 2013

Effect sizes and baby rearing (review of Baby Meets World)

In these first months of M's life, I've been reading a fair number of parenting advice or interest books focused on babies. My motivation is partially personal and partially professional. Regardless, it has been entertaining to sample the vast array of different theories and interpretations of what is going on in M's cute little head (and body).

I recently finished Baby Meets World: Suck, Smile, Touch, Toddle, by Nicholas Day, and it is my favorite of the scattered group I've read. Day is a clear, funny writer who also blogs entertainingly for Slate. Baby Meets World is a tour of the history and science of parenting, broken down by the four activities in its subtitle.

But unlike many books about developmental science it is also a cry of rage and despair by a new parent who has completely had it with parenting advice. This feels exactly right to me. Rather than urbanely walking through the latest research on sucking along with a Gladwell-esque profile of a scientist, Day shows us the absolute weirdness of its past - from deciding whether to use goats or donkeys as wet nurses to the purported link between thumb sucking and chronic masturbation.

The implication, drawn out very clearly in a recent New York Times blog post, is that our current developmental studies may not have much more to offer parents than Freud's hypotheses about thumb sucking:
... [E]xperiments have the most meaning within [their] discipline, not outside of it: they are mostly relevant to small academic disputes, not large parenting decisions. But when we extract practical advice from these studies, we shear off all their disclaimers and complexities. These are often experiments that show real but very small effects, but in the child-rearing advice genre, a study that showed something is possible comes out showing that something is certain. Meager data, maximum conclusions. (p. 299)
People often ask me how relevant my own work on language development is to my relationship with M. My answer is, essentially not at all. I am a completely fascinated observer; I continually interpret her behavior in terms of my interest in development. Nevertheless, I see very few - if any - easy generalizations from my work (and that of most of my colleagues) to normative recommendations for child rearing beyond "talk to your child."

While this kind of recommendation is without a doubt critical for some families, it's not necessarily the kind of thing that you need to hear if you're already in the market for baby advice books. For example, rather than telling me that M needs to hear 30 million words, you should probably counsel me to talk to her less (let the baby sleep, already!). One size doesn't fit all. There are some interesting applied studies that have near-term upshot for baby-advice consumers (e.g. work on learning from media). But overall this is the exception rather than the rule in much of what I do, which is primarily basic research on children's social language learning.

Parents who have read parenting books often say "you must do X with your child" or "you can't do Y," whether it's serving refined sugar, giving tummy time, or using the word "no" (don't, do, and don't, respectively - according to some authorities).  But the effect size of any child-rearing advice, whether reasonable or not, is likely to be small: the people who had parents that followed it aren't immediately distinguishable from those whose parents didn't. Consider the contrast between the range of variation in parenting practices across cultures and the consistency of having reasonable outcomes - nice, well-adjusted people. People grow up lots of different ways and yet they turn out just fine. This is the message of Day's book.

Of course there are real exceptions to this rule. But these are not the small variations in child rearing for your standard-issue helicopter parents - BPA-free tupperware or not? - or even the culturally-variable practices like whether you swaddle. They are huge factors like poverty, stress, and neglect, which have systematic and devastating effects on children's brain, mind, and life outcomes. Remediating them is a major policy objective. We shouldn't confuse the myriad bewildering details of baby rearing with the necessities of providing safety, nutrition, and affection.


Monday, September 23, 2013

M discovers objects

(M's bouncy chair, complete with newly fascinating objects.)

We've just returned from a trip to see family, in honor of my daughter M's two month birthday, and something very amazing has happened. M has discovered objects.

M has been completely fascinated by faces almost since the beginning of her life (see a previous post on this). She is incredibly social, tracking everyone around her, making eye contact* and smiling. And along with that social fascination came a complete indifference to objects. She couldn't - and still can't - hold onto anything that isn't tightly pressed into her palm (activating the palmar reflex). But it was more than that. She just didn't seem to care if we moved a toy near her. We could maybe get a little rise out of her with a rattle, but any face was enough to provoke her to track attentively.

We noticed a huge change this morning. We put M in her bouncy chair (see photo above), and - just for kicks - snapped on the toy bar that comes with the chair. We had tried this earlier and gotten absolute disinterest from her. But this morning, she was in love! She sat and cooed and bounced for probably around 45 minutes, fixating the toys the whole time. There's a face on one of the toys, though (for extra perceptual oomph, perhaps?). So we repeated the experiment, this time with some non-face toys in her crib. Again, she was mesmerized.

Clearly this general phenomenon is something toy manufacturers know about - or else why would every baby crib or seat come with floating, bouncing toys above it (think the ubiquitous pack-'n-play)? But a shift to an interest in objects around 2 months is something that I haven't read about in the developmental literature. A very interesting sequence of papers documents a shift in attention to faces relative to objects a little later, e.g. around 4 months. In a new eye-tracking study, we reviewed this literature, concluding that
"The evidence on sustained attention to faces is thus consistent across studies: 3-month-olds do not prefer faces in either dynamic displays or static stimulus arrays, while older children [5 - 6 month-olds] show a clear face preference."
But these findings are all about whether faces trump objects when they are put in direct competition. I bet M would show the attention capture that babies in all of these studies show - they look at a salient object and then don't seem to be able to tear themselves away, even when the competitor is a face. What they don't capture is a growth of interest in objects per se, when there are no other competitors.

There are also a bunch of wonderful studies from Jessica Sommerville, Amy Needham, and collaborators examining growth in babies' perception of faces, objects, and intentions based on their abilities to reach and grab (e.g. this instant classic and this newer one). But M can't consistently reach for objects yet, and probably won't be able to for a while now. She also doesn't seem to be swiping for the toys on the bouncy chair. So I don't think the development of reaching is what's driving this change in visual interest.

I'd love to know if anyone has any ideas about whether this phenomenon - a dramatic increase in interest to objects - is something that has been studied before (or even if they've seen it in their own children). Regardless of mechanism, though, it's been a pleasure to watch M as she discovers a whole world of new things to see.

---
* Actually, as Haith, Bergman, & Moore (1977) noted, she did a lot of "forehead contact" early; now that behavior has morphed into something that looks much more like true eye contact.

Tuesday, September 10, 2013

Post-publication peer review and social shaming

Is the peer review system broken? Arguably. Peer review can be frustrating, both in what gets through and what doesn't. So recently there has been a lot of talk about the virtues of post-publication peer review (e.g. links here, here, and here), where folks on the internet comment on scientific publications after they are public. One suggestion is that post-publication peer review might even one day replace the standard process.

Commenting on papers after they are published has to be a good idea in some form: more discussion of science and more opportunities to correct the record! But I want to argue against using post-publication peer review, at least in its current form, as the primary method of promoting stronger cultural norms for reliable research.

1. Pre-publication peer review is a filter, while post-publication peer review is an audit.  

With few exceptions, peer review is applied to all submitted papers. And despite variations in the process from journal to journal, the overall approach is quite standard. This uniformity doesn't necessarily lead to perfect decisions, where all the good papers are accepted and bad papers are rejected. Nevertheless, peer review is a filter that is designed to be applied across the board so that the scientific record contains only those findings that "pass through" (consider the implicit filter metaphor!).

When I review a journal article I typically spend at least a couple of hours reading, thinking, and writing. These hours have to be "good hours" when I am concentrating carefully and don't have a lot of disruptions. I would usually rather be working on my own research or going for a walk in the woods. But for a paper to be published, a group of reviewers needs to commit those valuable hours. So I accept review requests out of a sense of obligation or responsibility, whether it's to the editor, the journal, or the field more generally. 

This same mechanism doesn't operate for post-publication review. There is no apparatus for soliciting commenters post-publication. So only a few articles, particularly those that receive lots of media coverage, will get the bulk of the thoughtful, influential commentary (see e.g. Andrew Gelman's post on this). The vast majority will go unblogged.

Post-publication peer review is thus less like a filter and more like an audit. It happens after the fact and only in select cases. Audits are also somewhat random in what attracts scrutiny and when. There is always something bad you can say about someone's tax return - and about their research practices.

2. Post-publication peer review works via a negative incentive: social shaming.

People are generally driven to write thoughtful critiques only when they think that something is really wrong with the research (see e.g. links here, here, here, and here). This means that nearly all post-publication peer review is negative.

The tone of the posts linked above is very professional, even if the overall message about the science is sometimes scathing. But one negative review typically spurs a host of snarky follow-ons on twitter, leaving a single research group or paper singled out for an error that may need to be corrected much more generally. Often critiques are merited. But they can leave the recipients of the critique feeling as though the entire world is ganging up against them.

For example, consider the situation surrounding a failure to replicate John Bargh's famous elderly priming study. Independent of what you think of the science, the discussion was heated. A sympathetic Chronicle piece used the phrase "scientific bullying" to describe the criticisms of Bargh, noting that this experience was the "nadir" of his career. That sounds right to me: I've only been involved in one, generally civil, public controversy (my paper, reply, my reply back, final answer) and I found that experience extremely stressful. Perceived social shaming or exclusion can be a very painful process. I'm sure reviewers don't intend their moderately-phrased statistical criticisms to result in this kind of feeling, but - thanks to the internet - they sometimes do.

3. Negative incentives don't raise compliance as much as positive cultural influences do.

Tax audits (which carry civil and criminal penalties, rather than social ones) do increase compliance somewhat. But a review of the economics literature suggests that cultural factors - think US vs. Greece - matter more than the sheer expected risk due to audit enforcement (discussion here and here).* For example, US audit rates have fluctuated dramatically in the last fifty years, with only limited effects on compliance (see e.g. this analysis).**

Similarly, if we want to create a scientific culture where people follow good research practices because of a sense of pride and responsibility - rather than trying to enforce norms through fear of humiliation - then increasing the post-publication audit rate is not the right way to get there. Instead we need to think about ways to change the values in our scientific culture. Rebecca Saxe and I made one pedagogical suggestion here, focused on teaching replication in the classroom.

Some auditing is necessary for both tax returns and science. The overall increase in post-publication discussion is a good thing, leading to new ideas and a stronger articulation and awareness of core standards for reliable research. So the answer isn't to stop writing post-pub commentaries. It's just to think of them as a complement to - rather than a replacement for - standard peer review.

Finally, we need to write positive post-publication reviews. We need to highlight good science and most especially strong methods (e.g. consider The Neurocomplimenter). The options can't be either breathless media hype that goes unquestioned or breathless media hype that is soberly shot down by responsible academic critics. We need to write careful reviews, questions, and syntheses for papers that should remain in the scientific literature. If we only write about bad papers, we don't do enough to promote changes in our scientific standards.

---
* It's very entertaining: The tone of this discussion is overall one of surprise that taxpayers' behavior doesn't hew to the rational norms implied by audit risk.
** Surprisingly, I couldn't find the exact graph I wanted: audit rates vs. estimated compliance rates. If anyone can find this, please let me know!

Monday, September 2, 2013

iPad experiments for toddlers

(Update 3/24/15 – this post is subsumed by a paper we wrote on this topic, available here).
---

The iPad should be a fabulous experimental tool for collecting data with young children: It's easy to use, kids love it, and it's even cheaper and easier to transport than a small Mac laptop. Despite these advantages, there has been one big drawback: creating iPad apps requires mastering a framework and programming language that are specific to iOS, and putting the apps on more than a few iPads requires dealing with the app store. Neither of these are impossible hurdles, but both of them have kept us (and AFAIK most other labs in developmental psychology) from getting very far in using them for routine day-to-day data collection.

This post documents a new method that we have been using to collect experimental data with toddlers, developed this summer by Elise Sugarman, Dan Yurovsky, and Molly Lewis. It is easy to use, makes use of free development tools, doesn't require dealing with the App Store or the Apple Developer Tools, and hooks in nicely with the infrastructure we use to create Amazon Mechanical Turk web experiments for adults.

Our method has four ingredients:
  1. JavaScript/HTML web experiment
  2. Server-side PHP script to save data
  3. iPad with internet connection, in kiosk mode
  4. Kid management on the iPad

1. JavaScript/HTML web experiment

This is the big one for someone who has never made an online experiment. It's beyond the scope of this post to introduce how to create JavaScript web experiments, but there is a lot of good material out there, especially from the Gureckis Lab at NYU (e.g. this blog post). There are also many tools for learning how to make websites using the now standard combo of JavaScript, HTML, CSS, and jQuery. Note that putting up such an experiment will require some server space. We use the space provided by Stanford for our standard university web pages, but all that's required is somewhere to put your HTML, JS, and CSS files.

2. PHP script 

This is a simple script that runs on a server (probably the same one that you are using to serve the experiment). All it does is save data from the JavaScript experiment to a location on the server. 

In the JavaScript code, we need to add a bit of code to send the data to this script (making sure jQuery is loaded so we can use "$.post"):
  
// After the last trial, post the accumulated results string to the server
if (counter === numTrials) {
     $.post("http://lab.example.edu/cgi-bin/expt_dir/post_results.php", 
         {postresult_string : result_string});
}

And then post_results.php is a simple script that looks like this:

<?php
    // Append the posted results string to a CSV file on the server
    $result_string = $_POST['postresult_string'];
    file_put_contents('results.csv', 
        $result_string, FILE_APPEND);
?>

3. iPad config 

Our method requires that you have an internet connection handy for your iPad. This means that you either need wifi access (e.g. from the testing location or a MiFi) or else you need an iPad with cell connectivity. But aside from that, you're just navigating the iPad to a website you've set up.

We use two tools to ensure that iPad-savvy kids don't accidentally escape from our experiment or zoom the screen into something totally un-navigable. The first is Guided Access, a mode on the iPad that disables the hardware buttons. The second is Mobile Kiosk, an app that further locks the iPad into a particular view of a webpage. The combination means that kids can't get themselves tangled in the functionality of the iPad.

4. Kid management

The last ingredient is training kids to use the iPad effectively. Although many of them will have interacted with tablet devices and smartphones before, that's no guarantee that they can tap effectively and navigate through an experiment. We created a simple training page with a set of dots for a child to tap - they can't advance until they successfully tap all the dots (kind of like Whac-A-Mole).
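
The logic of the training page is only a few lines. Here is a hypothetical sketch (element names are invented for illustration, and jQuery is assumed to be loaded, as in the experiments themselves):

// Dot-tapping training sketch: the "next" button stays disabled
// until the child has successfully tapped every dot.
var dotsRemaining = $(".training-dot").length;

$(".training-dot").on("touchstart click", function () {
    $(this).off("touchstart click").fadeOut();  // each dot counts only once
    dotsRemaining -= 1;
    if (dotsRemaining === 0) {
        $("#next-button").prop("disabled", false);  // child can now advance
    }
});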

(Update #1: Elise notes that using a tilting iPad case helps kids click more successfully as well.)

(Update #2: Brock Ferguson gives a handy guide to making the PHP above more secure if you ever want to use it for anything other than preschoolers.)

Sunday, September 1, 2013

Unconscious and non-linguistic numbers?

(Flock of birds, from Flickr)

An image of a flock of birds fades almost instantaneously from our visual memory - as Borges described memorably in his Argumentum Ornithologicum. But if you get a chance to count the birds one by one, their exact number (78 in this case?) is represented by a symbol that is easy to remember and easy to manipulate. Numbers help us overcome the limitation that without some intermediate representation, none of our memory systems can represent exactly 78 objects as distinct from exactly 79.

For most people who use language to manipulate numbers, mentally representing exact quantities like 78 requires 1) knowing the number words and 2) producing the right number words - speaking them, at least in your head - at the moment you want to represent a corresponding quantity. The evidence:
One major exception to this "number requires language in the moment" hypothesis is visual representations of number like the mental abacus. Mental abacus provides a way to use visual memory (rather than auditory/phonological memory) for remembering and - more importantly - manipulating exact quantities. So it's an exception that proves the rule: Like numerical language, mental abacus gives its users a representation scheme for holding exact quantities in mind.

Over the past few years, I've been collecting a few examples that push at the boundaries of this theoretical framework, though:

1. It may be possible to prime arithmetic expressions unconsciously.

A recent paper by Sklar et al. uses a clever method called continuous flash suppression to introduce stimuli to participants while keeping them out of conscious awareness. When shown expressions like "9 - 3 - 4 = " using this method, participants were 10 - 20 ms faster to speak the correct answer (e.g., 2) when it was presented, compared to an incorrect answer. (Incidentally, an odd fact about the result is that the authors had much more trouble finding unconscious priming effects for addition than subtraction.)

I find this result very surprising! My initial thought was that participants might have been faster because they were using their analog magnitude system (indicating approximate rather than exact numerical processes).  I wrote to Asael Sklar and he and his collaborators generously agreed to share their original data with me. I was able to replicate their analyses* and verify that there was no estimation effect, ruling out that alternative explanation.

So this result is still a mystery to me. I guess the suggestion is that there is some "priming" - e.g. trace activation of the computations. But I find it somewhat implausible (though not out of the question) that this sort of subtraction problem is the kind of computation that our minds cache. Have I ever done 9 - 3 - 4 before? It certainly isn't as clear an "arithmetic fact" as 2+2 or 7+3.

2. Richard Feynman could count and talk at the same time. 

In a chapter from "The Pleasure of Finding Things Out," (available as an article here) Feynman recounts how he learned that he could keep counting while doing other things. I was very excited to read this because I have also noticed that I can count "unconsciously" - that is, I can set a counter going in my brain, e.g. while I hike up a hill. I can let my mind wander and check back in to find that the counter has advanced some sensible distance. But I never systematically tested whether my count was accurate.

This kind of test is exactly what Feynman set out to do. He would start counting, then begin another operation (e.g. doing laundry, walking around, etc.) and check back in with his internal "counter." To calibrate, he had established that he could count up to about 48 in a minute with very little variability when there was no interference. So he would do many different things while counting and check how close his count was to 48 - if there had been interference, he would be off in how far he had counted after a minute had elapsed.

The only thing he found that caused any active interference was talking, especially producing number words:
... I started counting while I did things I had to do anyway. For instance, when I put out the laundry, I had to fill out a form saying how many shirts I had, how many pants, and so on.
I found I could write down "3" in front of "pants" or "4" in front of "shirts" while I was counting to myself but I couldn't count my socks. There were too many of them: I'm already using my "counting machine" ...
What's even more interesting is that Feynman reports that the statistician John Tukey could count and talk at the same time - by imagining the visual images of the numbers turning over. But apparently this prevented Tukey from reading while he counted (which Feynman could do!).

So these observations seem like they are consistent with the hypothesis that exact number requires using a particular set of mental resources, whether it's the resources of speech production (for counting out loud) or of visual working memory (for imagining digits or a mental abacus). But they - along with the Sklar et al. finding - also support the idea that the representation need not necessarily percolate up to the highest levels of conscious experience.

3. Ildefonso, a home-signer without language, learned to do arithmetic before learning to sign.

In A Man Without Words, Susan Schaller describes the growth of her friendship with Ildefonso, a deaf, completely language-less man in his 30s. Ildefonso grew up as a home-signer in Mexico and came to the US as an agricultural laborer. Over the course of a short period working with him at a school for the deaf, she introduces him to ASL for the first time. The story is beautiful, touching, and both simply and clearly written.

Here's the crazy part: Before Schaller has succeeded in introducing Ildefonso to language more generally, she diverts him with single digit arithmetic, which he is apparently able to do handily:
To rest from the fatigue of our eye-to-eye search for an entrance into each other's head, we sat shoulder to shoulder, lining up numerals in progressively neater rows. I drew an addition sign between two 1s and placed a 2 underneath. I wrote 1 + 1 + 1 with a 3 under it, then four 1s, and so on. I explained addition by placing the corresponding number of crayons next to each numeral. He became very animated, and I introduced him to an equal sign to complete the equations. Three minutes later the crayons were unnecessary. He had gotten it. I presented him with a page of addition problems, and he was as happy as my nephew with a new dinosaur book. (p. 37)
It would be very interesting to know how accurate his computations were! This observation suggests that language for number may not critically rely on understanding any other aspects of language. Perhaps Ildefonso didn't even treat numerals as words at all (but instead like Gricean "natural" meanings, e.g. "smoke means fire").

Conclusions

All of these examples are consistent with the general hypothesis described above about the way language works to represent exact numbers. But all three suggest that our use of numbers can be far less conscious and less language-like than I had thought, while still carrying exact numerical content.

---
* With one minor difference: Sklar et al.'s error bars reflect the mean +/- .5 * the standard error of the mean (SEM), rather than the more conventional +/- 1 SEM. This is a semantic issue: the full length of their error bar is the SEM, rather than the SEM being the distance from the mean. Nevertheless, it is not standard practice.