tl;dr: Introducing and organizing a ManyLabs study for infancy research. Please comment or email me (mcfrank (at) stanford.edu) if you would like to join the discussion list or contribute to the project.
Introduction
The last few years have seen increasing acknowledgement that there are flaws in the published scientific literature – in psychology and elsewhere (e.g., Ioannidis, 2005). Even more worrisome is that self-corrective processes are not as fast or as reliable as we might hope. For example, in the reproducibility project, which was published this summer (RPP, project page here), 100 papers were sampled from top journals, and one replication of each was conducted. This project revealed a disturbingly low rate of success for seemingly well-powered replications. And even more disturbing, although many of the target papers had a large impact, most still had not been replicated independently seven years later (outside of RPP).
I am worried that the same problems affect developmental research. The average infancy study – including many I've worked on myself – has the issues we've identified in the rest of the psychology literature: low power, small samples, and undisclosed analytic flexibility. Add to this the fact that many infancy findings are never replicated, and even those that are replicated may show variable results across labs. All of these factors lead to a situation where many of our empirical findings are too weak to build theories on.
In addition, there is a second, more infancy-specific problem that I am also worried about. Small decisions in infancy research – anything from the lighting in the lab to whether the research assistant has a beard – may potentially affect data quality, because of the sensitivity of infants to minor variations in the environment. In fact, many researchers believe that there is huge intrinsic variability between developmental labs, because of unavoidable differences in methods and populations (hidden moderators). These beliefs lead to the conclusion that replication research is more difficult and less reliable with infants, but we don't have data that bear one way or the other on this question.