As a field, we have bounced back and forth between continuity and discontinuity theories to explain these changes. Continuity theories posit that infants' starting state is related to our end state, and that changes are gradual, not saltatory; discontinuity theories posit stage-like transitions. Behaviorist learning theory was fundamentally a continuity hypothesis – the same learning mechanisms (plus experience) underly all of behavior, and change is gradual. In contrast, Piagetian stage theory was fundamentally about explaining behavioral discontinuities. As the pendulum swung, we get core knowledge theory, a continuity theory: innate foundations are "revised but not overthrown" (paraphrasing Spelke et al. 1992). Gopnik and Wellman's "Theory theory" is a discontinuity theory: intuitive theories of domains like biology or causality are discovered like scientific theories. And so on.
For what it's worth, my take on the "modern synthesis" in developmental psychology is that development is domain-specific. Domain of development – perception, language, social cognition, etc. – progress on their own timelines determined by experience, maturation, and other constraining factors. And my best guess is that some domains develop continuously (especially motor and perceptual domains) while others, typically more "conceptual" ones, show more saltatory progress associated with stage changes. But – even though it would be really cool to be able to show this – I don't think we have the data to do so.
The problem is that we are not thinking about – or measuring – development appropriately. As a result, what we end up with is a theoretical mush. We talk as though everything is discrete, but that's mostly a function of our measurement methods. Instead, everything is at rock bottom continuous, and the question is how steep the changes are.
We talk as though everything is discontinuous all the time. The way we know how to describe development verbally is through what I call "milestone language." We discuss developmental transitions by (often helpful) age anchors, like "children say their first word around their first birthday," or "preschoolers pass the Sally-Ann task at around 3.5 years." When summarizing a study, we* assert that "by 7 months, babies can segment words from fluent speech," even if we know that this statement describes the fact that the mean performance of a group is significantly different than zero in a particular paradigm instantiating this ability, and even if we know that babies might show this behavior a month earlier if you tested enough of them! But it's a lot harder to say "early word production emerges gradually from 10 - 14 months (in most children)."
Beyond practicalities, one reason we use milestone language is because our measurement methods are only set up to measure discontinuities. First, our methods have poor reliability: we typically don't learn very much about any one child, so we can't say conclusively whether they truly show some behavior or not. In addition, most developmental studies are severely underpowered, just like most studies in neuroscience and psychology in general. So the precision of our estimates of a behavior for groups of children are noisy. To get around this problem, we use null hypothesis significance tests – and when the result is p < .05, we declare that development has happened. But of course we will see discrete changes in development if we use a discrete statistical cutoff!
And finally, we tend to stratify our samples into discrete age bins (which is a good way to get coverage), e.g. recruiting 3-month-olds, 5-month-olds, and 7-month-olds for a study. But then, we use these discrete samples as three separate analytic groups, ignoring the continuous developmental variation between them! This practice reduces statistical power substantially, much like taking median splits on continuous variables (taking a median split on average is like throwing away a third of your sample!). In sum, even in domains where development is continuous, our methods guarantee that we get binary outcomes. We don't try to estimate continuous functions, even when our data afford them.
One reason that it's not surprising to see developmental change is that everything that children do is at its heart a skill. Sucking and swallowing is a skill. Walking is a skill. Recognizing objects is a skill. Recognizing words is a skill too - so too is the rest of language, at least according to some folks. Thinking about other people's thoughts is a skill. So that means that everything gets better with practice. It will – to a first approximation – follow a classic logistic curve like this:
But here's the problem. If you have a complex behavior, it's built of simple behaviors, which are themselves skills. To get the probability of success on one of those complex skills, you can – as a first approximation – multiply the independent probabilities of success in each of the components. That process yields logistic curves that look like these (color indicating the number of components):
And samples from a process with many components look even more discrete, because the logistic is steeper!
Given this kind of perspective, we should expect complex behaviors to emerge relatively suddenly, even if they are simply the product of a handful of continuously changing processes.
This means, from a theoretical standpoint, we need stronger baselines. Our typical baseline at the moment is the null hypothesis of no difference; but that's a terrible baseline! Instead, we need to be comparing to a null hypothesis of "developmental business as usual." To show discontinuity, we need to take into account the continuous changes that a particular behavior will inevitably be undergoing. And then, we need to argue that the rate of developmental change that a particular process is undergoing is faster than we should expect based on simple learning of that skill. Of course to make these kinds of inferences requires far more data about individuals than we usually gather.
In a conference paper that I'm still quite proud of, we tried to create this sort of baseline for early word learning. Arguably, early word learning is a domain where there likely aren't huge, discontinuous changes – instead kids gradually get faster and more accurate in learning new words until they are learning several new words per day. We used meta-analysis to estimate developmental increases in two component processes of novel word mapping: auditory word recognition and social cue following. Both of these got faster and more accurate over the first couple of years. When we put these increases together, we found they together created really substantial changes in how much input would be needed for a new word mapping. (Of course what we haven't done in the three years since we wrote that paper is actually measure the parameters on the process of word mapping developmentally – maybe that's for a subsequent ManyBabies study...). Overall, this baseline suggests that even in the absence of discontinuity, continuous changes in many small processes can produce dramatic developmental differences.
In sum: sometimes developmental psychologists don't take the process of developmental change seriously enough. To do better, we need to start analyzing change continuously; measuring with sufficient precision to estimate rates of change; and creating better continuous baselines before we make claims about discrete change or emergence.
In sum: sometimes developmental psychologists don't take the process of developmental change seriously enough. To do better, we need to start analyzing change continuously; measuring with sufficient precision to estimate rates of change; and creating better continuous baselines before we make claims about discrete change or emergence.
---
* I definitely do this too!