Tuesday, August 6, 2013

Fixing the axis labels



I've waited a long time for an experiment to finish and I eagerly sit down to begin my analysis. I've got a graph in mind that shows the relationship between the measure of interest and the key manipulations. I code frantically until I get the first look at the plot and then slump back onto the couch in disgust, because it doesn't look like what I imagined. 

This scenario has happened to me more times than I can count, and it's one of the formative experiences of doing experimental science. If you always see the pattern of data you expect, you're doing something wrong. The question is what to do next.

When I teach data analysis, I focus on visualization and exploration rather than statistics. You have to become familiar enough with whatever graphics package you use (R, MATLAB, Excel, etc.) that you can quickly make and alter many different visualizations of your data. But once you gain that kind of facility, it becomes tempting to dash off dozens of different graphs. After seeing that first failure, the next obvious move is to try all kinds of fancy new analyses. Add a factor to the graph. Break it up by items or by subjects. Try difference scores. Make it 3D!

These are all important things to do - and even more important things to be able to do. (Well, maybe not the 3D part.) If, for example, you can't quickly do a visual subject or item analysis, then you may miss crucial issues in your design. But I also want to argue that jumping straight to more complex or more in-depth analyses can slide into post-hoc exploration, where you find results that you didn't predict. From there, it's very tempting to interpret these findings as though they came from the planned analysis. Problems of this sort have been discussed intensely in recent years.

That first graph you made is important (or it should be, if you've chosen the correct starting place). That graph should be the planned analysis - the one that you wanted to do to test your initial hypothesis. So instead of poking around the dataset to see if something else turns up, what I try to do is something I call fixing the axis labels. (I use this as a shorthand for "cleaning up all the seemingly unimportant details"). Fixing the axis labels is an important way to get as much information as possible from your planned analysis.

When fixing the axis labels, take the time to walk through the graph slowly:
  • Label the axes appropriately, with descriptive names,
  • Make sure the scale makes sense, adding units wherever possible,
  • Correct the ticks on the axes so that they are sensible in terms of placement and precision,
  • Fix the aspect ratio of the graph so that the measures are scaled appropriately,
  • Make sure there are appropriate measures of variability, ideally 95% confidence intervals so you can do inference by eye, and
  • Make sure that the appropriate reference lines are visible, e.g. a line indicating chance performance or a baseline from a control task.
You can see an example of this cleanup in the frontispiece image above, a simplified plot from a project I worked on a couple of years ago. Although I try to avoid bar graphs in most of my work, I've chosen one here because even in this very simple visualization it's possible to add a lot of extra important detail that helps any viewer (including me!) interpret the data.
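
To make that checklist concrete, here's a minimal sketch in Python with matplotlib (the same cleanup applies in R, MATLAB, or whatever you use). The condition names, numbers, and chance level are invented purely for illustration:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: mean proportion correct per condition, with 95% CIs
# (e.g., 1.96 * SEM) computed beforehand. All numbers are made up.
conditions = ["Control", "Easy", "Hard"]
means = np.array([0.52, 0.81, 0.64])
ci95 = np.array([0.06, 0.05, 0.07])

fig, ax = plt.subplots(figsize=(4, 4))  # pick the aspect ratio deliberately
ax.bar(conditions, means, yerr=ci95, capsize=4,
       color="lightgray", edgecolor="black")

# Descriptive axis labels, with the scale and units made explicit
ax.set_xlabel("Condition")
ax.set_ylabel("Proportion correct")
ax.set_ylim(0, 1)                          # the measure is a proportion
ax.set_yticks(np.arange(0, 1.01, 0.25))    # sensible tick placement

# Reference line for chance performance (a two-alternative task is assumed)
ax.axhline(0.5, linestyle="--", color="gray", label="Chance (0.5)")
ax.legend(frameon=False)

fig.tight_layout()
plt.show()
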

Some or all of this may seem obvious. Yet it is astonishing to me how many students and collaborators move on from that first graph before perfecting it. (This move often stems from the anxiety that "my experiment has failed" - prompting a search for new analyses that "worked," which is often exactly the wrong move.) And while sometimes fixing the axis labels simply makes a clear failure clearer, other times it can reveal important insights about the data:
  • Axes. Does the relationship being plotted really make sense with respect to the design of the experiment? Can you describe the axes in terms of what was manipulated (typically horizontal) and what was measured (typically vertical)? If not, then you need a different plot.
  • Scale and aspect ratio. Is the measurement magnitude appropriate and sensible? You can only see this if the scale is right and the aspect ratio is appropriate. But this simple check of magnitudes is an important way to catch errors. (I made this error in one of my first papers, where I plotted minima rather than means and failed to notice that infant looking times were a full order of magnitude smaller than they should have been. Not my finest hour.)
  • Ticks and reference marks. Does the approximate level of participants' performance make sense? How does it compare to known baselines like chance or performance in earlier studies?
  • Variability. Is the variability sensible? Does it vary across conditions or populations? Is the precision of the measurements sufficient to support the inferences you want to make? Often an appropriate measure of variability is all you need to make a strong statistical argument from a graph; later analyses may confirm and quantify the impression of reliability from the visualization, but they become much less critical. (See the sketch after this list for one common way to compute these intervals.)
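
On the variability point, one common (and assumption-laden) choice is the normal-approximation interval, mean ± 1.96 × SEM; with small samples a t-based interval would be more defensible. A quick sketch in Python, with made-up per-subject scores:

import numpy as np

# Hypothetical per-subject scores in one condition (made-up numbers).
scores = np.array([0.61, 0.74, 0.58, 0.69, 0.80, 0.66, 0.72, 0.63])

mean = scores.mean()
sem = scores.std(ddof=1) / np.sqrt(len(scores))  # sample SD / sqrt(n)
half_width = 1.96 * sem                          # normal approximation

print(f"mean = {mean:.3f}, "
      f"95% CI = [{mean - half_width:.3f}, {mean + half_width:.3f}]")
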
Of course it's important to do many different exploratory visualizations. But before moving on to these secondary hypotheses, take the time to make the best representation you can of the primary, planned analysis. Make the graph as clean and clear as possible, so you can walk a friend or collaborator through it and see the nature of the measurements you've made. 

(HT: many of these recommendations are inspired by the work of Andrew Gelman and colleagues.)
