Every time the issue gets discussed on twitter, I get a little bit rant-y; this post is my attempt to explain why. It's not because I fundamentally disagree with the argument. Barplots do mask important distributional facts about datasets. But there's more we have to take into account.
Here's my basic argument:
Hey #barbarplots folks: I agree with you that plotting variability is important, but the world of data is big! /1— Michael C. Frank (@mcxfrank) August 10, 2016
Sometimes you need a summary stat because you have lots of observations, sometimes because there are many conditions to compare. /2— Michael C. Frank (@mcxfrank) August 10, 2016
Bars are overused when neither of those apply; they shouldn't always be the default. Lines and points are usually cleaner. /3— Michael C. Frank (@mcxfrank) August 10, 2016
ANOVA also overused and used inappropriately - would be silly to ban it. Same for bars. Move the defaults and #barwithcaution. /end— Michael C. Frank (@mcxfrank) August 10, 2016
When I originally posted that rant, I was in transit and didn't get a chance to illustrate my point, so there was a lot of back-and-forth about what good use cases for bars would be.* The basic one that comes to mind for me is in analyzing datasets where there are many discrete independent variables (e.g., conditions, experiments) and not many observations per participant. This structure describes many experiments I've worked on.
I put together an example visualization, based on Experiment 4 of this paper. All code and data here, in the experiment 4 analysis script. Here's the plot we put in the paper:
I chose a barplot because there were a lot of planned age groups and conditions and it seemed like an easy way to represent that discrete structure in the data, along with summary means and 95% CIs. I like to visualize by-subject distributions (I was actually a bit fetishistic about it in my early papers), but the data I was plotting here had only four observations per child. As a result, simple jitter plots look crazy:
And box plots are useless.
Violin plots are useless too.
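To see why these plots struggle, it helps to look at what four trials per child does to the by-subject scores. Here is a minimal sketch in Python, using simulated data rather than the Experiment 4 dataset. The sample size, the accuracy rate, the function names, and the normal-approximation interval are all assumptions made for illustration, not the analysis from the paper.

```python
import random
from collections import Counter
from statistics import mean, stdev

def simulate_child_scores(n_children, n_trials, p_correct, seed=0):
    """Return one proportion-correct score per simulated child."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_children):
        # Count how many of this child's trials come out correct.
        n_right = sum(rng.random() < p_correct for _ in range(n_trials))
        scores.append(n_right / n_trials)
    return scores

def mean_and_ci95(scores):
    """Mean with a normal-approximation 95% interval (an assumption here)."""
    m = mean(scores)
    half_width = 1.96 * stdev(scores) / len(scores) ** 0.5
    return m, m - half_width, m + half_width

# Hypothetical cell: 24 children, 4 trials each, true accuracy 0.7.
scores = simulate_child_scores(n_children=24, n_trials=4, p_correct=0.7)

# Each score can only be 0, 0.25, 0.5, 0.75, or 1, so points stack in bands.
print(sorted(Counter(scores).items()))
print(mean_and_ci95(scores))
```

With at most five distinct values per cell, a jitter, box, or violin plot has very little distributional shape to reveal, while the mean and its interval still summarize the cell compactly.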
The best alternative I saw was this one, but it still looks too sparse to me:
After I posted these to Twitter along with the data, TJ Mahr rose to the challenge of doing better:
@annemscheel @mcxfrank hmmm didn't realize only 4 trials per cell. not much y variability to be revealed by points. pic.twitter.com/bLaRixZV4q— tj mahr (@tjmahr) November 4, 2016
I like this representation, and with some tweaking it might be a nice alternative to put in a paper like the one I wrote.
But here's my point: these visualizations are good for different things. The barplot is simple and easy to read – and it compresses well. (Heer & Bostock, 2010, make this point as well.) Consider what happens when we shrink the plotting space for these (using my version of TJ's so I can hold image size constant):
Or even tinier:
My sense is that the barplot holds up to compression much better, at least modulo the font size. In addition, I would never show the jitterdodge masterwork to a popular audience (or even really to a class). It's just got too much going on.
My broader point: banning particular data analyses or visualizations just doesn't seem like the right answer. Particular visualizations can be right for certain contexts, for certain audiences, and for certain data types. The world of data is broad. We can change the defaults, but we shouldn't ban something that has important uses.
---
* Everyone in the discussion agrees that bars are fine for visualizing single discrete values, e.g. as in the counts in a histogram.
First, we all agree about changing the default, so that's a great start. Barplots have too many problems (see links at the end). It's just easier to catch attention with #barbarplots than #thinkabouttheviolins or #boxesoverbars or the previously used #showyourdistributions. I'm thrilled it reached so many people, by the way, and we're now rethinking the way we present data.
I have some specific issues with the present example:
- I think implicitly we interpret bars as signifying some roughly normally distributed data around the displayed mean with a variance indicated by the error bar. In some ways barplots are a graphical equivalent of an independent samples t-test (if it were paired, I'd expect a bar of a difference score because that's what the t-test is based on, too).
Given that, I'd call the barplot misleading in this specific case, because the raw data don't show the expected distribution at all; quite the contrary.
- The visual perception literature shows that size and surface area are visually very salient: we interpret larger bars as more important. But systematic below-chance performance doesn't necessarily mean kids performed worse here (they might follow strategies, something I've encountered before in a picture selection task, to my own great surprise thanks to lots of barplots). Plus, at-chance performance looks "better" than below-chance performance, which is also not always the most accurate interpretation. It's not the case that those participants knew less, for example.
- If we were to choose figures by how readable they are when compressed, are there any alternatives to filled boxes in general? Then all line plots need to go, which isn't good news for eye tracking and ERP papers. The same holds for scatterplots, so correlations and meta-analyses are in trouble, too. I'm not sure how robust a criterion this would be.
A filled boxplot with an emphasized median line, by the way, has many of the visual properties of barplots and might hold up to compression similarly well.
To respond to the last point:
For teaching, I'd actually prefer to use a representation of the raw data. Students and fellow scientists from other fields often have no clue what our data look like. I think it's important to communicate the variability and distribution of our data better instead of making a "neat" impression.
In outreach contexts, I wouldn't present as much data to begin with, so as to stick with a simple message. But that depends on the audience and the goal of the talk.
Some more information:
http://journals.plos.org/plosbiology/article?id=10.1371%2Fjournal.pbio.1002128
https://cogtales.wordpress.com/2016/06/06/congratulations-barbarplots/
Thanks for the comments, Christina. I have some specific responses to your points about compressibility and interpretation of the vertical scale, but there's a broader point that maybe I didn't make so well in the (very hastily written) post.
I started out my career by being a data viz fetishist who wanted to get *all* the data into every plot. My first first-author paper (Frank et al., 2008, Cognition) has literally every participant's individual trial data plotted in the figures. But as I have presented to a broader range of audiences, I've come to believe that there is another principle that has to be balanced with informativity: namely, simplicity. Visualization in my mind is about comparison, argument, and storytelling. Adding more complex representations of the data can add to the evidential value of a graph but it can also detract by creating a representation that prevents comparison. For example, Sebastian Sauer's plot (https://twitter.com/sauer_sebastian/status/794685342947430401), while pretty, totally obfuscates the mean level of performance, making condition-to-condition comparisons extremely difficult.
More generally, except in relatively straightforward experiments (like most of the ones we have been talking about), it's impossible to show all the data. You have to pick and choose a view of the data based on the kinds of comparisons you want to highlight. If you emphasize distribution, you decrease the ability to compare means. Now, in a two-condition experiment, showing means *and* distributions isn't too crazy. But in a 20-condition experiment, it would be. Or imagine that my experiment had *one* datapoint per subject rather than four. Now it would be kind of silly to show the scattered dots: they would be more a reminder of the kind of data that got collected than something that adds value to the visualization.
In sum, what I'm arguing is that you have to pick and choose elements of a visualization, and those choices should be based on the story you want to tell and the evidence you want to muster to support your case. The default of "always show the bars" doesn't conform to that ideal very well - but neither does "never show the bars"! It simply depends on how much data you have, what aspects of the data you feel are necessary to show to make your case, etc.
---
Minor stuff: 1) I agree that "below chance" sometimes signals something other than "worse" (e.g., here it signals use of mutual exclusivity). But the vertical mapping is responsible for that, not the area of the bars. 2) Compressibility: lines are better than dots! Bars *and* lines are excellent at being visible from further away, because they create shapes and contours. Tufte has written about this, proposing sparklines, for example, as a very highly compressed representation.
Thanks for this post. It is wonderful that there is discussion about our standards of plotting, which is really the core aim of #barbarplots.
One central aspect your post touches upon is simplicity, which I feel is a point worth elaborating on. It's actually a bit ironic that we at #barbarplots favored, in the choice of our hashtag, simplicity and provocativeness over complexity and accuracy (and thus did not use #thinkabouttheviolins or #usebarplotsthoughtfully; certainly a decision not immune to criticism), and that we are nevertheless arguing against barplots in order to favor complexity and accuracy over simplicity and provocativeness. It is ironic but also telling, because there are obviously arguments for both of these extremes, and which one is appropriate depends on context and audience, as you rightly point out.
I would, though, argue that the context in which we represent data to a scientific audience (i.e., in your article) is not the place for simplification. I think it is GREAT that we can see where the data is coming from in TJ's version of the graph. It seems to me even especially great for classrooms - sure, it will take students a bit longer to spot the mean differences, but that will likely lead to a better understanding of where these differences are coming from, including an understanding of sample and trial sizes in developmental studies.
As I mentioned, we chose our oversimplifying hashtag because our primary aim was to raise awareness of a problematic default. So by analogy (even if it might be a somewhat far-fetched one), I would argue that the easy-to-read barplot would be the method of choice for your data in a situation where, for example, you present it to policy makers who just need to understand that your intervention changes outcomes.
Thanks, Sho - I generally agree with you!
Your analysis of when and where to use the underlying data seems generally right to me, except that I think you guys are on the far edge of your abilities to perceive and understand visualizations, since you work with them all day. Even in a department like mine, I think you would find that many people would be momentarily confused by TJ's (generally very nice) graph. You would have to take extra time to walk people through it. Sometimes that time would be justified by the way it helped them understand the underlying measurements and their variation - but not always!
Anyway, thanks again, and I think we generally agree.