I was trawling the Social Science Statistics Blog of Harvard's Institute for Quantitative Social Science last weekend, looking for new ideas for BI, when I came across an article by a couple of political scientists entitled “Using Graphs Instead of Tables in Political Science.” Accompanying the paper is a website that details the nitty-gritty of the techniques the authors promote.
Though the examples come from academic political science, the discussion is just as relevant to business intelligence. The focus is not on the bland BI graphics of stoplights, speed dials and 3-D pie/bar charts. Rather, it's on visualizing the results of statistical analysis, a topic increasingly pertinent to BI as data mining and statistical learning become mainstream tools.
The authors' approach was to examine every article from five issues of three leading political science journals, noting the use of tables and graphs in each. They discovered that political scientists rely on tables far more than graphs and that they never use graphs to present the findings of their models. The authors then set out to “demonstrate directly how researchers can use graphs to improve the quality of empirical presentations.” The accompanying website shows how they would have used visuals to either replace or supplement the tables that appeared in the articles they reviewed.
Figures 1 and 2 from the website (editor’s note: the website link also leads to the subsequent figures mentioned in this blog) demonstrate the use of graphics to display single- and multi-variable frequencies and percentages. The simple Cleveland dotplot can be a very effective means of showing percentages. Note that the categories are sorted by percentage and that the scale starts at zero. Because the area of each rectangle indicates frequency, mosaic plots are ideal for cross-tabs. The differences between proportional and majoritarian electoral systems with left- and right-leaning governments are clearly communicated.
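The two dotplot conventions noted above, categories sorted by value and a scale anchored at zero, can be sketched in a few lines. This is only a rough text-mode illustration in plain Python rather than the authors' R code; the function name `cleveland_dotplot` and the survey shares are made up for the example.

```python
def cleveland_dotplot(percentages, width=50):
    """Render a text dotplot: one row per category, sorted by value,
    on a scale that runs from zero to 100 percent."""
    # Sort categories from largest to smallest percentage.
    rows = sorted(percentages.items(), key=lambda kv: kv[1], reverse=True)
    label_width = max(len(label) for label, _ in rows)
    lines = []
    for label, pct in rows:
        # Map 0..100 onto columns 0..width, so the axis starts at zero.
        pos = round(pct / 100 * width)
        track = ["."] * (width + 1)
        track[pos] = "o"
        lines.append(f"{label:<{label_width}} |{''.join(track)}| {pct:5.1f}%")
    return "\n".join(lines)


# Hypothetical survey shares, invented for illustration only.
shares = {"Option A": 22.5, "Option B": 61.0, "Option C": 8.0, "Option D": 40.5}
print(cleveland_dotplot(shares))
```

Because every row shares the same zero-based track, the horizontal position of each dot can be compared directly from one category to the next.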
Figure 3 combines information on the means and standard deviations of multiple scales in a single plot ordered from lowest to highest mean. Political scientists are much more facile with standard deviations than I am these days. I would instead have used the 5th and 95th percentiles along with the min and max.
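To make the percentile alternative concrete, here is a minimal sketch using only Python's standard library. The helper name `spread_summary` and the scores are hypothetical, chosen to be right-skewed so the two summaries visibly disagree.

```python
import statistics


def spread_summary(values):
    """Return two views of spread: mean/sd and min/5th/95th/max."""
    # n=20 splits the data into 20 equal-probability groups, so the
    # first and last cut points are the 5th and 95th percentiles.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "p05": cuts[0],
        "p95": cuts[-1],
        "max": max(values),
    }


# Hypothetical right-skewed scores, invented for illustration only.
scores = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 45]
summary = spread_summary(scores)
for name, value in summary.items():
    print(f"{name:>4}: {value:.2f}")
```

With skewed data like these, the mean minus one standard deviation falls below the smallest observed value, whereas the percentiles always stay inside the range of the data.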
Figure 4 combines means, percentages and measures in a stacked dotplot and violin/box-and-whisker plot, consolidating much disparate information in a small space. In contrast to means and standard deviations, which tacitly assume a normal distribution, the violin plots let the data tell their own story. Their funky shapes suggest that the distributions are decidedly non-normal. Kernel density plots are an alternative to violins that I often use.
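For readers unfamiliar with kernel density estimates, the idea behind both violins and density plots can be sketched without any plotting library: place a small Gaussian bump on each observation and average them. The code below is an illustration under stated assumptions, not the method used on the authors' website. The function name, the normal-reference bandwidth default and the sample are all hypothetical choices.

```python
import math
import statistics


def gaussian_kde(values, bandwidth=None):
    """Return a function that estimates the density of `values`
    by averaging Gaussian kernels centred on each observation."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb; a common default, not the only choice.
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum the kernel contributions from every observation.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


# Hypothetical two-cluster sample, invented for illustration only.
sample = [1.0, 1.2, 1.1, 0.9, 1.3, 4.8, 5.0, 5.1, 5.2, 4.9]
density = gaussian_kde(sample)
for i in range(13):
    x = i * 0.5
    # Crude text rendering: bar length is proportional to estimated density.
    print(f"{x:4.1f} {'#' * round(density(x) * 60)}")
```

A mean and standard deviation would summarize this sample as a single hump, while the density estimate keeps both clusters visible. The bandwidth controls how much smoothing is applied, so it is worth trying a few values.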
Figure 5 is a personal favorite, combining the simplicity of a dotplot with the power of grouping and lattice. With lattice, all panels share the same axes, which eases comparisons across dimensions. And grouping by time facilitates before-and-after analyses.
The first of the “Graphs that do not appear in paper” presents a color-coded correlation matrix, with positive correlations depicted in blue and negative ones in red. While the visual is appealing and the code behind it illuminating, I prefer the pairwise scatterplot function splom (scatter plot matrix). With splom you can see whether two variables have a non-linear relationship, which a correlation coefficient alone can hide.
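A small numeric example shows why a correlation matrix can miss what a scatterplot reveals. The sketch below is plain Python with made-up data and a hand-rolled `pearson` helper (a hypothetical name). Here y is completely determined by x, yet the linear correlation is zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y depends on x exactly, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
print(f"r = {pearson(xs, ys):.3f}")
```

In a color-coded correlation matrix this pair would appear as "no relationship," while a scatterplot of the same points would show an unmistakable parabola.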
Finally, with the small multiples dotplot, the authors juxtapose majoritarian with proportional on a number of measures, using the plotting language to ensure that scales are comparable across the separate graphs. They also highlight the mean values of the individual measures with vertical lines, sharply contrasting majoritarian and proportional electoral systems.
Overall, I'm a big fan of unsexy visuals like those presented in this work, finding them superior to tables and “ink-intensive” graphics. Indeed, if the choice is between 3-D sizzle and substance of presentation, I'll take the latter every time. Following the R programming motif, I routinely communicate the results of modeling work using R's extensible lattice package for dimensional graphics. My take is that presenting the predictions of models visually is just as important as providing statistical validation of goodness of fit.
The authors of this article do a pretty good job demonstrating how to use graphics to better serve statistical analysis. The visuals in this paper, however, are just an introduction to what's available in R. For those looking to advance with R graphical programming, I'd suggest “Lattice” and “R in a Nutshell.” A humorous, tongue-in-cheek response to this article can be found in “Why Tables are Really Much Better than Graphs.”