While OpenBI, LLC, focuses on delivering business intelligence solutions with open source software, the "open" in Open BI Forum refers more to the absence of restrictions in the column's charter for chronicling BI. The column will draw on experience in business, information technology, statistics, social science, philosophy and even financial engineering to bring insight to BI. Miller will write on databases, ETL, analytics, statistics and graphics. He will also offer commentary on open source licensing, strategy and performance measurement, change management, and program evaluation, attempting to provide insights on business intelligence that cross disciplines and draw on the strengths of each.
When I first approached a colleague for feedback on the graphs that are the topic of this paper, I was quite surprised by his reaction. He not only critiqued the work, he pretty much leveled it, sparing no disdain for a close friend. The graphics were very 1980s, he noted, and who in BI is doing graphics anyway? Live visualization is the standard now. My work was mundane, unsexy and passive; current technology allows users to interact with their visuals, to rotate, restrict, expand and drill through live data, providing inestimable opportunity for analysis. The quality of what I presented did not even cut it by 15-year-old spreadsheet standards, he hinted, let alone by the standards of the exciting visual developments of the last few years.
My bruised ego rationalized the rout as the whim of a young colleague trying to impress with strong thinking; so what if it was at my expense? Mine is at least 1990s technology, I privately fumed, and visualization technology is for the ADD crowd anyway. Real thinkers prefer black and white graphics! The critique did, however, force me to rethink the logic of the approach and to revisit my assumptions and working principles. My conclusion, thankfully, is that there is indeed a role for my dated approach and its 1990s graphics.
Graphics for BI
As analytics professionals age, they often migrate from equations to graphs, preferring psychology to mathematics. It is no surprise that the average age of participants in visual display guru Edward Tufte's seminars is over 40. Both Tufte and William Cleveland (author of the path-breaking books The Elements of Graphing Data and Visualizing Data) are approaching the later stages of their careers. Not coincidentally, Tufte and Cleveland are the inspirations for this article and for my thinking on graphs for business intelligence.
Any reader of Tufte's The Visual Display of Quantitative Information knows his most important tenet is to maximize the data-ink ratio, i.e., the ink devoted to data divided by the total ink used in the graphic. Tufte is intolerant of ink without a purpose (gratuitous visualization, if you will). His method prescribes iterative cycles of graphing and editing, with each editing pass charged with cutting excess ink. Cleveland, a statistician, has conducted extensive research on graphical perception, the process of visually decoding information. His mandate? "Make the data stand out. Avoid superfluity." I think it is safe to say that both experts prescribe graphics where the data tells the story with little interference. Both experts would also, I believe, be enthusiastic about active visualization, provided the data, not the pixels, were featured.
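Written out, Tufte's ratio is shown below. The second form restates the same quantity in terms of erasable ink, following his own definition, and is the one that guides the editing pass: anything that can be erased without losing information lowers the ratio.

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```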
Cleveland is credited with popularizing the dot plot, or dot chart, a graphic in which the vertical axis lists labeled categories while the horizontal axis shows a measurement. In a sense, a dot plot is a horizontal bar chart without the fill-in ink, displaying only the points that mark each label's measurement. Indeed, Cleveland argues that the data in most pie and bar charts would be better served by dot plots, which provide more flexibility with less ink.
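To make the structure concrete, the Python sketch below renders a dot plot as plain text: one row per label, with a single dot placed along the measurement axis and light guide dots in place of a bar's fill. The function `text_dot_plot` and its parameters are hypothetical, written only for this illustration. It is not the dot plot function of R, S-PLUS or MATLAB, and a real chart would come from a graphics library.

```python
def text_dot_plot(data, width=40, lo=None, hi=None):
    """Return the rows of a text dot plot for a {label: value} mapping.

    Each row shows the right-justified label, a line of guide dots, and an 'o'
    at the position proportional to the value between lo and hi (which default
    to the data's own minimum and maximum).
    """
    values = list(data.values())
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    label_width = max(len(label) for label in data)
    rows = []
    for label, value in data.items():
        # Map the value onto a column index, clamped to [0, width - 1].
        pos = round((value - lo) / span * (width - 1))
        pos = min(max(pos, 0), width - 1)
        line = ["."] * width  # guide dots replace the ink of a filled bar
        line[pos] = "o"
        rows.append(f"{label.rjust(label_width)} |{''.join(line)}|")
    return rows


if __name__ == "__main__":
    # Hypothetical growth-of-$1 values, for illustration only.
    for row in text_dot_plot({"Index A": 1.10, "Index B": 0.95, "Index C": 1.32}, width=20):
        print(row)
```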
Dot plots have several characteristics that are gratifying for quantitative analysis. The individual plots can be readily juxtaposed, or set side by side, to benefit from visual comparison. The scale chosen for the measurement axis, whether common to all panels or set independently for each, can be used to communicate both within and between graphs. Labels can be sorted by performance, so that the chart shows the ranking on the measurement variable. Dot plot labels themselves may have a distinctive ordering that is meaningful for inspection. There may be groupings of the labels that have meaning and provide insight for analysis. And, finally, we might overlay, or superpose, several measurements for each label on a single chart to focus comparisons. Comprehensive dot plot functions are available in the commercial math/stat packages S-PLUS and MATLAB, and in R, the open source statistics and graphics environment.
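Sorting and grouping are largely a matter of ordering the rows before they are drawn. The minimal Python sketch below assumes each record is a (label, group, value) tuple, where the group might be a size class; the function name `arrange_for_dot_plot` and the sample labels are hypothetical and are not taken from the article's data.

```python
from itertools import groupby


def arrange_for_dot_plot(records, group_order):
    """Order (label, group, value) records for a grouped, sorted dot plot.

    Groups appear in the order given by group_order; within each group, labels
    are sorted by value, largest first, so each panel reads as a ranking.
    Returns a dict mapping group -> list of (label, value) in plotting order.
    """
    rank = {group: i for i, group in enumerate(group_order)}
    ordered = sorted(records, key=lambda r: (rank[r[1]], -r[2]))
    # groupby needs its input sorted by the grouping key, which `ordered` is.
    return {
        group: [(label, value) for label, _, value in rows]
        for group, rows in groupby(ordered, key=lambda r: r[1])
    }


if __name__ == "__main__":
    # Hypothetical records: (label, size group, growth of $1).
    sample = [
        ("Small Value", "Small", 1.4),
        ("Small Growth", "Small", 1.1),
        ("Large Value", "Large", 1.2),
        ("Large Growth", "Large", 1.3),
    ]
    for group, rows in arrange_for_dot_plot(sample, ["Large", "Small"]).items():
        print(group, rows)
```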
To illustrate these characteristics using R, I present a collection of dot plots that detail the performance of stock portfolios over a three-year period. There are 23 indexes that serve as vertical labels, with the growth of an invested $1 over time as the measurement. The portfolios can be classified along two dimensions, size and value. There are three measurement time frames: year to date, one year and three years. The challenge is to see whether any of the dot-charting techniques shed analytical light on portfolio performance. What follows are illustrations of each of the outlined dot plot strategies (juxtaposition, scale, performance, ordering, groupings and overlay), offered in the hope that they provide analytical insight into portfolio returns.
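The article does not say how the growth of $1 was computed. A common approach, assumed here, is to compound simple periodic returns. Under the further assumption of monthly returns, the one-year and three-year figures correspond to the last 12 and 36 periods, and year to date to the periods since January. Both function names below are hypothetical.

```python
def growth_of_one_dollar(period_returns):
    """Return the running value of $1 invested, given simple periodic returns.

    Returns are fractions (0.05 means +5%); each period compounds on the last.
    """
    value = 1.0
    path = []
    for r in period_returns:
        value *= 1.0 + r
        path.append(value)
    return path


def growth_over_last(period_returns, n):
    """Growth of $1 over the most recent n periods (e.g., n=12 for one year of months)."""
    if not 1 <= n <= len(period_returns):
        raise ValueError("n must be between 1 and the number of periods")
    return growth_of_one_dollar(period_returns[-n:])[-1]


if __name__ == "__main__":
    # Hypothetical returns, for illustration only.
    returns = [0.10, -0.10, 0.20]
    print(growth_of_one_dollar(returns))
    print(growth_over_last(returns, 2))
```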
Figure 1 shows simple juxtaposition: side-by-side charts of year-to-date, one-year and three-year performance. These graphs hint that portfolio performance has been somewhat consistent across the time frames measured, though they provide little insight into patterns based on size and value. Note the differences in the scales of the charts' x-axes, which, in this case, were determined by the software. The scale of the third graph, so much higher than those of the first and second, hints at strong performance between years one and three.
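Whether juxtaposed panels share a scale can be made a deliberate choice rather than a software default. The hypothetical helper below computes x-axis limits either per panel, mimicking the independent, software-determined scales of Figure 1, or across all panels, so that a dot in one panel can be compared directly with a dot in another. The 5% padding is an arbitrary assumption.

```python
def panel_limits(panels, shared=True, pad=0.05):
    """Compute x-axis (low, high) limits for a row of juxtaposed dot plots.

    panels maps a panel title to its {label: value} data. With shared=True every
    panel gets the same limits; with shared=False each panel is scaled to its
    own data. pad widens each range by that fraction on both ends.
    """
    def padded(lo, hi):
        margin = (hi - lo) * pad
        return (lo - margin, hi + margin)

    if shared:
        all_values = [v for data in panels.values() for v in data.values()]
        common = padded(min(all_values), max(all_values))
        return {title: common for title in panels}
    return {
        title: padded(min(data.values()), max(data.values()))
        for title, data in panels.items()
    }


if __name__ == "__main__":
    # Hypothetical growth-of-$1 values for two time frames.
    panels = {"YTD": {"A": 1.0, "B": 1.2}, "3 yr": {"A": 1.5, "B": 2.0}}
    print(panel_limits(panels))
    print(panel_limits(panels, shared=False))
```

A shared scale makes cross-panel comparison honest but compresses the spread within the shorter time frames; independent scales do the reverse. This is the trade-off between communicating within and between graphs.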