DM Review would like to welcome new online columnist Steve Miller, a founder of Chicago-based business intelligence (BI) services firm OpenBI, LLC.
While OpenBI, LLC, focuses on delivering business intelligence solutions with open source software, the Open BI Forum has more to do with the absence of restrictions in its charter for chronicling BI. The column will draw from experience in business, information technology, statistics, social science, philosophy and even financial engineering to bring insight on BI. Miller will write on databases, ETL, analytics, statistics and graphics. He will offer commentary on open source licensing, strategy and performance measurement, change management, and program evaluation - attempting to provide insights on business intelligence that cross disciplines, drawing on the strengths of each.
When I first approached a colleague for feedback on the graphs that are the topic of this paper, I was quite surprised by his reactions. He not only critiqued the work, he pretty much leveled it, sparing no disdain for a close friend. The graphics were very 1980s, he noted, and who in BI is doing graphics anyway - live visualization is the standard now. My work was mundane, unsexy, passive; current technology allowed users to interact with their visuals - to rotate, restrict, expand, drill, etc. through live data, providing inestimable opportunity for analysis. The quality of what I presented did not even cut it by 15-year-old spreadsheet standards, he hinted, let alone the exciting visual developments of the last few years.
My bruised ego rationalized the rout as the whimsy of a young colleague trying to impress with strong thinking; so what if it was at my expense? Mine is at least 90s technology, I privately fumed, and visualization technology is for the ADD crowd anyway. Real thinkers prefer black-and-white graphics! The critique did force me, however, to rethink the logic of the approach and to revisit my assumptions and working principles. My conclusion, thankfully, is that there is indeed a role for my dated approach and graphics of the 90s.
Graphics for BI
As analytics professionals age, they migrate from equations to graphs, preferring psychology to mathematics. It is no surprise that the average age of participants in visual display guru Edward Tufte's seminars is greater than 40. Both Tufte and William Cleveland (author of the path-breaking books, The Elements of Graphing Data and Visualizing Data) are approaching latter midlife in their careers. Not coincidentally, Tufte and Cleveland are the inspirations for this article and my thinking on graphs for business intelligence.
Any reader of Tufte's The Visual Display of Quantitative Information knows his most important tenet is to maximize the data-ink ratio, i.e., the ink devoted to data divided by the total ink used in the graphic. Tufte is intolerant of ink without a purpose - gratuitous visualization, if you will. His method prescribes iterations of graph and edit, with the edit cycle charged with cutting excess ink. Cleveland, a statistician, has conducted extensive research on graphical perception, the process of visually decoding information. His mandate? "Make the data stand out. Avoid superfluity." I think it is safe to say that both experts prescribe graphics where the data tells the story with little interference. Both experts would also, I believe, be enthusiastic about active visualization - provided the data were featured, not the pixels.
Cleveland is credited with popularizing the dot plot or chart, a graphic where the vertical axis depicts a labeled characteristic while the horizontal axis details a measurement. In a sense, a dot plot is a horizontal bar chart without the fill-in ink, displaying only points representing the label-measurement intersection. Indeed, Cleveland feels that data in most pie and bar charts would be better served in dot plots, which provide more flexibility with less ink.
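To make the "bar chart without the fill-in ink" idea concrete, here is a minimal plain-text sketch in Python. It is not the R code behind the figures in this column; the function name text_dot_plot and the sample figures are invented for illustration only.

```python
def text_dot_plot(values, width=40, lo=None, hi=None):
    """Render a label -> measurement mapping as a plain-text dot plot.

    Each row is one label; a single 'o' marks where the measurement
    falls on a shared horizontal axis running from lo to hi.
    """
    lo = min(values.values()) if lo is None else lo
    hi = max(values.values()) if hi is None else hi
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
    pad = max(len(label) for label in values)
    rows = []
    for label, value in values.items():
        col = round((value - lo) / span * (width - 1))
        col = min(max(col, 0), width - 1)  # clamp values outside [lo, hi]
        line = ["."] * width  # light guide line from label to dot
        line[col] = "o"       # the only "data ink" on the row
        rows.append(f"{label.ljust(pad)} |{''.join(line)}|")
    return "\n".join(rows)


# Hypothetical growth-of-$1 figures, invented purely for illustration.
sample = {"Small Value": 1.62, "Mid Neutral": 1.45, "Large Growth": 1.21}
print(text_dot_plot(sample, width=30))
```

Passing explicit lo and hi values lets several such plots share one horizontal scale, which matters for the comparisons discussed later.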
Dot plots have several characteristics that are gratifying for quantitative analysis. The individual plots can be readily juxtaposed, or set side by side, to benefit from visual comparison. Variations in the scale of the measurement variable can be used to communicate both within and between graphs. Labels can be sorted by performance to show the ordering of the measurement variable. Dot plot labels themselves may have a distinctive ordering that is meaningful for inspection. There may be groupings of the labels that have meaning and provide insight for analysis. And, finally, we might overlay, or superpose, several sets of measurements on the same labels to focus comparisons. Comprehensive dot plot functions are available in the commercial math/stat packages S-PLUS and MATLAB, and in R, the open source statistics and graphics environment.
To illustrate these characteristics using R, I present a collection of dot plots that detail performance of stock portfolios over a three-year period. There are 23 indexes that serve as vertical labels, with the growth of an invested $1 over time as the measurement. The portfolios can be dimensioned by size and value. The three measurement time frames are year to date, one year and three years. A challenge is to see if any of the dot-charting techniques shed analytical light on portfolio performance. What follows are illustrations of each of the outlined dot plot strategies - juxtaposition, scale, performance, ordering, groupings and overlay - that will hopefully help provide analytical insight into portfolio return performance.
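The column does not spell out how the growth-of-$1 measurement was derived. A common convention, assumed here, is to compound periodic returns. The short Python sketch below uses that assumption; the function name and the return figures are hypothetical.

```python
def growth_of_dollar(period_returns):
    """Compound a sequence of periodic returns into the value of $1 invested."""
    value = 1.0
    for r in period_returns:
        value *= 1.0 + r  # each period multiplies the running value
    return value


# Hypothetical annual returns for one index, invented for illustration.
annual = [0.10, -0.05, 0.20]

three_year = growth_of_dollar(annual)      # all three years compounded
one_year = growth_of_dollar(annual[-1:])   # most recent year only

print(f"3-year growth of $1: {three_year:.3f}")
print(f"1-year growth of $1: {one_year:.3f}")
```

Because the measure is multiplicative, longer time frames naturally occupy a wider range of values than shorter ones, which is worth keeping in mind when comparing axis scales across charts.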
Figure 1 shows simple juxtaposition, detailing side-by-side charts for year to date, one year, and three-year performance. These graphs hint that performance for the portfolios has been somewhat consistent over the time frames measured, though they provide little insight on the nature of patterns based on size and value. Note the differences in the scales of the x-axes of the charts, which, in this case, were software-determined. The scale of the third graph, which is so much higher than the first and second, hints at strong performance between years one and three.
Figure 2 applies a common scale to the x-axes across the charts from Figure 1. A common scale can be particularly useful, for example, if the different graphs depict advancing time periods with a comparable measurement variable such as portfolio return. One gets a very strong sense of the cumulative returns with this illustration, because the same scale is used for each graph. Time is certainly the ally of investors, at least according to these examples. Though the common axes can help standardize the perception across charts, they might in some instances also dampen the natural variation that is present in the individual graphs, especially with a multiplicative concept such as portfolio return. A change in scale to logarithms is often helpful in flattening the skewness, if desired.
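The two scale choices in this paragraph, a shared axis range and a logarithmic transform, can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used for Figure 2; the helper names, the 5 percent padding and the data are all assumptions.

```python
import math


def common_limits(panels, pad=0.05):
    """Return one (lo, hi) axis range that covers every panel's values."""
    all_values = [v for panel in panels.values() for v in panel.values()]
    lo, hi = min(all_values), max(all_values)
    margin = (hi - lo) * pad  # small margin so dots do not sit on the frame
    return lo - margin, hi + margin


def log_scale(panel):
    """Take natural logs so equal growth ratios plot as equal distances."""
    return {label: math.log(value) for label, value in panel.items()}


# Hypothetical growth-of-$1 figures per time frame, invented for illustration.
panels = {
    "YTD": {"Small Value": 1.04, "Large Growth": 1.01},
    "1 Year": {"Small Value": 1.15, "Large Growth": 1.06},
    "3 Years": {"Small Value": 1.62, "Large Growth": 1.21},
}

lo, hi = common_limits(panels)
print(f"shared x-axis range: {lo:.4f} to {hi:.4f}")
print(log_scale(panels["3 Years"]))
```

With a shared range, the short time frames are squeezed into the left part of the axis, which is the dampening effect noted above; the log transform is one way to ease it.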
Figure 3 sorts the portfolios according to year-to-date performance. The left performance-sorted graph has special visual appeal. The bottom left to upper right patterns again hint at consistency in performance over time, with little additional insight on portfolio size and value.
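Sorting every chart by one chart's measurement means computing the label order once and reusing it across all panels. A minimal Python sketch of that step follows; the function name order_by_panel and the figures are hypothetical.

```python
def order_by_panel(panels, key_panel):
    """Order the rows of every panel by the values in one chosen panel."""
    reference = panels[key_panel]
    labels = sorted(reference, key=reference.get)  # lowest value first
    return {
        name: [(label, panel[label]) for label in labels]
        for name, panel in panels.items()
    }


# Hypothetical growth-of-$1 figures, invented for illustration.
panels = {
    "YTD": {"Small Value": 1.04, "Mid Neutral": 1.02, "Large Growth": 1.01},
    "3 Years": {"Small Value": 1.62, "Mid Neutral": 1.45, "Large Growth": 1.21},
}

ordered = order_by_panel(panels, "YTD")
for name, rows in ordered.items():
    print(name, rows)
```

Only the reference panel is guaranteed to rise monotonically; how closely the other panels follow the same pattern is what signals consistency over time.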
Figure 4 orders the portfolios by size, with those of larger capitalization companies at the bottom and smaller companies at the top. The pattern of lower left to upper right confirms that smaller company portfolios have outperformed their larger cousins pretty consistently over the three-year course of measurement, though the effect of the value dimension is not addressed. Had this graph been imposed on a common measurement scale, the over-time consistency might have been obscured.
Figure 5 groups the portfolios on a growth-neutral-value dimension, ordering by portfolio size within groupings. The pattern of small outperforming large is clearly evident here, across each of the three dimension values. Between value categories, however, the results are a bit more complicated. Value trumps growth for all size portfolios at three years, but at one year and year to date, small growth beats small value, while large value exceeds large growth. There is less variation in the performance of value portfolios across all time periods than there is with growth. In addition, there appears to be a small-large division for each value dimension in the one- and three-year returns. We could add a second dimension variable to show the effects more clearly, if desired.
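Grouping with ordering inside each group comes down to a two-level sort key. The Python sketch below shows one way to express it; the tuple layout, the STYLE_ORDER mapping, the size ranks and the figures are all invented for illustration and are not taken from the data behind Figure 5.

```python
# Hypothetical portfolios: (label, style, size rank, growth of $1).
# Size rank 1 means largest capitalization; all figures are invented.
portfolios = [
    ("Small Growth", "Growth", 3, 1.48),
    ("Large Value", "Value", 1, 1.39),
    ("Large Growth", "Growth", 1, 1.21),
    ("Small Value", "Value", 3, 1.62),
    ("Mid Neutral", "Neutral", 2, 1.45),
]

# Explicit group order, so groups follow growth-neutral-value
# rather than whatever order the raw data arrives in.
STYLE_ORDER = {"Growth": 0, "Neutral": 1, "Value": 2}


def grouped_rows(items):
    """Order rows by style group, then by size rank within each group."""
    return sorted(items, key=lambda p: (STYLE_ORDER[p[1]], p[2]))


current_style = None
for label, style, _, growth in grouped_rows(portfolios):
    if style != current_style:  # start a new group block
        print(f"[{style}]")
        current_style = style
    print(f"  {label:<13} {growth:.2f}")
```

Adding a second dimension variable, as suggested above, would amount to extending the sort key with another element.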
Figure 6 uses overlays to detail the relative performance of growth and value across portfolios ordered by size. With portfolio size increasing from the bottom, the pattern of lower right to upper left clearly confirms the size finding across all three time points. The interaction of time, value and size also comes across effectively with the overlays. At three years, value trumps growth for all portfolio sizes (this effect is even more pronounced for five- and 10-year returns), while for one year and year to date, growth exceeds value for smaller portfolios, and value trumps growth for large. Differences in performance between value and growth are more marked with larger-cap portfolios.
As hopefully demonstrated, the dot plot, an ugly duckling in the visual intelligence world, can be a very effective means of communicating performance. It satisfies the design mandate of simplicity that features the data with minimal ink. The dot plot design is inherently spartan and perceptually easy to grasp, yet can be enhanced by juxtaposition, scale, ordering, grouping and overlay to tell a more complicated story involving additional factors. Dot plots tell a simple story simply, but can bulk up for heavier analyses on demand.
While the dot plot is certainly not an interactive tool for the advanced analyst, I believe it has a suitable use in business intelligence. The dot plot's simplicity and perceptual clarity suggest a role with performance management scorecards, where charts detail constantly updated intelligence from the warehouse to both mid- and senior-level leadership who range in analytical sophistication. The graphs could either be 1) generated by a series of scripts that post the results on the network, or 2) available on demand through a Web-based application. For companies that are just getting started with BI especially, the use of graphics capabilities of open source tools such as R for performance management can be a great way to establish early success with minimal investment.
A subsequent column will continue the discussion of effective perceptual graphics for BI. Special focus will be on the power of panel or Trellis graphs that display relationships dimensioned by one or more conditioning variables.