In our inaugural OpenBI Forum column of April 27, 2006, we discussed the utility of the common and unsexy dot plot for business intelligence (BI). After first acknowledging the seminal contributions of Tufte and Cleveland - let the data tell the visual story - we settled on juxtaposition, scale, performance, size, grouping and overlay as critical considerations for the construction of effective dot plots. We then illustrated the concepts with real-world examples. This column expands on that thinking, using a similar format to discuss dimensional graphics - visuals that examine the relationship between variables x and y conditioned on the values of another attribute (or set of attributes) z. The z attributes are in effect the dimensions, or panels, by which we wish to view the relationship between x and y. We feel that dimensional graphics, used judiciously, can be an invaluable adjunct for performance management, providing a foundation for the evaluation of company performance.
The cornerstone of our dimensional thinking is the work on Trellis graphics by William Cleveland, a statistician with Bell Labs in the 1980s and 1990s (see http://stat.bell-labs.com/wsc/). Subsequent interest has spawned other names for the Trellis approach, including panel, lattice and small multiples. Trellis graph patterns, reminiscent of a garden trellis, depict relationships between two variables (x and y) conditioned on one or more dimensional attributes, and derive from the need to visually investigate complex multivariate relationships with a response variable. Each panel of the trellis represents the graph of a relationship between x and y (the response variable) for a single combination of the conditioning variables. An example of a reasonable trellis candidate is the relationship between cholesterol level (x) and the odds of cardiovascular disease (y), conditioned on age category, sex and race. The trellis in this case would depict a separate panel graph for each combination of age category, sex and race that occurred in the data. The individual plots can be of many types, including scatter, xy plot, curve plot, quantile, strip plot and dot plot. A defining characteristic of the Trellis approach is the use of common scales for every panel in the overall graph, which allows consistent visual comparison between panels. Along with the commonality in panel scales comes the flexibility to control the layout of panels across rows, columns and pages, promoting clear graphical comparisons.
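The two defining ideas above, one panel per observed combination of conditioning variables and a single set of scales shared by every panel, can be sketched without any plotting library. The following is a minimal illustration in Python rather than the R lattice code used for this column; the function name `build_panels`, the field names and the four records are all invented for the example and are not real medical data.

```python
from collections import defaultdict


def build_panels(records, x, y, conditions):
    """Group records into trellis panels and compute scales shared by all panels.

    records    -- list of dicts, one per observation
    x, y       -- names of the plotted variables
    conditions -- names of the conditioning (dimension) variables
    """
    panels = defaultdict(list)
    for rec in records:
        # One panel per combination of conditioning values seen in the data.
        key = tuple(rec[c] for c in conditions)
        panels[key].append((rec[x], rec[y]))

    # Common scales: limits are taken over ALL records, not per panel.
    xs = [rec[x] for rec in records]
    ys = [rec[y] for rec in records]
    shared = {"xlim": (min(xs), max(xs)), "ylim": (min(ys), max(ys))}
    return dict(panels), shared


# Made-up records, purely for illustration.
records = [
    {"cholesterol": 180, "risk": 0.10, "sex": "F", "age": "40-59"},
    {"cholesterol": 240, "risk": 0.22, "sex": "F", "age": "40-59"},
    {"cholesterol": 190, "risk": 0.15, "sex": "M", "age": "40-59"},
    {"cholesterol": 260, "risk": 0.35, "sex": "M", "age": "60+"},
]

panels, shared = build_panels(records, "cholesterol", "risk", ["sex", "age"])
for key in sorted(panels):
    print(key, panels[key])
print(shared)
```

Only three panels result, because the combination ("F", "60+") never occurs in the sample records. Every panel would then be drawn against the same `xlim` and `ylim`.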
In addition to Trellis graphs, we deploy other techniques to help achieve a dimensional effect. Ordering attributes by value levels can promote dimensionality. Superpositioning or overlay, where multiple graphs are combined in a single panel, is a useful dimensional approach, as is a strategic use of colors and shadings to denote different attribute levels. 3-D graphs, often abused in practice, can be invaluable in establishing multivariate patterns. And finally, sometimes opposing views of the same graph, perhaps differing only in plot shading, can provide dimensional insight.
There are several commercial graphics-only packages that support many of the concepts of dimensional graphics detailed here. Tableau Software Version 2.0 supports dimensionality, as does ADVIZOR Solutions Visual Discovery and Spotfire DXP. The commercial statistical package S-Plus from Insightful, and its open source half-brother, R, provide support for all concepts presented in this column and offer extensible programming and data management capabilities as well as comprehensive sets of statistical graphics procedures. The programs and graphs produced for this column were developed in freely available R using the lattice and scatterplot3d libraries. In R as in S-Plus, graphics programming consists of scripts housing straightforward calls to functions with extensive parameter options. The supporting data structures are readily built with R functions and language.
The data used to demonstrate the dimensional concepts consist of stock portfolio returns derived from work at the Center for Research in Security Prices (CRSP) at the University of Chicago and readily accessible from the Web site of Ken French, professor of finance at the Tuck School of Business, Dartmouth College (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/index.html). Professor French, often in tandem with Eugene Fama, professor of finance at the University of Chicago Graduate School of Business, has made seminal contributions to the field of financial economics over the last 30 years. We use data from two files downloaded from French's site for this column: "6 Portfolios Formed on Size and Book-to-Market (2x3)," and "25 Portfolios Formed on Size and Book-to-Market (5x5)." Each file now contains 80 years of portfolio return data. For each file, the x and y variables are date and monthly percentage portfolio return respectively; the dimensions are size, book-to-market value and weighting. Our charter is to use dimensional graphing techniques to shed light on the performance of these portfolios, measured as wealth accumulation: the growth of an initial $1 investment over time. We settle on examining the latest 10-year returns for each portfolio reviewed, contrasting performance over time by the dimensions weighting, size and book-to-market value, and searching for snippets of intelligence.
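Wealth accumulation follows from the monthly percentage returns by compounding: each month's wealth is the previous month's wealth multiplied by (1 + r/100). The sketch below shows that arithmetic in Python; it is not the R code behind the figures, the helper names `latest_window` and `growth_of_dollar` are hypothetical, and the three returns are made up rather than taken from French's files.

```python
def latest_window(monthly_pct_returns, years=10):
    """Keep only the most recent `years` of monthly observations."""
    if years <= 0:
        raise ValueError("years must be positive")
    return monthly_pct_returns[-12 * years:]


def growth_of_dollar(monthly_pct_returns, initial=1.0):
    """Return the wealth path of `initial` dollars compounded month by month.

    The first element is the starting wealth, so the result has one more
    entry than the list of returns.
    """
    wealth = [initial]
    for r in monthly_pct_returns:
        wealth.append(wealth[-1] * (1.0 + r / 100.0))
    return wealth


# Made-up returns in percent: +10%, -10%, +5%.
returns = [10.0, -10.0, 5.0]
path = growth_of_dollar(latest_window(returns, years=10))
print([round(w, 4) for w in path])  # [1.0, 1.1, 0.99, 1.0395]
```

Note that a 10 percent gain followed by a 10 percent loss leaves the investor below the starting dollar, which is why the graphs track compounded wealth rather than summed returns.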
A basic lattice xy plot of the 2x3 data is given in Figure 1a. For this illustration, we limit attention to cap-weighted returns. There are six panels in the graph, one for each size-value combination. As is hopefully apparent from the panel strips, the first row represents the latest 10-year returns of large portfolios; the second the returns of small. Similarly, the first column represents growth portfolios, the second neutral and the third value. Note the common scales for each panel, promoting easy comparison across portfolios. Note also the bubble for each growth portfolio reflecting, not surprisingly, the whims of the Internet frenzy time period. These 10-year returns suggest that value trumps growth and small trumps large - findings perhaps coincidental here but consistent with established wisdom. Figure 1b modifies the layout to one row by six columns, offering a second viewpoint for comparison. It is a straightforward parameter change to modify panel layouts across rows, columns and pages, allowing different juxtapositions of the data. Though we show only the current 10-year returns, the functions developed can produce return graphs for any time frames requested.
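The change from a two-by-three grid to a single row of six is only a change in how the same ordered list of panels is mapped to grid positions. The following Python sketch illustrates that mapping in a library-neutral way; `layout_panels` is a hypothetical helper, not the lattice API, and the row-by-row fill order is an assumption chosen to match the description of Figure 1a.

```python
def layout_panels(panel_keys, n_rows, n_cols):
    """Assign panels to (page, row, col) positions, filling each row left to right.

    Panels that do not fit on one page spill onto the next.
    """
    per_page = n_rows * n_cols
    positions = {}
    for i, key in enumerate(panel_keys):
        page, slot = divmod(i, per_page)
        row, col = divmod(slot, n_cols)
        positions[key] = (page, row, col)
    return positions


sizes = ["Large", "Small"]
values = ["Growth", "Neutral", "Value"]
keys = [(s, v) for s in sizes for v in values]  # six size-value panels

grid_2x3 = layout_panels(keys, n_rows=2, n_cols=3)   # as in Figure 1a
strip_1x6 = layout_panels(keys, n_rows=1, n_cols=6)  # as in Figure 1b

print(grid_2x3[("Small", "Value")])   # (0, 1, 2)
print(strip_1x6[("Small", "Value")])  # (0, 0, 5)
```

The data and the panel contents are untouched; only the two layout parameters differ between the two views.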
Investors often ask whether they are better served with cap-weighted or equal-weighted portfolios. Figure 2a uses superposition, or overlay, to combine multiple graphs per xy plot panel, showing a comparison of cap-weighted and equal-weighted returns for each size-value portfolio combination. These graphs, by no means conclusive, suggest that cap-weighting might be preferable for growth portfolios, while equal-weighting is superior for value, especially small value. In contrast to the standalone, cap-weighted Figure 1a, note how the addition of equal-weighting has stretched the scales of the y-axes, with the effect of dampening the variation in each panel. The constancy of scales across panels in lattice graphs is thus a double-edged sword: it promotes easy comparison across panels, but a few extreme values can stretch the shared scale and mute the differences within each panel. Also, note the ratio of height to width for each panel, called the aspect ratio, which can be modified for perceptual effect. Figure 2b illustrates another feature of lattice graphics, the easy ability to rotate the dimension variables for a different slicing of the data. Figure 2b contrasts large and small portfolios within weighting and value dimensions, demonstrating the superiority (for this time frame at least) of large over small for growth portfolios, but of small over large for value - findings that are consistent for both cap and equal weighting. We could just as easily demonstrate size graphs within panels of value and weighting.
We raise the dimensional stakes with Figure 3, deploying the 5x5 data set in all its splendor and complexity. This xy plot conditions on value within weighting, using superpositioning of a heatmap-like gray shading to show gradations of size from small (black) to large (lightest gray). We are admittedly pushing the perceptual envelope here with five shades of gray, but feel this graph does a reasonable job of simply representing the complexity of 50 graphs in a single-page visual. Note the dominance of small over large for both equal and cap-weighted value portfolios, but the seeming inversion of this pattern for growth. As was the case with 2x3, the prominent small value equal-weighted graph mutes some of the differences in the other panels.
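A heatmap-like gray ramp of this kind is simply an evenly spaced sequence of gray levels, one per size group, running from black to a light gray. The Python sketch below shows one way to generate such a ramp; the helper names, the quintile labels and the choice of 0.8 as the lightest level are assumptions made for illustration, not the values used in Figure 3.

```python
def gray_ramp(n_levels, darkest=0.0, lightest=0.8):
    """Return n_levels evenly spaced gray levels (0.0 = black, 1.0 = white)."""
    if n_levels < 1:
        raise ValueError("n_levels must be at least 1")
    if n_levels == 1:
        return [darkest]
    step = (lightest - darkest) / (n_levels - 1)
    return [darkest + i * step for i in range(n_levels)]


def to_hex(level):
    """Convert a gray level in [0, 1] to an RGB hex string such as '#666666'."""
    v = round(level * 255)
    return f"#{v:02x}{v:02x}{v:02x}"


# Hypothetical labels for the five size groups, smallest first.
size_groups = ["Q1 (small)", "Q2", "Q3", "Q4", "Q5 (large)"]
shades = [to_hex(g) for g in gray_ramp(len(size_groups))]

for label, shade in zip(size_groups, shades):
    print(label, shade)
```

With five groups, adjacent shades differ by only a fifth of the available range, which is why five levels are about as far as this encoding can reasonably be pushed.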
Figure 3: 5x5 Data Set
We are generally critical of 3-D bar charts because the dimensionality is most often gratuitous, representing in three dimensions what should be done in two. However, Figure 4, which shows two views of the same graph of cap-weighted returns, has as dimensions both size and value on the x and y axes respectively, using gray-shading to further elucidate size on the left graph and value on the right. Unlike the xy plots previously discussed, these charts do not show the performance journey, only the final destination. A grid along the z axis might help facilitate interpretation.
Figure 4: Two 3-D Bar Chart Views of Cap-Weighted Returns
Finally, we return to our 2x3 data set to show three-way dimensionality with a bar chart in Figure 5, displaying as above size and value on the x and y axes, but representing weighting with gray-shading. Unlike Trellis panels, the three separate plots on this page all have distinct z axes, reflecting the wide variation in final destination returns over 10-, 15- and 20-year time frames. The trade-off with independent graphs is that we give up consistent scales across the plots in exchange for a more readable format. A grid along the z axis might help facilitate interpretation in this case as well.
Figure 5: 3-D Bar Chart Returns Over Time
This column has hopefully given readers a flavor of the value of dimensional graphics, particularly the lattice approach, for business intelligence. Dimensional visual designs can clarify complex multivariate relationships, simplifying analytic interpretation. We feel dimensional graphics should become a staple for the evaluation of company performance, assuming important roles as components of scorecards and dashboards. We also think that the advanced statistical graphics capabilities of the S-derivative languages, S-Plus and R, will increasingly find applications in business. We anticipate continued adoption of R in the business intelligence world as part of its migration from the scientific and statistical communities. Next month the OpenBI Forum will investigate the R phenomenon in more detail.