The Forum's initial charter was really one without boundaries - a blank slate for chronicling the BI world. We promised to discuss BI in light of business, information technology, quantitative methods, social science, philosophy and even portfolio management. The Forum wished to provide something for both techs and quants as well as for business and management, with both theory and practical applications. For our work in 2006, I would give us a grade of incomplete - a nice start but an unresolved finish.
The Forum has written on open source BI (not very surprising, given our company focus!), the R statistical package and several applications of R's graphics capabilities. We wrote a tongue-in-cheek column on performance management (Gary Cokins needn't fret) and paid homage to exploratory data analysis (EDA), the precursor of much of the best of BI. And we ended the year with the first two installments in a three-part interview series with Gary King of Harvard on BI and the quantitative social sciences.
A common theme of each column is that OpenBI Forum is not on a traditional business intelligence (BI) thought leadership track. Having been a part of the BI community for more years than I care to acknowledge, I'm a big fan of many of BI's thought leaders, almost incredulous at how they can continue to deliver such consistently high quality - column after column, book after book, year after year. The Forum feels, however, that with the explosive growth of BI has come a bit of stagnancy in much of the "literature" that proliferates in the many BI channels, creating an "echo chamber" effect of increasing sameness. At the same time, the growing importance of BI for business performance has never been more uniformly confirmed. The OpenBI Forum thinks there's untapped potential in many of the "cousin" disciplines of BI, especially the academic worlds of open source, statistics, business, quantitative social science, computer science, management science, decision science and information science, that can advance BI further and faster than it moves today. Our goal in 2007 is to help expand the reach of BI to this outside world, finding innovative ideas that spring from a cross-discipline perspective.
The wisdom that Gary King offers BI is, the Forum believes, a great example of such outreach. A major focus of Gary's work is on methodologies, particularly quantitative ones, for social science research. These techniques help to ensure the validity of study designs so that researchers can be confident in their assertions and findings. Using methods developed by Gary and his colleagues makes it easier to prove or disprove hypotheses and theories about human behavior. Techniques such as randomized field experiments, the systematic treatment of missing data, statistical handling of ecological inference problems arising from group data, the statistical treatment of rare events and knowledgeable survey designs help enhance the quality and validity of research.
The explosion in the use of quantitative techniques for decision-making noted by Gary is, of course, equally beneficial for business, in much the same way. After all, business is a social endeavor, and most of BI focuses on predicting and understanding behavior. Certainly, the validity of intelligence findings is just as pertinent for BI as it is for academic study. How often have we ignored the problems of missing information in our data marts, simply making the dangerous assumption that cases with missing data look like those without? How often have we constructed customer surveys that may, in fact, have been very flawed, providing us with spurious information? How often have we missed opportunities to conduct randomized experiments, the platinum standard of designs? In the end, BI analysts are searching for the same thing as their academic cousins: confidence in the validity of their findings predicting human behavior.
The OpenBI Forum closes out 2006 the same way it started - with a few graphs. The Forum looks forward to getting started again in early 2007!
Time as an Ecological Group
In our November interview, Gary King discussed the ecological inference problem and the difficulties it can bring to analysis. An ecological fallacy occurs when a relationship between variables at a group level is imputed to the individuals within the groups - erroneously. Gary gives the example of using ZIP code sales information as the foundation for a marketing campaign. Given an association of lucrative sales with high-income ZIP codes, a marketing organization might conclude that residents with higher incomes were the buyers and should be targeted, when in fact it is the lower-income families within those ZIP codes that are the prime buying candidates. Since the marketers have buying data at only the ZIP level, they cannot with certainty attribute that behavior to individuals, despite the correlation of strong sales with high-income ZIP codes. The challenge for marketers in situations like this is analyzing information at a group level with statistical methods that can sidestep some of these problems and produce the best individual targets for their scarce campaign dollars.
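A toy illustration may make the ZIP code example concrete. The sketch below is in Python and uses entirely made-up household data (the ZIP labels, incomes and purchase flags are hypothetical, not drawn from Gary's work or any real campaign). It shows only how a strong positive ZIP-level correlation can coexist with a negative household-level one; it is not one of the ecological inference methods Gary describes.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical households: (income in $000s, bought = 1 / did not buy = 0).
# Within every ZIP, the buyers are the lower-income households.
households = {
    "ZIP_A": [(30, 0), (35, 0), (40, 0), (45, 0)],
    "ZIP_B": [(40, 1), (50, 0), (60, 0), (70, 0)],
    "ZIP_C": [(50, 1), (60, 1), (90, 0), (100, 0)],
    "ZIP_D": [(60, 1), (70, 1), (80, 1), (150, 0)],
}

# Group-level view: all the marketer sees is mean income and total sales per ZIP.
zip_mean_income = [sum(inc for inc, _ in hh) / len(hh) for hh in households.values()]
zip_sales = [sum(buy for _, buy in hh) for hh in households.values()]
group_r = pearson(zip_mean_income, zip_sales)

# Individual-level view: what the marketer would see with household data.
incomes = [inc for hh in households.values() for inc, _ in hh]
bought = [buy for hh in households.values() for _, buy in hh]
individual_r = pearson(incomes, bought)

print(f"ZIP-level correlation of income and sales:    {group_r:.2f}")
print(f"Household-level correlation of income and buy: {individual_r:.2f}")
```

In this contrived data the richer ZIP codes do sell more, yet within each ZIP it is the less affluent households that buy, so a campaign aimed at high earners would miss its best prospects.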
Stock market investors may well be guilty of similar mistaken thinking in their buy/sell behavior, with time as the key grouping variable. With information on the performance of portfolios over five- and 10-year periods, investors sometimes mistakenly conclude that all sub-periods behave similarly and project this into the future, when in fact the performance can look quite different. The time periods surrounding the bursting of the Internet bubble in March 2000 offer a provocative illustration. Consider first the graph in Figure 1, produced with the R lattice package. The panels in the graph depict the growth of $1 invested from July 22, 1993 through December 12, 2006 in portfolios that differ by company size and value. The panels are ordered left to right by company size; within each panel are color-coded "growth," "neutral" and "value" portfolio performance sketches. Note the bubble in the center of each, especially the larger growth portfolios. Note also how growth companies dominated performance in the first half of each panel, then gave way to value in the second half. Similarly, it is apparent that larger companies had more success early in the time period, while smaller companies had better returns later on. With this sequence of graphs, however, the magnitude of difference in performance over time is not readily apparent.
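For readers who want to see the arithmetic behind a "growth of $1" curve, it is the running product of one plus each period's return. The sketch below is in Python rather than the R used for the Forum's graphs. Its return series are invented for illustration and are not the portfolio data behind Figure 1; the function names are likewise hypothetical. It shows how a single full-period figure can hide sub-periods that behave very differently.

```python
from math import prod

def growth_of_dollar(returns):
    """Running value of $1 compounded through a sequence of period returns."""
    value = 1.0
    path = []
    for r in returns:
        value *= 1.0 + r
        path.append(value)
    return path

def geometric_mean_return(returns):
    """Constant per-period return that compounds to the same ending value."""
    total = prod(1.0 + r for r in returns)
    return total ** (1.0 / len(returns)) - 1.0

# Hypothetical annual returns (NOT actual market data): the "growth"
# portfolio booms and then busts, while "value" plods along steadily.
portfolios = {
    "growth": [0.30, 0.35, 0.40, -0.25, -0.20, 0.05],
    "value": [0.10, 0.05, 0.08, 0.15, 0.12, 0.18],
}

for name, returns in portfolios.items():
    half = len(returns) // 2
    ending_value = growth_of_dollar(returns)[-1]
    print(
        f"{name:>6}: $1 grows to ${ending_value:.2f}; "
        f"full period {geometric_mean_return(returns):+.1%}, "
        f"first half {geometric_mean_return(returns[:half]):+.1%}, "
        f"second half {geometric_mean_return(returns[half:]):+.1%}"
    )
```

An investor who saw only the full-period number for each portfolio would have no hint that the ranking of the two flips between the first and second halves, which is the time-as-group trap described above.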