Performance Measurement + Program Evaluation = Performance Evaluation

This month's Open Forum is yet another column on performance management (PM). PM is certainly ubiquitous these days, what with corporate PM, business PM, process PM, operational PM, organizational PM, customer PM, HR PM, call center PM and other varieties that I probably haven't seen. I think we need a "scorecard" just to track the different dialects. I feel a bit deprived by not having participated to date in the PM boom, so let me add to the excitement - tongue firmly planted in cheek - by tossing another moniker, "performance evaluation," into the mix.

I've been interested in PM, especially the work of the balanced scorecard authors, for quite some time and was intrigued by Gary Cokins' excellent DM Review column from April 6: "Performance Management: Making it Work." In that column, Cokins investigates the relationship between performance management and business intelligence (BI), attempting to discern the boundaries of each. He concludes, correctly I believe, that PM leverages the information transformed by BI - that PM deploys the power of BI technologies. Cokins notes that "Together, BI and PM form the bridge that connects data to decisions." These sound thoughts piqued my interest in understanding the transitions from BI to PM (and vice versa) even more thoroughly.

In a sense, strategy is like scientific theory, ultimately specifying hypotheses about the causes and effects of behavior, generally cast as "if a then b," or "the higher a, the better b." The collective cause-and-effect linkages that make up an enterprise strategy are maintained in the repository of a strategy map and are submitted to testing and confirmation. An illustration of a multipart strategy from the balanced scorecard, cast as a series of linked hypotheses, is as follows: greater investment in employee training leads to improved service quality, which increases customer satisfaction, which in turn fosters greater customer loyalty that ultimately strengthens revenues and margins. In this example, service quality and customer satisfaction are lead or driving indicators, while revenues and margins are outcome or lag measures. The strategy chain links behaviors to changes in lead indicators, which in turn "cause" movement in outcomes. The complete relationship map is clearly articulated and subject to confirmation. An important charter of intelligence, which can be called confirmatory BI, is to support this testing and interpretation of proposed strategic linkages.
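
To make the idea of confirmatory BI concrete, here is a minimal sketch - under assumed quarterly data - of testing one such linked hypothesis: that this quarter's customer satisfaction score (a lead indicator) predicts next quarter's revenue (a lag outcome). The figures are synthetic placeholders; a real strategy map would of course be tested against the organization's own measures.

    # A minimal sketch of confirmatory BI: testing the hypothesized linkage
    # "the higher customer satisfaction, the better next-quarter revenue."
    # All figures below are synthetic placeholders, not real data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    satisfaction = rng.normal(70, 5, size=25)          # quarterly satisfaction index (lead indicator)
    noise = rng.normal(0, 4, size=24)
    revenue_next = 1.5 * satisfaction[:-1] + noise     # next-quarter revenue index (lag outcome)

    # Regress the lagged outcome on the lead indicator; the slope and p-value
    # indicate whether the proposed cause-and-effect link is supported.
    slope, intercept, r, p, se = stats.linregress(satisfaction[:-1], revenue_next)
    print(f"slope={slope:.2f}, r={r:.2f}, p-value={p:.4g}")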

It is often the case that an entirely different perspective on a problem provides fresh insight that helps crystallize thinking - and sometimes that new perspective arrives quite coincidentally. In early April, I received a brochure in the mail promoting a book titled Practical Portfolio Performance Measurement and Attribution. An amateur investment scientist, I purchased and read the book, much to my delight. A few weeks later, my wife asked me to dispatch some old books and notes that had been cluttering the attic. One of the books targeted was Thinking About Program Evaluation, by one of my college professors. After browsing and deciding the book was still relevant after all these years, I dedicated an afternoon to rereading it and was glad I did. Almost by happenstance, I had come across two sources relevant to clarifying my thinking about performance management and BI.

In the investment world, performance measurement is the assessment of portfolio performance, answering "what" the return on assets was, "why" the portfolio performed as it did and "how" performance can be improved. Specific steps in performance measurement include the precise calculation of portfolio returns, the comparison of those returns against a relevant benchmark, the attribution of the sources of return, risk/reward analysis and feedback of results. This methodology is well articulated and generally accepted within the investment community.
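
As a rough illustration of the first few steps, here is a minimal sketch - on made-up monthly returns - that links period returns geometrically, compares the portfolio to its benchmark and computes a simple risk/reward figure (an information ratio). Attribution of the sources of return would require holdings-level data not shown here.

    # A minimal sketch of the early performance-measurement steps: return
    # calculation, benchmark comparison and a simple risk/reward measure.
    # The monthly returns below are illustrative placeholders.
    import numpy as np

    portfolio = np.array([0.021, -0.004, 0.013, 0.008, 0.015])   # monthly portfolio returns
    benchmark = np.array([0.018, -0.006, 0.010, 0.009, 0.011])   # monthly benchmark returns

    cum_portfolio = np.prod(1 + portfolio) - 1     # geometric linking of period returns
    cum_benchmark = np.prod(1 + benchmark) - 1
    active = portfolio - benchmark                 # period-by-period excess return
    tracking_error = active.std(ddof=1)            # variability of the excess return
    info_ratio = active.mean() / tracking_error    # reward per unit of active risk

    print(f"portfolio {cum_portfolio:.2%} vs. benchmark {cum_benchmark:.2%}, "
          f"information ratio {info_ratio:.2f}")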

Program evaluation (PE) is a discipline developed in the social and quantitative sciences - economics, psychology, education, sociology, political science, geography, statistics, etc. - to provide rigorous assessment of the impact of planned social, economic and behavioral interventions. Highlighting an increased focus on program evaluation, an interesting article in the June 19, 2006, Forbes magazine discusses a new breed of economists from Harvard, MIT, Columbia, Berkeley, et al. whose research rigorously tests the results of anti-poverty and other economic programs, generally by randomized field experiments. Indeed, experimental public interventions are pervasive today. Income maintenance programs, charter schools, literacy programs, alcohol abuse treatment, new tax codes, gun control initiatives, safe sex programs, crime deterrence programs and the like are examples of interventions that are evaluated to determine whether the proposed benefits actually exist - and whether those benefits are worth the cost of the programs. A major focus of program evaluation is to test hypotheses of the form "if a then b," or "the higher a, the better b," relating programs to desired outcomes. Accordingly, PE obsesses over intervention designs that allow the evaluator to infer that "a caused b" - for example, that a specific literacy program caused an increase in reading and language skills that made participants more attractive to employers.
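
To ground the "if a then b" framing, here is a minimal sketch of the basic comparison a program evaluator might run on a randomized experiment: outcome scores for a treatment group that received a hypothetical literacy program versus a control group, tested with a two-sample t-test. The data are synthetic and purely for illustration.

    # A minimal sketch of a randomized program evaluation: treatment versus
    # control, with a Welch two-sample t-test on the outcome measure.
    # The scores are synthetic placeholders for, say, reading-skill gains.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    control = rng.normal(50, 10, size=200)      # outcome without the program
    treatment = rng.normal(54, 10, size=200)    # outcome with the program (hypothesized lift)

    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
    effect = treatment.mean() - control.mean()
    print(f"estimated effect = {effect:.1f}, t = {t_stat:.2f}, p-value = {p_value:.4g}")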

What do strategy as theory, portfolio performance measurement and program evaluation have in common? Simply put, each supports the rigorous evaluation of performance, looking to link cause and effect, with very similar focuses. A first step for each is to quantify what is to be measured, as derived from theory or strategy, with precise criteria and scaling. Weight change at six and 24 months is an obvious example for a weight-loss program, as is crime rate change after the introduction of a community program, the test performance of a charter school or an increase in customer loyalty scores after an employee training program.

Next is a design that can support the validity of an assertion that "a caused b" - that the intervention was responsible for the desired outcome. Design almost always mandates comparisons: treatment group versus control group, treatment group before and after an intervention, a group selected on a criterion versus the nonselected group, etc. A benchmark portfolio, for example, acts as a control group, allowing comparison between an equities manager and an index. It may be the case that a particular manager has performed well in absolute terms but has underperformed a passive benchmark index, the proverbial monkey throwing darts. That manager should certainly not be heralded as the next Warren Buffett. An interrupted time series design allows comparison of "a" and "b" in sequence, with time providing its own control; measuring customer churn over time before and after a loyalty program is an example of such a design. Marketing campaigns are often the gold standard of design, enabling rigorous statistical inference through randomized assignment to panels. Randomized email promotions and sales offers can be an inexpensive yet very effective way of discovering the market.

Analysis then accumulates measures, consolidates findings and interprets and presents the results of quantification and design in light of strategy and other important considerations, providing context and explanation for what was found. Finally, feedback closes the loop back to strategy, portfolio managers or programs, linking intelligence to the management of performance and providing a basis for control and change.
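
As a small illustration of the interrupted time series idea, here is a sketch - on made-up monthly churn rates - of fitting a simple level-shift model: churn as a linear trend in time plus a step term that switches on when a hypothetical loyalty program launches. Under that design's assumptions, the step coefficient estimates the change attributable to the intervention.

    # A minimal sketch of an interrupted time series design: monthly churn with
    # a time trend plus a level-shift term at the loyalty program launch.
    # All churn figures are synthetic placeholders.
    import numpy as np

    churn = np.array([0.052, 0.051, 0.053, 0.050, 0.052, 0.051,   # six months pre-launch
                      0.046, 0.045, 0.044, 0.045, 0.043, 0.044])  # six months post-launch
    months = np.arange(len(churn), dtype=float)
    post = (months >= 6).astype(float)            # 1 after the program launches

    # Fit churn = a + b*month + c*post by least squares; c is the estimated level shift.
    X = np.column_stack([np.ones_like(months), months, post])
    coef, residuals, rank, sv = np.linalg.lstsq(X, churn, rcond=None)
    print(f"estimated level shift at launch: {coef[2]:+.4f}")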

At the risk of further cluttering an already crowded space, I propose performance measurement + program evaluation = performance evaluation as the discipline that links programs or strategy to intelligence by quantifying, designing and analyzing cause-and-effect linkages and providing a foundation of feedback to manage performance. I can then join the legions of contributors to the august body of PM knowledge.