I ran across an interesting paper on performance measurement the other day. The article touched on many methodology and design themes I've been writing about in this blog over the last couple of years.

Assessing organizational performance is a critical function of BI. This article, though, has to do with performance measurement (PM) as it is more narrowly defined in the investment science world: how to measure and attribute the performance of stock/bond investment portfolios. What portion of a given portfolio return is simply due to the direction of the market, which is available to anyone who's participating (beta), and what part is due to the special skills of portfolio managers in generating additional returns for their investors (alpha)? Demonstrating a positive alpha is important for financial services firms that must justify the fees they charge for services.

To establish a measure of portfolio manager skill – alpha – PM conducts observational “experiments” that rely on comparing the performance of target portfolios against “control group” benchmarks. Early on, that benchmark was an established index like the S&P 500. Though the S&P or similar measures might seem a reasonable basis of performance comparison, they generally come up short in assuring the “other things are equal” condition needed for a valid quantification of alpha. What if the benchmark index differs from the target portfolio in systematic ways unrelated to the skill of the portfolio manager?

“The problem with this is the makeup of the S&P 500, which is often different from the universe of investments the manager actually considered.” For example, the S&P is a large company index, while the target portfolio might be composed primarily of mid-cap stocks.

Over time, improvements were made to benchmarking techniques by selecting a random sample of securities to comprise the comparison portfolios. Yet further advances came by crudely matching securities in the benchmark with those in the target portfolio on pertinent characteristics such as company size, book-to-market value and geography. With this simple matching, “the final benchmark is a weighted combination of these portfolios matching exposures to the original portfolio.”

But matching on just three factors, each split into five levels, would require partitioning the security universe into 5**3 = 125 buckets, and every added factor multiplies that count by five. And while both of these approaches were certainly steps in the right direction, they still left considerable bias in the makeup of the generated benchmark portfolios. The critical control group was still lacking.
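
To make the arithmetic concrete, here is a small Python sketch of how quickly the bucket count grows. The function name and the universe of 3,000 securities are my own illustrative assumptions, not figures from the article.

```python
def bucket_count(levels: int, factors: int) -> int:
    """Number of cells needed to match exactly on `factors` characteristics,
    each cut into `levels` groups."""
    return levels ** factors


# Hypothetical universe size, chosen only for illustration.
UNIVERSE_SIZE = 3000

for factors in (3, 5, 10):
    buckets = bucket_count(5, factors)
    per_bucket = UNIVERSE_SIZE / buckets
    print(f"{factors} factors -> {buckets} buckets, "
          f"about {per_bucket:.4f} securities per bucket")
```

With three factors there are still a couple of dozen securities per bucket on average under this assumption; by ten factors nearly every bucket is empty, which is the practical obstacle to factor-by-factor matching.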

The article outlines a sophisticated methodology to redress these shortcomings in determining alpha. Lamenting that they cannot conduct a randomized experiment to assure that factors in their benchmarks are equal on average to those in the target portfolio, the authors do what they consider the next best thing – matching securities included in the benchmark to those in the target based on propensity scores calculated by models built on important predictive characteristics. The propensity scores could in theory summarize ten or more of these factors simultaneously, an improvement over the limited factor-by-factor matching.
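
A propensity score here is the modeled probability that a security is held in the target portfolio, given its characteristics. The sketch below is one minimal way to compute such scores, assuming a logistic regression fitted by plain gradient descent on synthetic data. The article does not say which model the authors used, so the model choice, the two characteristics (`size`, `btm`), and all of the data are my assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights (intercept first) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, x) -> float:
    """Modeled probability that a security with characteristics x is held."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Synthetic universe (an assumption): standardized size and book-to-market,
# with a made-up manager who tilts toward larger, lower book-to-market names.
rng = random.Random(0)
X, y = [], []
for _ in range(300):
    size, btm = rng.gauss(0, 1), rng.gauss(0, 1)
    p_held = sigmoid(-1.0 + 1.5 * size - 0.5 * btm)
    X.append([size, btm])
    y.append(1 if rng.random() < p_held else 0)

w = fit_logistic(X, y)
scores = [propensity(w, x) for x in X]
print("fitted weights (intercept, size, btm):", [round(v, 2) for v in w])
```

However many characteristics go into the model, each security comes out with a single number between 0 and 1, and that one number is what gets matched on.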

“Our method ... is more precise because it tackles the curse of dimensionality, allowing arbitrary numbers of characteristics ... The propensity score has an important balancing property: stocks with the same propensity score have the same characteristics on average.” The “other things equal” assured on average by randomization could thus in large part be approximated by matching on propensity scores.

So the authors articulate a methodology for divining a little-biased benchmark/control group to assess the performance of the target portfolios. How then do they estimate the all-important alpha from target and benchmark data? They build a benchmark portfolio that matches the target holding by holding, pairing each holding with a non-held security whose propensity score is close to its own. Where there are multiple match candidates, they choose one at random, and they continue until all target holdings are matched. They then repeat the whole exercise to generate many benchmark portfolios. Let the computer do the work.
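
The matching loop described above can be sketched in a few lines of Python. This is my reading of the procedure, not the authors' code: the caliper (how close counts as "close"), the fallback to the nearest score when nothing falls inside the caliper, matching without replacement within a benchmark, and the tickers and scores are all assumptions made for illustration.

```python
import random


def build_benchmark(target, candidates, caliper, rng):
    """Match each target holding to one non-held security by propensity score.

    target, candidates: dicts mapping ticker -> propensity score.
    Returns a dict mapping target ticker -> matched candidate ticker.
    """
    available = dict(candidates)  # copy, so each candidate is used at most once
    matches = {}
    for holding, score in target.items():
        if not available:
            raise ValueError("ran out of candidate securities")
        close = [t for t, s in available.items() if abs(s - score) <= caliper]
        if close:
            pick = rng.choice(close)  # several candidates: choose one at random
        else:
            # Assumed fallback: take the nearest score outside the caliper.
            pick = min(available, key=lambda t: abs(available[t] - score))
        matches[holding] = pick
        del available[pick]
    return matches


# Hypothetical tickers and scores.
target = {"T1": 0.80, "T2": 0.55, "T3": 0.30}
candidates = {"C1": 0.82, "C2": 0.79, "C3": 0.56,
              "C4": 0.53, "C5": 0.31, "C6": 0.10}

rng = random.Random(42)
benchmarks = [build_benchmark(target, candidates, 0.05, rng) for _ in range(1000)]
print(benchmarks[0])
```

Because ties are broken at random, rerunning the loop yields a different but equally well-matched benchmark each time, and that collection of benchmarks is what the target is ultimately compared against.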

The findings reported augur well for the proposed propensity score matching method. The authors note that bias is lower with this approach than for the other methods discussed, so the characteristics of the benchmark portfolios closely match the target. The target portfolio's return of 4.5 percent over the holding period exceeds 85 percent of the generated benchmarks, suggestive but not conclusive of a positive alpha. The results in this case fall in a gray area. The manager might be worth his outsized compensation.
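
To see why a result like this reads as suggestive rather than conclusive, it helps to treat the share of benchmarks beaten as an empirical percentile. The sketch below uses synthetic benchmark returns, not the article's data; the mean and spread were picked by me only so the illustration lands in a similar range.

```python
import random


def share_beaten(target_return, benchmark_returns):
    """Fraction of benchmark portfolios whose return the target exceeds."""
    beaten = sum(1 for r in benchmark_returns if target_return > r)
    return beaten / len(benchmark_returns)


# Synthetic returns for 1,000 generated benchmarks (an assumption).
rng = random.Random(7)
benchmark_returns = [rng.gauss(0.03, 0.015) for _ in range(1000)]

share = share_beaten(0.045, benchmark_returns)
print(f"target beats {share:.0%} of benchmarks; "
      f"share of benchmarks doing at least as well: {1 - share:.0%}")
```

A target that beats 85 percent of its matched benchmarks is still equaled or bettered by 15 percent of them, well short of the 5 percent threshold conventionally used to call a result statistically significant.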

I took away some important lessons from this article. There's certainly much for BI to emulate from the performance measurement discipline of investment science. Distinguishing alpha from beta should be central to BI thinking. And the methods to help BI get there must obsess over control group/benchmark construction that introduces the least possible bias into the estimates of alpha and beta.

Advanced statistical methods such as matching with scores generated from propensity models can help overcome the inherent weaknesses of non-randomized experiments. Finally, the use of computer simulation methods to generate a distribution of alpha estimates can add a lot to the understanding of performance for very little cost.