Counterfactuals, Difference-in-Differences, Propensities and Treatment Evaluation


I received an email about my recent “Econometrics vs. Statistics” blog from an old friend early last week. We've always had a feisty relationship, so I was a bit surprised when he noted basic agreement with what I'd written. He also reminded me of a book we both purchased and discussed a few years back, “Microeconometrics: Methods and Applications” (MMA), by Cameron and Trivedi, that's become the go-to reference for many techniques of the “new” econometrics.

The MMA authors seem to agree with much of the criticism of traditional macroeconometrics, citing the strong aggregation assumptions that often underpin the methods. Microeconometrics, in contrast, is “quantitative analysis founded on microdata (that) may be regarded as more realistic than that based on aggregated data.” In other words, microeconometrics is more practical than macroeconometrics. With my BI hat on, I think I agree.

After we traded emails, my friend and I conversed about work. His current job is as a statistician in the evaluation office of a federal demonstration program that offers unemployed and underemployed workers job training. The program's hope is that the skills gained will prepare participants for more meaningful jobs that can appreciably raise their earnings. As a program evaluation (PE) analyst, my friend uses data and statistical methods to determine whether the program has achieved that goal.

In his thinking, the job training program is an intervention with performance that can be measured. Those who participate in the training comprise the experimental group, while a sample of non-participants constitutes the control. The methodology he deploys is to compare the difference in earnings of training participants before and after the program to similar calculations for the controls. My friend's specific task is to incorporate the most suitable design and statistical techniques for assessing performance, and subsequently to disseminate results of investigations to program stakeholders.

One challenge he faces is that candidates are either assigned to the training program or not, but never assigned to both. For those in the training program, my friend can get a measure of earnings performance, called the factual, but cannot observe what would have happened had they instead not been chosen for the program. In highfalutin microeconometrics jargon, the “missing” performance of the training participants in the no-training control condition is known as the counterfactual. My friend wishes to contrast factual with counterfactual outcomes, when only one of the two is observed and the other is missing.
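
For readers who prefer symbols to jargon, here is a minimal sketch of the factual/counterfactual idea. The notation is my own shorthand, not something from my friend's program: D_i marks whether person i received training, and Y_i(1) and Y_i(0) are that person's earnings with and without training.

```latex
% Assumed notation (not from the program itself):
%   D_i    = 1 if person i received training, 0 otherwise
%   Y_i(1) = earnings of person i with training
%   Y_i(0) = earnings of person i without training
% Only one potential outcome is ever observed for each person:
Y_i = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)
% A natural target is the average effect of training on those trained:
\text{ATT} = E\left[ Y_i(1) - Y_i(0) \mid D_i = 1 \right]
```

For a trainee, Y_i(1) is the factual and Y_i(0) is the counterfactual, which is exactly the piece that never shows up in the data.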

A way out of the conundrum is to assign candidates randomly to training. With random assignment, even though an individual would be observed in only one of training or control, my friend could be reasonably assured that the groups were statistically equal on outside factors out of the gate. Alas, random assignment was impractical for his program, and there's some reason to suspect that the treatment and control groups are “a priori” different on other factors. My friend must anticipate these factors and make statistical adjustments so that valid comparisons of the effects of training versus no training can be made.

Without the ability to make “other things equal” by randomizing to treatment and control, he looks for a set of predictor variables X that can equalize the groups statistically. In microeconometrics speak, his assumption is that “participation in the treatment program does not depend on outcomes, after controlling for the variation in outcomes induced by differences in X”. In other words, adjusting statistically for the X variables can make the treatment and control groups “equal” on outside factors, much as randomization would.
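
That quoted assumption is usually called conditional independence. A rough rendering in symbols, using my own assumed notation (D for the training indicator, Y(1) and Y(0) for earnings with and without training, X for the predictors), might look like this:

```latex
% Assumed notation: D = training indicator, X = predictor variables,
% Y(1), Y(0) = earnings with and without training.
% Conditional independence: given X, participation is unrelated to the outcomes.
\left( Y(0), Y(1) \right) \perp D \mid X
% One consequence: among people with the same X, the controls' earnings
% stand in for the trainees' missing no-training earnings.
E\left[ Y(0) \mid D = 1, X \right] = E\left[ Y(0) \mid D = 0, X \right]
```

Under random assignment the same statement would hold without conditioning on X at all, which is why randomization is the cleaner route when it's available.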

Suppose my friend knows the values of the confounding X's for the training group and for a large sample of controls. In lieu of randomization, he can try to find an exact match of the X's for each training case with a corresponding control. If exact matching of all X's is impractical, an alternative is matching using “propensity scores” which consolidate the X's to a single measure that predicts the probability of receiving training. Once these scores are calculated, the matching process can be significantly simplified: choose one or more controls for each training participant such that the propensity scores are nearly the same.
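
To make the matching step concrete, here is a small Python sketch on made-up data. Everything in it is an assumption on my part rather than my friend's actual procedure: the two standardized predictors, the logistic model used to estimate the scores, the nearest-neighbor rule with replacement, and the 0.05 caliper. All function and variable names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def propensity(w, x):
    """Predicted probability of training for predictors x; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, d, lr=0.5, epochs=300):
    """Fit a logistic regression by plain batch gradient descent (illustrative only)."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, di in zip(X, d):
            err = propensity(w, x) - di
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def match_nearest(treated_scores, control_scores, caliper=0.05):
    """Pair each trainee with the control whose score is closest (with replacement).
    Trainees with no control inside the caliper are left unmatched."""
    matches = {}
    if not control_scores:
        return matches
    for i, ps in treated_scores.items():
        j = min(control_scores, key=lambda c: abs(ps - control_scores[c]))
        if abs(ps - control_scores[j]) <= caliper:
            matches[i] = j
    return matches

# Synthetic data: two standardized predictors (hypothetical, e.g. age and prior earnings).
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
# Non-random assignment: the chance of training depends on the predictors.
d = [1 if random.random() < sigmoid(0.8 * x[0] - 0.8 * x[1]) else 0 for x in X]

w = fit_logistic(X, d)
scores = [propensity(w, x) for x in X]
treated = {i: s for i, s in enumerate(scores) if d[i] == 1}
controls = {i: s for i, s in enumerate(scores) if d[i] == 0}
matches = match_nearest(treated, controls, caliper=0.05)
print(len(treated), "trainees,", len(matches), "matched within the caliper")
```

In real work one would more likely lean on an established statistics package for the score model; the hand-rolled fit here is only to keep the sketch self-contained. Matching with replacement means one control can serve several trainees, which is a design choice, not a requirement.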

After the treatment and control samples are equated on propensity scores, my friend computes the mean change for the treatment group by subtracting pre-program income from post-program income. The computation is the same for the mean control change. The difference-in-differences calculation is then just the former minus the latter: the control difference subtracted from the treatment difference. Finally, ordinary least squares regression models are used to see if the “purified” difference-in-differences estimate is statistically significant.
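
The arithmetic is simple enough to sketch in a few lines of Python. The earnings figures below are invented for illustration, and the function names are hypothetical; each person is a (pre-program, post-program) income pair.

```python
from statistics import mean

def mean_change(pairs):
    """Average of (post - pre) over a list of (pre, post) income pairs."""
    return mean(post - pre for pre, post in pairs)

def diff_in_diff(treated_pairs, control_pairs):
    """Treatment group's mean change minus the control group's mean change."""
    return mean_change(treated_pairs) - mean_change(control_pairs)

# Invented incomes for three trainees and their three matched controls.
trainees = [(18000, 24000), (20000, 25000), (15000, 22000)]
matched_controls = [(18500, 20500), (19500, 21000), (15500, 18000)]

effect = diff_in_diff(trainees, matched_controls)
print(effect)  # 4000
```

Here the trainees gained 6,000 on average and the matched controls 2,000, so the difference-in-differences estimate is 4,000. In a regression framing, the same quantity is commonly estimated as the coefficient on a treatment-by-period interaction term, which is where the significance test comes from.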

According to my friend, treatment evaluation techniques like those outlined above are increasingly important both for the conduct of social experiments and for evidence-based business. For those interested in further information on the topic, there's plenty of material, including a dedicated chapter, in MMA. Pretty good stuff once you navigate the semantic puffery. I only wish treatment evaluation had been deemed similarly important when I took grad courses in econometrics years ago!
