"Lift" is probably the most commonly used metric to measure the performance of targeting models in marketing applications. This article is a short lesson on what lift is, why it is important and some pitfalls to avoid.
The purpose of a simple targeting model is to identify a subgroup (target) from a larger population. The target members selected are those likely to respond positively to a marketing offer. A model is doing a good job if the response within the target is much better than average for the population as a whole. Lift is simply the ratio of these values: target response divided by average response.
Lift is usually quantified by dividing the population into deciles: ten equal-sized groups into which population members are placed according to their predicted probability of response. The members with the highest predicted response go into decile 1, the next highest into decile 2, and so on. Figure 1 shows a typical model applied to a population with an average response rate of 5 percent.
Figure 1: Response Rate and Lift Calculations
One thousand total offers were made, so each decile contains one hundred members. In the top decile, there were sixteen responders for a response rate of 16 percent. Compared to the average response rate of 5 percent, this gives a lift of 3.20 for decile 1. Each successive decile has a lower response rate. The deciles start performing worse than average after decile 4.
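The decile calculation can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation. The function name decile_lift is hypothetical, and since Figure 1 is not reproduced here, only the 16 responders in decile 1 and the 5 percent average come from the text; the other nine decile counts are assumed values chosen to be consistent with the description.

```python
def decile_lift(scores, responses, n_groups=10):
    """Return (group, response_rate, lift) tuples, highest scores first."""
    if len(scores) != len(responses):
        raise ValueError("scores and responses must have the same length")
    # Rank members from highest to lowest predicted probability of response.
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    average_rate = sum(responses) / n
    rows = []
    for g in range(n_groups):
        # Integer arithmetic keeps the group sizes as even as possible.
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        rate = sum(resp for _, resp in group) / len(group)
        rows.append((g + 1, rate, rate / average_rate))
    return rows


# ASSUMPTION: responders per decile. Only the first value (16) is from the
# article; the rest are illustrative and sum to 50 (5 percent of 1,000).
responders_per_decile = [16, 12, 8, 5, 3, 2, 2, 1, 1, 0]

# Build a toy population of 1,000 members, 100 per decile.
scores, responses = [], []
for d, count in enumerate(responders_per_decile):
    for i in range(100):
        scores.append((10 - d) / 10)  # one shared score per decile, for simplicity
        responses.append(1 if i < count else 0)

rows = decile_lift(scores, responses)
for group, rate, lift in rows:
    print(f"decile {group:2d}: response {rate:6.1%}  lift {lift:.2f}")
```

For decile 1 this gives a 16 percent response rate and a lift of 3.20, matching the calculation above.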
For each person targeted there is a cost of making the offer, and a corresponding profit if a positive response is obtained. The marketer can calculate the profit of targeting each decile and simply include each decile down to the last decile that is profitable. The cumulative response rate and lift will then show the average performance of the model for everybody in the target.
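As a rough sketch of this selection rule, the Python below computes the profit of each decile and keeps every decile down to the last profitable one. The article gives no cost or profit figures, so the $1 cost per offer and $15 profit per response are assumptions, as are the function name select_target and all decile counts other than the 16 responders in decile 1.

```python
def select_target(responders, group_size, cost_per_offer, profit_per_response):
    """Return per-decile profits and the number of deciles to target."""
    profits = [r * profit_per_response - group_size * cost_per_offer
               for r in responders]
    # Include every decile down to the last one that is profitable.
    profitable = [i for i, p in enumerate(profits) if p > 0]
    n_target = profitable[-1] + 1 if profitable else 0
    return profits, n_target


# ASSUMPTION: only the first count (16) comes from the article.
responders_per_decile = [16, 12, 8, 5, 3, 2, 2, 1, 1, 0]
group_size = 100

# ASSUMPTION: hypothetical economics, $1 per offer and $15 per response.
profits, n_target = select_target(responders_per_decile, group_size, 1, 15)

average_rate = sum(responders_per_decile) / (group_size * len(responders_per_decile))
if n_target:
    cumulative_rate = sum(responders_per_decile[:n_target]) / (n_target * group_size)
else:
    cumulative_rate = 0.0
cumulative_lift = cumulative_rate / average_rate

print("profit by decile:", profits)
print(f"target deciles 1-{n_target}: "
      f"cumulative response {cumulative_rate:.1%}, lift {cumulative_lift:.2f}")
```

Under these assumed numbers a decile breaks even at a response rate of 1/15, or about 6.7 percent, so the target stops after decile 3, with a cumulative response rate of 12 percent and a cumulative lift of 2.40.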
This information is often presented graphically:
Figure 2: Lift Chart
Another useful chart compares the cumulative percent of responses captured as each decile is added to the target. In the current example, the top two deciles capture about 55 percent of the responders. A random selection of two deciles (20 percent of the population) would capture only 20 percent of the responders. This result is not quite the "80/20" rule, but it is much better than not targeting at all. The greater the area between the two lines, the more the model was able to concentrate responders in the top deciles.
Figure 3: Cumulative Percent of Responses Captured
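The numbers behind such a chart can be sketched as follows. This is an illustration under assumptions: the function name cumulative_capture is hypothetical, and the decile counts other than the 16 responders in decile 1 are invented to be consistent with the text. With these counts the top two deciles capture 56 percent of responders, close to the "about 55 percent" quoted above.

```python
def cumulative_capture(responders_per_decile):
    """Return the cumulative share of all responders captured through each decile."""
    total = sum(responders_per_decile)
    captured, running = [], 0
    for count in responders_per_decile:
        running += count
        captured.append(running / total)
    return captured


# ASSUMPTION: only the first count (16) comes from the article.
responders_per_decile = [16, 12, 8, 5, 3, 2, 2, 1, 1, 0]

captured = cumulative_capture(responders_per_decile)
# Random baseline: targeting k of 10 deciles captures k/10 of the responders.
baseline = [(i + 1) / len(captured) for i in range(len(captured))]

for decile, (model, rand) in enumerate(zip(captured, baseline), start=1):
    print(f"top {decile:2d} deciles: model {model:6.1%}  random {rand:6.1%}")
```

The gap between the two columns at each decile is the vertical distance between the two lines in the chart.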
It’s hard to say what level of lift represents "good" model performance, because the potential for predictive targeting varies widely between applications. A model predicting who will buy an American-made car could be highly predictive (potentially a lift of 4+ in the top two deciles), because purchase patterns tend to be stable over time. A model predicting who will buy a white car may have a top lift of only around 1.5 if color choice is more variable and random. Both may be "good" models in the sense that they do as good a job of predicting as is possible. However, the model with the higher lift will probably be much more useful in marketing applications.
Rather than using lift to evaluate a model in isolation, it is often more useful to use lift to evaluate the relative performance of alternative models. If a tree model provides higher lift than a neural net on the same data, this provides a key factor in choosing between the models.
I will mention two things to watch out for when reviewing lift metrics. First, model lift should be calculated on a holdout sample that was not used to estimate the model. "Overfitting" can cause a model to predict well on the data set on which it was estimated without that performance carrying over to new data sets, and new data is where the model will actually be applied.
Second, when calculating lift, always ask the question: Lift versus what? Model performance should be compared against the marketing strategy that would occur if the model were not available. The model should not be compared against an unreasonably pessimistic alternative. Consider the following results of a direct mail program to sell magazine subscriptions.
Figure 4: Magazine Campaign Performance
As would be expected, the current subscribers had a much higher response rate for magazine renewals than did prospects for new subscriptions. In the absence of modeling, current subscribers and prospects would be treated as two separate groups. Targeting models, if used, should be built separately for each group and evaluated only within that group.
Suppose, however, that the two groups were combined and treated as one overall group for modeling purposes. The simplest model would use only one predictive variable: whether the individual was a current subscriber or a prospect. Clearly, the model would put the 1,000 current subscribers in the top decile, and all the prospects in the bottom nine deciles. A naïve calculation of lift for the top decile would divide the current subscriber response rate of 50 percent by the total response rate of 9.5 percent for a lift of 5.26. But the model is simply taking credit for something that was obvious anyway. As usual, if it sounds too good to be true, it probably is.
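The arithmetic behind the naïve figure, and why it flatters the model, can be checked with a short Python sketch. The 1,000 subscribers, the 50 percent subscriber rate and the 9.5 percent total rate come from the text. The population of 10,000 and the 5 percent prospect rate are derived from those figures on the assumption that the subscribers fill exactly one decile, since Figure 4 is not reproduced here.

```python
# From the text: 1,000 current subscribers fill the top decile.
subscribers = 1_000
population = subscribers * 10            # derived: ten equal deciles
prospects = population - subscribers     # derived: 9,000

# Responder counts implied by the stated 50 percent and 9.5 percent rates.
subscriber_responders = 500
total_responders = 950
prospect_responders = total_responders - subscriber_responders

subscriber_rate = subscriber_responders / subscribers
prospect_rate = prospect_responders / prospects   # derived, not stated in the text
total_rate = total_responders / population

# Naive lift: top decile (all subscribers) versus the combined population.
naive_lift = subscriber_rate / total_rate

# Fairer baseline: subscribers compared with subscribers. A one-variable model
# cannot rank anyone within a group, so its within-group lift is 1.0.
within_group_lift = subscriber_rate / subscriber_rate

print(f"prospect response rate: {prospect_rate:.1%}")
print(f"naive top-decile lift:  {naive_lift:.2f}")
print(f"within-group lift:      {within_group_lift:.2f}")
```

Measured against the strategy that would be used without the model, which already treats subscribers and prospects separately, the one-variable model adds nothing.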