How to Interpret Modeling Results: Basics for the End User

  • June 01 2001, 1:00am EDT

The goal of response modeling is either to increase response to a given solicitation quantity or to reduce the solicitation quantity without suffering a drop in response. Put another way, you can drive up the numerator (response) or shrink the denominator (expense). In truth, after your analytical consultants have worked their magic and left the stage, you won't know for sure whether you've achieved either of these goals until you conduct "live" testing. Nevertheless, important signals about your model's predictive ability can be found in the documentation left behind by the model builder.

Reading the Model Documentation

The output in Figure 1 is for a logistic regression model (LRM). We use a SAS example only because SAS is a frequently used statistical software package and LRM is one of the most common response modeling techniques. Other packages and algorithms produce similar tables.

One of the first things to note is whether all of the model's explanatory variables are statistically significant, as revealed by small values (less than .05) in the column headed "Pr > Chi-Square." The two rightmost columns, Standardized Estimate and Odds Ratio, indicate the relative importance of each variable in contributing to the overall response prediction. The larger the standardized estimate in absolute value, the more important the variable is to the model. The odds ratio, which is the exponential of the logistic regression coefficient, tells you how much higher the odds of responding are for customers who have the variable characteristic than for customers who don't; the higher the number, the greater the likelihood of response. It is also important that the direction of each coefficient makes practical sense. In Figure 1, you can see that the odds of responding to the flyer mailing are more than seven times as high for customers who work in small offices or home offices (SOHOFLAG) as for customers who work in other settings. Similarly, the odds of responding diminish with increasing distance from the closest store (DISTANCE).
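To make the link between a coefficient and its odds ratio concrete, here is a minimal Python sketch. The coefficient values and the 2 percent baseline response rate are hypothetical, chosen only for illustration, and are not the figures from the SAS output. The sketch also shows that an odds ratio multiplies the odds of response, which is close to, but not the same as, multiplying the response rate.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor: exp(coefficient)."""
    return math.exp(coefficient)


def shifted_probability(baseline_probability, ratio):
    """Response probability after multiplying the baseline odds by `ratio`."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical coefficients for illustration only (not taken from Figure 1).
HYPOTHETICAL_COEFFICIENTS = {"SOHOFLAG": 2.0, "DISTANCE": -0.05}

for name, beta in HYPOTHETICAL_COEFFICIENTS.items():
    ratio = odds_ratio(beta)
    direction = "raises" if ratio > 1.0 else "lowers"
    print(f"{name}: odds ratio {ratio:.2f} ({direction} the odds of response)")

# Assumed 2 percent baseline response rate, also hypothetical.
baseline = 0.02
with_flag = shifted_probability(baseline, odds_ratio(2.0))
print(f"Response rate moves from {baseline:.1%} to {with_flag:.1%}")
```

A positive coefficient gives an odds ratio above 1 and a negative coefficient gives one below 1, which is the "direction" check described above.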

Figure 1: Output for a Logistic Regression Model

If your model was built with a stepwise procedure, each successive variable will account for a decreasing proportion of predictive power. In this case, it's often a good sign when several explanatory variables, not just one or two, contribute importantly to the overall model. You don't want an outcome where a small minority of customers has unusually high scores while the remainder score identically at the very bottom. This can sometimes happen when predictors are dichotomous (coded 1 or 0) and the model is completely dominated by only one variable. You can't get good discrimination when a large majority of customers is clumped together with the same score.

The final part of the table, labeled "Association of Predicted Probabilities and Observed Responses," presents four statistics indicating the model's overall predictive ability. These rank correlations are calculated by pairing each responder with each non-responder and checking whether the model gave the responder the higher predicted probability. The larger the number, the greater the predictive power of the model. In this example, the prediction agreed with actual response in over 89 percent of these comparisons.
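The paired-comparison logic behind these statistics can be sketched in a few lines of Python. This is a simplified illustration, not a reproduction of the SAS procedure (which, among other things, may group predicted probabilities into bins before comparing them). The function name and the sample scores are hypothetical.

```python
def association_stats(scores, responded):
    """Compare every responder with every non-responder by predicted score.

    scores    -- predicted response probabilities, one per customer
    responded -- 1 if the customer actually responded, else 0
    Assumes at least one responder, one non-responder, and one untied pair.
    """
    responders = [s for s, r in zip(scores, responded) if r == 1]
    nonresponders = [s for s, r in zip(scores, responded) if r == 0]

    concordant = discordant = tied = 0
    for r_score in responders:
        for n_score in nonresponders:
            if r_score > n_score:
                concordant += 1      # model ranked the responder higher
            elif r_score < n_score:
                discordant += 1      # model ranked the non-responder higher
            else:
                tied += 1

    pairs = concordant + discordant + tied
    return {
        "pairs": pairs,
        "percent_concordant": 100.0 * concordant / pairs,
        "percent_discordant": 100.0 * discordant / pairs,
        "percent_tied": 100.0 * tied / pairs,
        "somers_d": (concordant - discordant) / pairs,
        "gamma": (concordant - discordant) / (concordant + discordant),
        "c": (concordant + 0.5 * tied) / pairs,
    }


# Hypothetical scores and outcomes for six customers.
stats = association_stats(
    scores=[0.9, 0.8, 0.7, 0.4, 0.4, 0.2],
    responded=[1, 1, 0, 1, 0, 0],
)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A model that ranks customers no better than chance would score about 50 percent concordant, with a c statistic near 0.5 and Somers' D near zero.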

Understanding the Gains Chart

Modelers like to show the power of their models through gains charts. Because this table provides strong clues as to how the model will perform at varying file depths, you should insist that it be included in the deliverables package. Figure 2 shows that by using the model to select customers, you can deliver over 85 percent of total responders with only 50 percent of the total solicitations, which translates to a 71 percent response "lift" versus a random, unmodeled selection of customers. Another way to look at this is that in an unmodeled campaign, 85 percent of the solicitations would have brought in 85 percent of the responders. Using the model, you would be able to gather the same number of responders by soliciting only 50 percent, a reduction of 41 percent in expense.
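The arithmetic behind these figures is simple enough to check yourself. The Python sketch below, with hypothetical function names and sample data, builds a cumulative gains table from scored customers and computes the lift and expense reduction quoted above. With exactly 85 percent of responders at 50 percent depth the lift works out to 70 percent; the 71 percent in the text reflects a capture rate slightly over 85 percent.

```python
def gains_table(scores, responded, bins=10):
    """Cumulative gains by file depth.

    Returns (depth, share_of_responders_captured, lift_index) per bin, where
    a lift_index of 1.0 means no better than a random selection.
    Assumes at least one responder in the file.
    """
    ranked = sorted(zip(scores, responded), key=lambda p: p[0], reverse=True)
    total = len(ranked)
    total_responders = sum(r for _, r in ranked)
    rows = []
    for b in range(1, bins + 1):
        cutoff = max(1, (total * b) // bins)   # customers mailed at this depth
        captured = sum(r for _, r in ranked[:cutoff])
        depth = cutoff / total
        capture = captured / total_responders
        rows.append((depth, capture, capture / depth))
    return rows


def lift_pct(capture_pct, depth_pct):
    """Percent improvement in response over a random selection of equal size."""
    return (capture_pct / depth_pct - 1.0) * 100.0


def expense_reduction_pct(capture_pct, depth_pct):
    """Percent fewer solicitations needed to reach the same responders."""
    return (1.0 - depth_pct / capture_pct) * 100.0


print(f"Lift at 50% depth: {lift_pct(85.0, 50.0):.0f}%")
print(f"Expense reduction: {expense_reduction_pct(85.0, 50.0):.0f}%")

# Hypothetical scored file of ten customers, split into two bins.
for depth, capture, index in gains_table(
    scores=[0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
    responded=[1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    bins=2,
):
    print(f"depth {depth:.0%}: captured {capture:.0%}, lift index {index:.2f}")
```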

Figure 2: Response Model Gains Chart

Modelers will sometimes portray gains tables as "lift curves," as shown in Figure 3. These graphics are ideal for executive presentations because they are visually appealing and readily interpretable by senior managers. The telling point about a model's lift curve is how bowed this line is. The more arched the curve, the more powerful the model in "cherry picking" the most responsive customers. You can sometimes tell a good model by how far down the file it carries a large response lift before eventually converging with an unmodeled selection. Generally, you'll want to see a response bow that is well arched in the middle of the file. If the bow rises very steeply at an extremely shallow file depth, quickly flattens out and converges with a random selection, there is a chance your model may lack longer-term stability. This is sometimes the case with exotic neural net models that are prone to overfitting. The point is that if a model looks too good to be true, it probably is.

Figure 3: Response Model Power Curve

Examining Component Values

Models vary in terms of understandability. With some models, the relationships between the predictors and the outcome measure seem crystal clear. At the other extreme are black-box models that may not make intuitive sense, despite having good predictive ability. Because the degree of understandability will be determined by your business needs and objectives, this should be explicitly communicated to the modeler at the outset of an engagement.

If you have opted for a fairly understandable model, you might want to perform a sanity check on the variables that made the final cut in the build process. Do these explanatory variables really "explain" why they might predict response? Do they make intuitive and theoretical sense? Or, are they so implausible that it's going to be hard to explain them to yourself, let alone senior management? If the latter is the case, it's possible that your model will lack longer-term reliability because some of the relationships could be anomalies peculiar to the modeling data.

Another consideration is whether your model truly bristles with predictor variables. There is certainly no general rule about the appropriate number of independent variables. However, you shouldn't assume that more variables make a better model. Indeed, some consultants caution that a model with many dozens of predictors, even though all of them might be statistically significant (due to a very large sample size), may not be as stable as a more succinct model when used for real-world mailings over an extended period of time. Such overfit models fit the sample data so tightly that some minor variables contributing only tiny increments to overall predictive power may not hold up after the database undergoes multiple updates. Instability of this kind, however, is less of a concern for models based on ZIP code demographics and other variables that are not subject to change through an update process.

A Note On Operationalization

If you plan to perform customer scoring during scheduled updates, you should make certain that enough documentation has been provided to fully operationalize your model in the production environment. Ideally, your information technology (IT) department should code all of the variables exactly as they were prepared during the model build process. Additionally, since some mainframe languages lack a wide range of canned mathematical functions, calculation of some transformed variables may challenge the programming skills of IT staff. Obviously, it is far better for your modeler's deliverables package to provide detailed instructions for implementation than to encounter potential delays after project completion.
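As a rough illustration of what "operationalizing" a logistic regression model involves, the Python sketch below applies documented coefficients to raw customer fields. Everything specific here is an assumption made for the example: the intercept, the coefficient values, the raw field names and the log transformation of distance are hypothetical, and your modeler's documentation is the only authority on the real ones.

```python
import math

# Hypothetical values; the real ones come from the model documentation.
INTERCEPT = -3.0
COEFFICIENTS = {"SOHOFLAG": 2.0, "LOG_DISTANCE": -0.4}


def prepare_variables(raw):
    """Recreate model inputs exactly as they were coded during the build.

    `raw` is a dict with hypothetical fields "soho" (bool) and
    "distance_miles" (non-negative number).
    """
    return {
        "SOHOFLAG": 1.0 if raw["soho"] else 0.0,
        # Assumed transformation: natural log of (1 + distance in miles).
        "LOG_DISTANCE": math.log(1.0 + raw["distance_miles"]),
    }


def score(raw):
    """Predicted response probability from the logistic regression formula."""
    variables = prepare_variables(raw)
    logit = INTERCEPT + sum(
        COEFFICIENTS[name] * variables[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-logit))


near_soho = score({"soho": True, "distance_miles": 0.0})
far_other = score({"soho": False, "distance_miles": 25.0})
print(f"SOHO customer next to a store: {near_soho:.3f}")
print(f"Other customer 25 miles away:  {far_other:.3f}")
```

If a transformation in production differs even slightly from the one used in the build (a different log base, a different treatment of missing values), the scores will no longer match those the model was validated on, which is why the documentation needs to spell out each step.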

Insist on Model Validation

Arguably, the most important quality check of a model is to ensure that it was validated against a holdout sample. Multivariate models sometimes perform better in an analytical setting than they do in the real world, simply because they seek to optimize the relationships found in the sample data. Before constructing the model, the statistician should have randomly selected a portion of customers from the sample file and held them out as a validation group. After model estimation, the holdout population should be scored with the new algorithm. Then the performance of both groups should be benchmarked against an unmodeled selection, with results plotted in the lift chart, as shown in Figure 3. The more closely the curves for the build sample and validation sample align, the more reliable your model will be in selecting actual customers for solicitation.
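The holdout check described above can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than any particular modeler's procedure: the function names, the 30 percent holdout fraction and the depths compared are all hypothetical choices, and each group is assumed to contain at least one responder.

```python
import random


def split_holdout(customers, holdout_fraction=0.3, seed=42):
    """Randomly set aside a validation group before the model is built."""
    shuffled = list(customers)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]   # (build, holdout)


def capture_at_depth(scored, depth):
    """Share of all responders found in the top `depth` fraction of the file.

    `scored` is a list of (score, responded) pairs with responded in {0, 1}.
    """
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    cutoff = max(1, round(len(ranked) * depth))
    total_responders = sum(r for _, r in ranked)
    return sum(r for _, r in ranked[:cutoff]) / total_responders


def max_capture_gap(build_scored, holdout_scored, depths=(0.1, 0.3, 0.5)):
    """Largest difference between build and holdout gains at the given depths."""
    return max(
        abs(capture_at_depth(build_scored, d) - capture_at_depth(holdout_scored, d))
        for d in depths
    )


build_ids, holdout_ids = split_holdout(range(100))
print(len(build_ids), "customers for the build,", len(holdout_ids), "held out")
```

A small gap between the two groups at every depth corresponds to closely aligned curves on the lift chart; a large gap suggests the model learned quirks of the build sample.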

Modeling projects often seem to be cloaked in an inscrutable mystique, making it difficult for non-statisticians to differentiate a good model from a not-so-good model. However, by following the steps described in this article, you'll be able to break through the statistical jargon, interpret the results and determine the value of your modeling engagement.
