Two key measures of model performance are reliability and validity. Without being overly technical, reliability has to do with the uncertainty around model results that is associated with inherent randomness in the data. Even though the probability of a flipped coin coming up "heads" is 50 percent, when a coin is flipped 100 times there will usually not be exactly 50 occurrences of heads. Similarly, we can never be sure that model parameter estimates are exactly correct, because the data with which the model is estimated has elements of randomness. However, the issue of model reliability is well understood and is usually addressed openly when model results are presented.
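The coin-flip point can be checked with a little arithmetic. The sketch below is only an illustration: the seed and the number of repeated experiments are arbitrary choices of mine, not part of the argument. It computes the exact chance of getting exactly 50 heads in 100 fair flips, which is only about 8 percent, and then simulates the 100-flip experiment many times to show how much the count moves around.

```python
import math
import random

N_FLIPS = 100

# Exact probability of exactly 50 heads in 100 fair flips: C(100, 50) / 2^100.
p_exactly_50 = math.comb(N_FLIPS, 50) / 2 ** N_FLIPS

def count_heads(rng, n_flips=N_FLIPS):
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

# Repeat the 100-flip experiment many times (seed and trial count are arbitrary).
rng = random.Random(0)
n_trials = 2000
head_counts = [count_heads(rng) for _ in range(n_trials)]
share_exactly_50 = head_counts.count(50) / n_trials

print(f"Exact P(exactly 50 heads): {p_exactly_50:.4f}")
print(f"Simulated share at 50:     {share_exactly_50:.4f}")
print(f"Range of head counts:      {min(head_counts)} to {max(head_counts)}")
```

The spread of head counts across the repeated experiments is the same kind of uncertainty that surrounds estimated model parameters.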

Model validity is a potential problem that is much more likely to be overlooked. As a general definition, a model is valid if it is measuring what it is intended to measure, and nothing else. This is critically important, because if a model is valid then it can be applied to real-world problems with predictable results. Invalid models may produce unintended results – with consequences to match.

This will seem more meaningful if some common causes of model invalidity are discussed. One is the problem of "overfitting." Models are estimated, in part, by choosing models that produce a good "fit" in an estimation database. For a purchase probability model, the model is chosen so that within the data the prediction of purchase matches actual purchases as closely as possible. But if models are allowed to become very complex, this process can go too far. Overfitting occurs when important elements of the model reflect randomness in the data rather than underlying model drivers.

Figure 1: Overfitting

The data charted in Figure 1 came from a linear model as estimated by the straight regression line. I can improve the fit by allowing the model to be more complicated – the wiggly line is a model estimated with, in addition to a linear variable, variables to the 2nd, 3rd, 4th and 5th power. Clearly, within the estimation data set, the complicated model fits better. But it does not capture the true underlying drivers of the data. Rather, it is being influenced by the random occurrences of when some observations are above or below expectation in this particular data set. If the overfitted model is used for prediction on a new data set, it will perform worse than the "simple but true" straight line.
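A rough way to reproduce the idea behind Figure 1 is sketched below. It does not use the data from the figure: the true line, the noise level, the 20 data points and the seed are all hypothetical values chosen for illustration, and the small least-squares routine is a plain textbook implementation rather than any particular package. It fits a straight line and a 5th-degree polynomial to the same noisy data, then compares how closely each one matches the noisy observations and how far each one is from the true line.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    k = degree + 1
    # Augmented normal equations [X'X | X'y] for the powers 0..degree.
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)]
         + [sum(x ** i * y for x, y in zip(xs, ys))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coefs = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(a[i][j] * coefs[j] for j in range(i + 1, k))
        coefs[i] = (a[i][k] - rest) / a[i][i]
    return coefs

def predict(coefs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** power for power, c in enumerate(coefs))

def sum_sq_error(coefs, xs, targets):
    """Sum of squared differences between predictions and targets."""
    return sum((predict(coefs, x) - t) ** 2 for x, t in zip(xs, targets))

# Hypothetical data: a true straight line plus random noise.
rng = random.Random(1)
xs = [-1 + 2 * i / 19 for i in range(20)]
true_line = [1.0 + 2.0 * x for x in xs]
ys = [t + rng.gauss(0, 0.5) for t in true_line]

linear = fit_polynomial(xs, ys, 1)   # the straight regression line
wiggly = fit_polynomial(xs, ys, 5)   # adds the 2nd through 5th powers

# Error against the noisy estimation data (what "fit" measures).
fit_linear = sum_sq_error(linear, xs, ys)
fit_wiggly = sum_sq_error(wiggly, xs, ys)
# Error against the true underlying line (what we actually care about).
miss_linear = sum_sq_error(linear, xs, true_line)
miss_wiggly = sum_sq_error(wiggly, xs, true_line)

print(f"Error vs. noisy data: linear {fit_linear:.3f}, 5th degree {fit_wiggly:.3f}")
print(f"Error vs. true line:  linear {miss_linear:.3f}, 5th degree {miss_wiggly:.3f}")
```

Because the straight line is a special case of the 5th-degree model, the complicated fit cannot look worse against the noisy data. The extra closeness comes from chasing the noise, so measured against the true line at these points it ends up further away.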

Another case of an invalid model is when the variable coefficients are "biased." This means that the estimated impact of each variable does not truly reflect that variable alone but a mixture of several factors.

Figure 2: Computer Purchase Model

Consider a purchase probability model for computers in which income and education both have a positive influence. This is depicted in Figure 2. Income has a positive impact on purchase, since each line slopes upward. Education has a positive influence on purchase, since the "college+" line is above the "high school or less" line.


Figure 3: Computer Purchase Model without Education Variable

But suppose your data set did not contain information on education. If you estimated a purchase model that just included income as an explanatory variable, it would look like Figure 3. The data points are in exactly the same places. But because high income tends to be associated with high education, the impact of income is overestimated. The estimated impact of income reflects a mixture of the true income effect and the true education effect.
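The mixing of the two effects can be seen in a small simulation. Everything numeric here is an assumption made for illustration, not a figure from the charts: the size of the income and education effects, the way college attendance becomes more likely as income rises, the noise level, and the use of a simple linear purchase score in place of a full purchase probability model.

```python
import random

def simple_slope(x, y):
    """Slope from regressing y on x alone (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_variable_slopes(x, z, y):
    """Slopes on x and z from regressing y on both (with an intercept)."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    xc = [a - mx for a in x]
    zc = [a - mz for a in z]
    yc = [a - my for a in y]
    sxx = sum(a * a for a in xc)
    szz = sum(a * a for a in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    szy = sum(a * b for a, b in zip(zc, yc))
    det = sxx * szz - sxz ** 2
    return (sxy * szz - szy * sxz) / det, (szy * sxx - sxy * sxz) / det

# Hypothetical "true" effects on the purchase score.
TRUE_INCOME_EFFECT = 0.004    # per $1,000 of income
TRUE_COLLEGE_EFFECT = 0.20    # college+ versus high school or less

rng = random.Random(2)
income, college, score = [], [], []
for _ in range(2000):
    inc = rng.uniform(20, 100)  # income in $000
    # Higher income makes college+ more likely (an assumed relationship).
    col = 1.0 if rng.random() < (inc - 20) / 80 else 0.0
    noise = rng.gauss(0, 0.05)
    income.append(inc)
    college.append(col)
    score.append(0.10 + TRUE_INCOME_EFFECT * inc + TRUE_COLLEGE_EFFECT * col + noise)

income_only = simple_slope(income, score)
income_full, college_full = two_variable_slopes(income, college, score)

print(f"True income effect:              {TRUE_INCOME_EFFECT:.4f}")
print(f"Estimated, education included:   {income_full:.4f}")
print(f"Estimated, education left out:   {income_only:.4f}")
```

With education in the model, the income coefficient lands close to its true value. With education left out, the income coefficient absorbs part of the education effect and comes out noticeably too large.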

These examples of invalid models refer to what is sometimes called "internal" validity – whether the model is valid within the context in which it was estimated. Another issue is "external" validity – whether the model can be applied to situations beyond the context in which it was estimated. While internal validity depends on the model itself, external validity depends on how it is applied.

The computer purchase model of Figure 2 appears to have internal validity. However, suppose it was estimated on a list of subscribers to a leading computer magazine. In this case, the model would likely remain valid if applied against the subscription list of a different computer magazine (although this would have to be tested), but would be very unlikely to be valid if applied against the subscription list of a gardening magazine. Similarly, since nobody in the estimation sample has an income of over $100,000, it would be dangerous to use it on a data set containing only millionaires.

Fortunately, it is usually possible to test for model validity. The most common method to test for overfitting is to apply the model against a "holdout" sample – a random sample from the estimation data set that was held aside during the estimation process. If overfitting is not present, then the performance of the model in the holdout sample should not be substantially worse than the performance of the model in the sample used for estimation. Similarly, when applying a model to a new data set, a test market program can be run on a sample of the data. If the performance of the model is adequate in the test, the program can then be rolled out to the entire market.
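The mechanics of a holdout test are simple enough to sketch. The data below are made up (a straight line plus noise), and the 30 percent holdout share, the seed and the use of mean squared error as the performance measure are my own assumptions; any sensible split and performance measure could be substituted.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return my - slope * mx, slope

def mean_sq_error(model, points):
    """Average squared prediction error of the fitted line on a sample."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in points) / len(points)

# Hypothetical data set: 500 observations from a noisy straight line.
rng = random.Random(3)
data = []
for _ in range(500):
    x = rng.uniform(0, 10)
    data.append((x, 1.0 + 2.0 * x + rng.gauss(0, 0.5)))

# Hold a random 30 percent aside before any estimation takes place.
rng.shuffle(data)
n_holdout = len(data) * 3 // 10
holdout, estimation = data[:n_holdout], data[n_holdout:]

# Estimate on the estimation sample only, then score both samples.
model = fit_line(estimation)
mse_estimation = mean_sq_error(model, estimation)
mse_holdout = mean_sq_error(model, holdout)
ratio = mse_holdout / mse_estimation

print(f"Estimation sample MSE: {mse_estimation:.3f}")
print(f"Holdout sample MSE:    {mse_holdout:.3f}")
print(f"Holdout / estimation:  {ratio:.2f}")
```

A ratio close to 1 suggests the model performs about as well on data it has not seen. A ratio well above 1 is the warning sign for overfitting; how large is too large is a judgment call that depends on the application.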

The key point is that reliability and validity are separate issues. Even when somebody presents you with a model that has great measures of reliability (a high r-squared or high t-statistics, for example), it doesn’t automatically mean the model will be useful. Other estimation and implementation issues must still be considered.
