
Measuring the Value of Mined Information

  • Kamran Parsaye, David Petrie
  • March 01 1998, 1:00am EST

Few people would dispute that we live in the information age and that many pieces of information have a concrete dollar value. Most people intuitively agree that information is in some sense directly related to money, as evidenced by the many prosperous organizations whose sole business is selling information (Meyer and Zack, 1996 [1]). Indeed, information and knowledge are now viewed as the intellectual capital underlying the new organizational wealth (Stewart, 1997 [2] and Sveiby, 1997 [3]).

Think of it this way: every corporation has a collective model of the business world based on the information it possesses, e.g., what influences profits and losses, where opportunities and pitfalls are, and what drives product quality and consumer demand. As the information becomes more accurate, the corporation's ability to compete increases. Hence corporate information is clearly an asset.

If information is an asset, then we should be able to assign a dollar value to it. A natural question arises as to how we can measure the dollar value of a piece of information. Given a free market, a first answer would be that the real value of the information is whatever the market happens to pay for it. As we shall see later, this simple approach based on "perceived value" may at times be used to estimate the value of information, but much more is needed in most circumstances (e.g., when a piece of corporate information is not for sale on the open market, but is used for internal decision making).

Let us also note that information is distinct from data and has a much higher value. The process of extracting information from data is often called data mining. While a database may include the data about the account history of a corporation's customers, data mining provides information about the characteristics and trends that lead to customer retention and profitability--more valuable than the data itself. The transformation of the data into information significantly increases the value of the data.

Data mining can thus impact some of the basic economic metrics applied to a corporation. One such measure is "Tobin's q," developed by Nobel prize winner James Tobin: the ratio of the value of an asset to its replacement cost. As noted by Federal Reserve Chairman Alan Greenspan, high q and market-to-book ratios for a corporation can reflect its investments in technology (Stewart, 1997 [2]). Data mining can increase Tobin's q for a corporation by enhancing the value of technology investments.

The Perceptive Gap vs. the Dollar Gap

It is a truism that the world is getting more complex every day. Each of us has concepts and perceptions about the state of the world, but our ideas are often somewhat off the mark and certainly never keep up with the pace of change. Indeed, as noted in Van der Heijden [4]: "Human beings and organizations do not act in response to reality, but in response to an internally constructed version of reality." Hence the key task of a decision support system in a business setting is to deliver value by bringing the internally constructed version of reality closer to the real world.

Figure 1: The Perceptive Gap

As shown in Figure 1, we define the perceptive gap for an individual or organization with respect to a given topic as the distance between internally constructed concepts and reality.

This gap is easy to estimate in simple cases, but requires significant analysis in other instances. For instance, if a marketing manager thinks that the average age of profitable customers is 31 and the real number is 41, we have a gap of 10 years. A one-year perceptive gap may be tolerable, but a 20-year perceptive gap will have serious consequences for marketing campaigns. In another case, a marketing manager may think that the profitable customers are over 40 and have several children while, in fact, the profitable customers may be those between 28 and 33 with fewer than two children. Again, we can estimate such a gap by direct means. However, in more complex cases this is not so straightforward.
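For the single-number case, the perceptive gap can be sketched as an absolute difference. This is only an illustration of the age example above; the function name `perceptive_gap` is an assumption and does not come from the article.

```python
def perceptive_gap(believed: float, actual: float) -> float:
    """Distance between an internally held estimate and the real value."""
    return abs(believed - actual)


# Figures from the marketing-manager example in the text:
# the manager believes the average age is 31, the real number is 41.
gap_years = perceptive_gap(believed=31, actual=41)
print(gap_years)
```

Multi-attribute cases, such as the age-and-children example, would need a distance defined over several attributes, which the article leaves open.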

The next obvious question relates to the consequences of the perceptive gap, i.e., what do we achieve by reducing the gap? How much help does a company get if it knows its customers better? This is, of course, dependent on which individuals have the perceptive gap, and individuals with key responsibilities can (at times) make a bigger impact on a corporation. However, we propose the following axiom as self-evident: A corporate perceptive gap will inevitably result in a dollar gap in profits.

We call this the Perceptive Impact Axiom. What we do next is measure such dollar impacts in specific processes.

Estimating the Value of Information

Since the late 1940s, the scientific measurement of the term "information" has been greatly influenced by the seminal work of Claude Shannon (Shannon and Weaver, 1949 [5]), which was originally motivated by the transmission of signals over communication channels such as telephone wires.

Over the years, the mathematical elegance of Shannon's work has led to significant applications in diverse areas from compact disc manufacturing to molecular biology (Schneider, 1996 [6]). Shannon's work does not deal with the meaning of information, but with probabilities. His focus was on the amount of information transmitted, rather than the dollar value of the information. However, significant amounts of other research exist on determining the value of information--for instance, see Ahituv, 1989 [7] for a review. These approaches break into two groups: those based on "perception of value" and those based on "measurements."

In our approach, the value of information is measured by observing differences in the decision-maker's performance when provided with different types of information. The basic premise is that information affects performance. The performance of a decision-maker (or an organizational function) is measured prior to the introduction of new information and compared to the performance thereafter.

Thus with this method, we evaluate two performance levels based on two levels of information availability. The difference in performance levels (measured in dollars) is used as a surrogate for the value of the information obtained.

Models of Information Usage

In order to apply the realistic/operational value of information, we need to evaluate performance. To do so, we proceed as follows:

  • We build a model of a process which uses information to generate revenue.
  • We work out two scenarios: one with a different information content based on the use of a data mining technique, the other without it.
  • We measure the revenue difference in the two scenarios as the dollar value of the information.

This is shown in Figure 2.

Note that we rely on information usage and not data usage. While data refers to the values of attributes some objects have (e.g., the profession and age of John Smith, the price of a product, etc.), information refers to the distributions of the values among clusters of objects (e.g., athletes buy more soft drinks than bankers, people buy paints and paint brushes together, teachers respond better to certain direct mail campaigns than others, etc.).
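The three-step procedure above can be sketched in a few lines. This is a minimal illustration, not the authors' model: the function `process_revenue`, the revenue per customer and the 4 percent informed response rate are all assumptions made for the example.

```python
def process_revenue(prospects: int, response_rate: float,
                    revenue_per_customer: float) -> float:
    """Toy process model: revenue generated by mailing a list of prospects."""
    return prospects * response_rate * revenue_per_customer


PROSPECTS = 250_000            # campaign size used later in the article
REVENUE_PER_CUSTOMER = 100.0   # hypothetical, in dollars

# Scenario without data mining: 2 percent response (the simple-analysis figure).
baseline = process_revenue(PROSPECTS, 0.02, REVENUE_PER_CUSTOMER)

# Scenario with data mining: 4 percent response (hypothetical improvement).
informed = process_revenue(PROSPECTS, 0.04, REVENUE_PER_CUSTOMER)

# The revenue difference serves as a surrogate for the value of the information.
dollar_value = informed - baseline
print(f"Value of information: ${dollar_value:,.0f}")
```

The only input that changes between the two scenarios is the one affected by the new information, so the difference can be attributed to it.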

Figure 2

Customer Prospecting

As an example, we will focus on the value of an organization's customer base because of the recognized importance of relationship management as the basis of businesses. Suppose that an organization is planning a campaign to solicit new customers. The prospecting process begins with a segmented list of potential customers, which the organization will contact by mail, telephone, etc. Depending on the source of the prospect list, various attributes may be known about these individuals. The organization will probably conduct a test mailing of randomly selected names from the list, collect data about responses and product sales from the test, and then correlate this to the attributes of the prospect. See Figure 3.

Figure 3

An analysis of these responses is necessary to design the prospecting campaign. Using rather simple analysis techniques, responses are grouped into strata based on common attributes of customers in the group, such as age, income, education, etc. These attributes are then used for selecting the "best prospects" from the prospect list and for estimating the response rates and new customers from the campaign.

For example, a simple analysis of responses from the test mailing identified white-collar professionals in the 30-40 age bracket, with incomes of $70,000 or higher, as the highest response stratum (2 percent response). Mailing scenario 1 would target these individuals for the prospecting campaign. Using this information, bottom-line business results could be estimated.
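The simple stratification described here amounts to grouping test-mailing records by a few attributes and computing a response rate per group. The sketch below assumes a made-up record layout and made-up counts (chosen so that the top stratum responds at 2 percent); none of the data comes from the article.

```python
from collections import defaultdict

# Hypothetical test-mailing records: (occupation, age bracket, responded?)
test_mailing = (
    [("white-collar", "30-40", True)] * 2
    + [("white-collar", "30-40", False)] * 98
    + [("blue-collar", "30-40", True)] * 1
    + [("blue-collar", "30-40", False)] * 99
    + [("white-collar", "45-55", True)] * 1
    + [("white-collar", "45-55", False)] * 199
)


def response_rates(records):
    """Return the response rate for each (occupation, age bracket) stratum."""
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for occupation, age_bracket, did_respond in records:
        stratum = (occupation, age_bracket)
        mailed[stratum] += 1
        if did_respond:
            responded[stratum] += 1
    return {stratum: responded[stratum] / mailed[stratum] for stratum in mailed}


rates = response_rates(test_mailing)
best_stratum = max(rates, key=rates.get)
print(best_stratum, rates[best_stratum])
```

Mailing scenario 1 would then select prospects whose attributes match `best_stratum`.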

But what if we somehow could obtain additional information about the responses from the test mailing? Using data mining techniques, such as cluster and contribution analysis, it is possible to examine many more combinations of attributes in building the customer strata. Plus, bottom-line contribution, not just response rates, can be considered in defining "best prospects."

Armed with this information, the list of prospects and product offerings can be fine-tuned to deliver better results than otherwise would be realized from the campaign.

Returning to the example, we use cluster analysis to look at additional attributes of individuals responding to the test mailing and discover that several subgroups exist within the high-response stratum. We notice very high response (8 percent) from salespersons in the western United States, particularly the states of California, Oregon and Washington. But the response rate is considerably lower for teachers (1 percent) and police/fire/military professionals (0.5 percent) in this 30-40 age bracket. Cluster analysis also uncovers the fact that sales professionals in the western states are high respondents (7 percent) even in the 45-55 age bracket. This is interesting, since the 45-55 age group overall shows a low response rate (0.5 percent).

We now shift our attention from response rates to bottom-line profits from the test mailing. Using the data mining technique of contribution analysis, we examine the profitability of products purchased by our current customer base. Certain products, such as checking accounts, contribute little profit. But money market accounts and certificates of deposit generate a healthy 5 percent, with credit cards delivering a 15 percent margin.

At this point, we combine our knowledge of test mailing responses and profit contributions to design a customer solicitation campaign that will generate substantially better results than those gained by simple response analysis. The 30-40 age group in our example responded well to the test mailing, but generally purchased the low-margin products. The highly paid salespersons in the 45-55 age group tended to use credit cards and purchase more profitable insurance products. Incidentally, the over-55 age group, overall, purchased the most profitable products, but did not respond well to our test mailing. With this information, we decide to aim our campaign toward sales professionals in the western United States, earning more than $65,000 and in the 45-55 age group.
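One way to combine response rates with profit contribution is to rank segments by expected profit per prospect mailed. The sketch below is an assumption about how such a ranking could be done, not the authors' procedure. The response rates and the 15 percent credit card margin are taken from the text; the 1 percent low-margin figure and the sales volume per customer are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    response_rate: float       # from the test mailing
    profit_margin: float       # margin on products the segment tends to buy
    sales_per_customer: float  # hypothetical annual sales volume, in dollars

    def profit_per_prospect(self) -> float:
        """Expected annual profit for each prospect mailed in this segment."""
        return self.response_rate * self.sales_per_customer * self.profit_margin


segments = [
    Segment("western sales, 30-40", 0.08, 0.01, 1_000.0),  # low-margin buyers
    Segment("western sales, 45-55", 0.07, 0.15, 1_000.0),  # credit card users
    Segment("teachers, 30-40", 0.01, 0.01, 1_000.0),
]

ranked = sorted(segments, key=Segment.profit_per_prospect, reverse=True)
for segment in ranked:
    print(f"{segment.name}: ${segment.profit_per_prospect():.2f} per prospect")
```

Under these assumed numbers the 45-55 sales segment ranks first despite its slightly lower response rate, which mirrors the targeting decision described above.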

The estimated conversion rate, sales volume and profit margin of the campaign using the information derived from data mining will be considerably better than the results expected from mailing scenario 1, which is based on relatively simple analysis techniques. Using the realistic method, the difference in bottom-line profits from the two mailings is the dollar gap--a measure of value of the information obtained.

Following a test mailing, we are able to estimate the costs and expected returns from a customer prospecting campaign. As shown in Figure 4, we estimate the fixed and per-prospect costs, as well as an annual sales volume and profit margin for new customers. From a simple analysis and stratification of customers from the test mailing, we estimate a conversion rate of prospects to customers (in this case, 2 percent). Our expected returns from a mailing of 250,000 prospects are calculated. (We use an assumed discount rate in order to properly handle the time value of money.) Notice that we will break even with 8,215 new customers and will recover the costs of the campaign in the second year. Beyond that, we expect additional profits.

Figure 4: Customer Prospecting

Now suppose that we have employed data mining techniques and have additional information about responses and profitability from the test mailing. We select 250,000 prospects for our campaign mailing, but it is a different set of names. And we offer these prospects a different set of products. We now have reason to expect better conversion rates, sales volume and profit margins. This generates a lower break-even point for the campaign, faster recovery of the campaign costs and improved net present values of our profits over one, three and five years. The dollar gap between these two campaign approaches is shown at the bottom of Figure 4. These amounts (more than $1 million in just the first year) are the value of the information from data mining.
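The comparison of the two campaigns can be sketched as a break-even and net-present-value calculation. Figure 4 is not reproduced here, so every cost, rate and margin below is hypothetical apart from the 250,000 prospects and the 2 percent baseline conversion rate. The sketch also assumes that customers stay for the whole horizon with constant annual profit, and that break-even means the number of customers whose first-year profit covers the campaign cost.

```python
import math
from dataclasses import dataclass


@dataclass
class Campaign:
    prospects: int
    fixed_cost: float
    cost_per_prospect: float
    conversion_rate: float
    annual_sales_per_customer: float
    profit_margin: float

    @property
    def total_cost(self) -> float:
        return self.fixed_cost + self.prospects * self.cost_per_prospect

    @property
    def annual_profit_per_customer(self) -> float:
        return self.annual_sales_per_customer * self.profit_margin

    def break_even_customers(self) -> int:
        """Customers needed for first-year profit to cover the campaign cost."""
        return math.ceil(self.total_cost / self.annual_profit_per_customer)

    def npv(self, years: int, discount_rate: float) -> float:
        """Discounted profit over the horizon, minus the up-front cost."""
        customers = self.prospects * self.conversion_rate
        annual_profit = customers * self.annual_profit_per_customer
        discounted = sum(annual_profit / (1 + discount_rate) ** year
                         for year in range(1, years + 1))
        return discounted - self.total_cost


DISCOUNT_RATE = 0.10  # hypothetical

# Hypothetical inputs; only the list size and 2 percent rate come from the text.
simple = Campaign(250_000, 50_000.0, 1.00, 0.02, 1_000.0, 0.05)
mined = Campaign(250_000, 50_000.0, 1.00, 0.04, 1_500.0, 0.10)

# Dollar gap: difference in net present value over one, three and five years.
dollar_gap = {years: mined.npv(years, DISCOUNT_RATE) - simple.npv(years, DISCOUNT_RATE)
              for years in (1, 3, 5)}
for years, gap in dollar_gap.items():
    print(f"{years}-year dollar gap: ${gap:,.0f}")
```

Because both campaigns mail the same number of prospects at the same assumed cost, the gap here comes entirely from the better conversion rate, sales volume and margin.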


  1. Meyer, M.H., Zack, M.H., "The Design and Development of Information Products," Sloan Management Review, Spring 1996.
  2. Stewart, T.A., "Intellectual Capital: The New Wealth of Organizations," New York: Doubleday, 1997.
  3. Sveiby, K.E., "The New Organizational Wealth," San Francisco: Berrett-Koehler Publishers, 1997.
  4. Van der Heijden, K., "Scenarios: The Art of Strategic Conversation," New York: Wiley, 1996.
  5. Shannon, C.E., Weaver, W., "The Mathematical Theory of Communication," Urbana, IL: University of Illinois Press, 1949.
  6. Schneider, T.D., "New Approaches in Mathematical Biology: Information Theory and Molecular Machines," 1996.
  7. Ahituv, N., "Assessing the Value of Information: Problems and Approaches," Proceedings of the Tenth International Conference on Information Systems, Boston, MA, December 1989.
