
Two Sides to Data Decay

Published July 1, 2003, 1:00am EDT

Approximately 10 years ago, I lived in an apartment just outside Boston. At the time, my building's ZIP code was 02146. A few years after I moved, the U.S. Postal Service split the area covered by that ZIP code into two parts. The southern section kept the 02146 code, while the northern area – where my former apartment building is – was assigned a new ZIP code: 02446.

This is, perhaps, a simple little story, yet it exposes two issues related to "data decay." For lack of an official industry definition, let's say that data decay is the progressive degradation of managed information over time due to a lack of proper attention. The first issue is the more commonly understood one: presuming that my name and address were in some customer database, that information became incorrect as soon as I moved. I have read that the U.S. Postal Service estimates that 20 to 25 percent of the population moves each year, and this alone can account for a large amount of data disintegration.
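
To get a feel for the scale, here is a back-of-the-envelope model. It rests on our own simplifying assumption, not on any Postal Service figure: each person in the database independently moves with probability p in a given year. The expected fraction of address records still correct after n years is then

$$ f(n) = (1 - p)^{n}, \qquad f(3)\big|_{p=0.20} = 0.8^{3} = 0.512, \qquad f(3)\big|_{p=0.25} = 0.75^{3} \approx 0.42 $$

Under this assumption, roughly half of the records would be stale within three years. Because the same households tend to move repeatedly, real records likely decay somewhat more slowly than the independence assumption implies, so these figures are best read as a rough upper bound on decay.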

Of course, there are processes in place to address this problem (no pun intended), such as the data correction services offered by NCOA (National Change of Address) licensees. In general, data enhancement companies attempt to resolve name and address corrections over time with moderate success, although the process frequently introduces new inconsistencies. For example, I occasionally receive letters at my current home address that still refer to the position I held at a previous job. I left that job almost four years ago, I no longer hold that position, and my former employer has no offices anywhere near my home.

The second issue is much more interesting: what we could call "historical consistency." Here, some event external to the data management system creates an artificial inconsistency where none should exist. Suppose the bank I did business with decided to update its customer address records to reflect the ZIP code change. My address record would presumably be updated, which would be fine if I still lived there. However, I never lived in the 02446 ZIP code, so the change produces a record that is historically inconsistent with reality.

Changing ZIP codes and new telephone area codes are good examples of events that can lead to this kind of data decay, and one might wonder why this is a big deal. The crux of the matter lies in more complicated events, such as corporate mergers, changes in regulations or policies, or other changes that affect how the modeled entities are perceived operationally. The problem surfaces whenever historical data is analyzed for any reason (e.g., looking at sales trends over particular geographic areas, or evaluating customer loyalty as it relates to longevity). Let's look at an example:

  • In 1985, Joe opens a bank account with Manufacturers Hanover.
  • In 1991, Manufacturers Hanover is purchased by Chemical Bank.
  • In 1996, Chemical Bank merges with Chase Manhattan Bank.

In 2003, we ask: How long has Joe been a customer of Chase Manhattan Bank? If we count the continuous relationship through the original bank and its successors, Joe has been a customer for 18 years (1985 to 2003). If we count only the relationship with the business entity called Chase Manhattan, he has been a customer for seven years (1996 to 2003).
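
The two answers come from two different questions asked of the same history. The sketch below records the corporate successions as data instead of overwriting the customer's bank, so either question can still be answered. The `Succession` class, the `HISTORY` list, and both function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Succession:
    """One corporate event: `predecessor` became part of `successor` in `year`."""
    year: int
    predecessor: str
    successor: str


# Hypothetical encoding of the example's corporate history.
HISTORY = [
    Succession(1991, "Manufacturers Hanover", "Chemical Bank"),
    Succession(1996, "Chemical Bank", "Chase Manhattan"),
]


def tenure_continuous(open_year: int, as_of: int) -> int:
    """Years since the account was opened, whichever entity now holds it."""
    return as_of - open_year


def tenure_with_entity(open_year: int, original_bank: str, target: str,
                       as_of: int, history=HISTORY) -> int:
    """Years the account has been held by the entity named `target`."""
    holder, since = original_bank, open_year
    for event in sorted(history, key=lambda e: e.year):
        if holder == target:
            break  # the account already belongs to the target entity
        if event.predecessor == holder and event.year >= since:
            holder, since = event.successor, event.year  # follow the succession
    if holder != target:
        raise ValueError(f"{original_bank} never became part of {target}")
    return as_of - since


print(tenure_continuous(1985, 2003))  # 18
print(tenure_with_entity(1985, "Manufacturers Hanover", "Chase Manhattan", 2003))  # 7
```

Neither number is wrong. Keeping the succession events as separate records leaves the choice of interpretation to the query rather than to whoever last updated the customer table.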

Here is another example. A point-of-sale application for a large department store chain logs the ZIP code of each customer at purchase time. The company's management decides to analyze sales trends geographically over time. How do we evaluate the trends for the area assigned the 02146 ZIP code? If we have modified historical data to reflect current code assignments, we will get an accurate view of the trends for the area as it is defined today, although our data will be historically inconsistent. Alternatively, if the records have not been modified, the trend will be skewed: today the 02146 area is roughly half the size it was 10 years ago, so sales attributed to 02146 drop at the split even if customer behavior has not changed at all.
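
A minimal Python sketch of the two ways to run that analysis. The street-level lookup `NORTHERN_STREETS`, the street names, the period labels, and the sales log are all hypothetical. Some lookup of this kind is assumed because the recorded ZIP code alone cannot tell which half of the old 02146 area a sale came from.

```python
from collections import Counter

# Hypothetical reference data: streets in the northern part of the old 02146
# area that were reassigned to 02446.
NORTHERN_STREETS = {"Elm St"}

# Hypothetical point-of-sale log: (period, ZIP as recorded, street).
SAMPLE_SALES = [
    ("before_split", "02146", "Elm St"),  # northern street, old code
    ("before_split", "02146", "Oak St"),  # southern street
    ("after_split", "02446", "Elm St"),   # northern street, new code
    ("after_split", "02146", "Oak St"),   # southern street
]


def current_zip(zip_code: str, street: str) -> str:
    """Translate a recorded ZIP code to today's assignment."""
    if zip_code == "02146" and street in NORTHERN_STREETS:
        return "02446"
    return zip_code


def sales_by_zip(sales, remap: bool) -> Counter:
    """Count sales per (period, ZIP), optionally re-coding to current ZIPs."""
    totals = Counter()
    for period, zip_code, street in sales:
        key_zip = current_zip(zip_code, street) if remap else zip_code
        totals[(period, key_zip)] += 1
    return totals


raw = sales_by_zip(SAMPLE_SALES, remap=False)
remapped = sales_by_zip(SAMPLE_SALES, remap=True)
print(raw[("before_split", "02146")], raw[("after_split", "02146")])            # 2 1
print(remapped[("before_split", "02146")], remapped[("after_split", "02146")])  # 1 1
```

The raw series shows 02146 sales halving at the split even though activity on each street is constant. The remapped series is flat, but it describes some past sales with a ZIP code that did not exist when they took place.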

This second form of data decay is more insidious because the decay is a mirage. The information is as true in its historical sense as it ever was; whether we perceive it as correct depends on what we want to get out of the data. This implies that we need some separate way of representing historical changes, one that augments historical data rather than overwriting it and that each application can apply selectively.
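
One way to read that suggestion is sketched in Python under our own assumptions. Captured addresses are never edited. Reference-data changes such as a ZIP reassignment are kept in a separate, effective-dated log, and each consumer decides whether to apply them. The class names, the street predicate, and the years are hypothetical placeholders, not the actual dates of the 02146 split.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AddressRecord:
    """An address exactly as it was captured; never overwritten."""
    customer: str
    street: str
    zip_code: str
    recorded_year: int


@dataclass(frozen=True)
class ZipReassignment:
    """A reference-data change, kept beside the facts instead of applied to them."""
    effective_year: int
    old_zip: str
    new_zip: str
    applies_to: Callable[[str], bool]  # hypothetical street-level predicate


def zip_view(record: AddressRecord, log: list, as_of_year: int,
             apply_changes: bool = True) -> str:
    """Return the record's ZIP, optionally translated to the assignments in force at as_of_year."""
    if not apply_changes:
        return record.zip_code  # historically consistent view
    zip_code = record.zip_code
    for change in sorted(log, key=lambda c: c.effective_year):
        # Apply only changes that took effect after the record was captured.
        if (record.recorded_year < change.effective_year <= as_of_year
                and change.old_zip == zip_code
                and change.applies_to(record.street)):
            zip_code = change.new_zip
    return zip_code


# Hypothetical data; the years are illustrative only.
tenant = AddressRecord("former tenant", "Elm St", "02146", recorded_year=1993)
LOG = [ZipReassignment(1998, "02146", "02446", lambda street: street == "Elm St")]

print(zip_view(tenant, LOG, as_of_year=2003))                       # 02446
print(zip_view(tenant, LOG, as_of_year=2003, apply_changes=False))  # 02146
print(zip_view(tenant, LOG, as_of_year=1995))                       # 02146
```

A mailing application might ask for today's view, while a trend analysis or a historical audit asks for the record as captured. Both read the same unmodified record.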
