Over the years, we have all seen data rationalization and data quality improvement projects either fail outright or take significantly longer than planned. I have seen the same pattern in projects designed to integrate customer and vendor databases across functions or divisions, in implementations of master data management (MDM)/customer data integration (CDI) solutions and even in simple efforts to clean and maintain a single division’s customer data: they frequently fail to deliver the data end users require because of data quality problems. Perhaps the most dramatic failures I have witnessed were initiatives to identify the company’s 25 largest customers for the CEO. (I have watched this action film three times with different companies in the lead role. They were spectacles because of the high visibility of the project.)

There are many reasons why these projects take so long to deliver results, but there is a common thread in all the examples I have witnessed: in each case, the team attacked the problem from the wrong end of the transformation chain. What do I mean by transformation chain? All systems that transform raw data into information that can be used for analysis or insight go through a set of common steps. (In this column, I’ll use information about customers and vendors, a.k.a. business entities, for illustrative purposes.) The steps are:

  • Sourcing. Raw data is entered into a database, initiating the transformation process. The raw data elements can be generated by a process (e.g., counting the frequency of some event linked to customers or vendors), entered manually by employees or captured as customers or vendors interact with your systems.
  • Matching. Once entered, the information should be matched against existing data because new entities (customers, prospects, vendors, etc.) should be processed differently than updates to existing entities. There is a wide range of software solutions for the matching step; most attempt to emulate how a knowledgeable human would recognize similarities between entities. Note, however, that successful matching requires three things: new data that has been sourced using consistent and documented processes, high-quality current and historical data to match against, and powerful matching software. In other words, the success of this step depends on the quality of the sourcing process.
  • Identifying. After matching, each new and unique entity should be assigned a permanent ID. IDs that embed information (e.g., the first three digits of the ID carry information about location) generally require significant effort to maintain, in most cases more effort than the benefit justifies, because the same data can simply be stored as a field in the database. At this stage, all known names (legal name, doing-business-as names, nicknames, etc.) and addresses should be tied to the single ID. One of the most serious problems that any entity information system can suffer from is multiple instances of the same entity, each with a separate ID. In other words, the key to successful identification is the matching process.
  • Linking. Once an entity has been uniquely identified, your business end users will need a clear picture of the relationships between entities. For example, how does a unique ID link to tax IDs? What are the parent, child and sibling relationships between entities? What are all the transaction types that a given customer has with your company? This is almost always the most difficult step in the transformation process, and it is completely dependent on the quality of the sourcing, matching and identifying links in the transformation chain.
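
To make the matching and identifying steps above more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the EntityRegistry class, the normalize helper and the 0.85 similarity threshold are hypothetical, and the standard library's difflib stands in for the far more capable commercial matching software discussed above. The point it shows is narrow: an incoming name is matched first, a new opaque ID (one that embeds no information) is issued only when no match is found, and every known name stays tied to a single ID.

```python
import difflib
import uuid
from typing import Dict, Optional, Set


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


class EntityRegistry:
    """Hypothetical in-memory registry: one opaque ID per entity, many names per ID."""

    def __init__(self, threshold: float = 0.85) -> None:
        # Assumed cut-off; a real project would tune this against reviewed matches.
        self.threshold = threshold
        self.names_by_id: Dict[str, Set[str]] = {}

    def match(self, name: str) -> Optional[str]:
        """Matching: return the ID of the most similar known entity, or None."""
        candidate = normalize(name)
        best_id, best_score = None, 0.0
        for entity_id, known_names in self.names_by_id.items():
            for known in known_names:
                score = difflib.SequenceMatcher(None, candidate, known).ratio()
                if score > best_score:
                    best_id, best_score = entity_id, score
        return best_id if best_score >= self.threshold else None

    def register(self, name: str) -> str:
        """Identifying: reuse the matched ID, or issue a new ID that carries no meaning."""
        entity_id = self.match(name)
        if entity_id is None:
            entity_id = uuid.uuid4().hex
            self.names_by_id[entity_id] = set()
        # Tie this spelling of the name to the single ID.
        self.names_by_id[entity_id].add(normalize(name))
        return entity_id


registry = EntityRegistry()
acme_id = registry.register("Acme Corporation")
same_id = registry.register("ACME Corporation, Inc.")  # treated as an update, not a new entity
other_id = registry.register("Globex Ltd")             # no close match, so a new ID is issued
```

Note how the sketch fails in exactly the way the chain predicts: if sourcing lets inconsistent or incomplete names into the registry, the similarity scores drop, matches are missed and the same entity ends up with several IDs.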

Because of the dependencies in the process of transforming raw data into business-ready information, all efforts to improve data quality and develop CDI/MDM solutions need to begin at sourcing and then move through matching, identifying and linking. The problems we have all witnessed develop because business end users do not understand the transformation chain and therefore can rarely articulate where they think it is breaking down. Solving end-user data quality problems, however, requires that you start at the beginning of the chain and work through to the end:

  • Sourcing. Strong data governance to ensure timely, accurate, complete and consistent data is essential to the success of this first step in the chain.
  • Matching. If you have high-quality data entering your entity information system and high-quality current and historical data already stored in the system (bearing in mind how quickly entity data changes), you will be two-thirds of the way through this link in the chain. Many software packages do an excellent job of matching when the right data is available.
  • Identifying. If internal identifiers carry no information, this step in the transformation process is relatively straightforward.
  • Linking. This is the hardest link in the transformation chain. It requires significant time and effort to tease out relationships, but the effort you put into managing sourcing, matching and identifying will determine the success of relationship creation.
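
As a rough illustration of what the linking step has to answer, the Python sketch below walks parent, child and sibling relationships between already-identified entities. The entity IDs, the PARENT_OF table and the function names are all hypothetical; a real system would hold these links in a database and would also have to cover tax IDs and transaction types. The sketch assumes each entity has at most one parent.

```python
from typing import Dict, List, Optional

# Hypothetical links: child entity ID -> parent entity ID (None for a top-level entity).
PARENT_OF: Dict[str, Optional[str]] = {
    "E1": None,   # ultimate parent
    "E2": "E1",   # subsidiary of E1
    "E3": "E1",   # subsidiary of E1, sibling of E2
    "E4": "E2",   # subsidiary of E2
}


def ultimate_parent(entity_id: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Follow parent links upward until reaching an entity with no parent."""
    seen = set()
    current = entity_id
    while parent_of.get(current) is not None:
        if current in seen:
            # A loop in the links is a data quality problem that must be fixed upstream.
            raise ValueError(f"cycle in parent links at {current}")
        seen.add(current)
        current = parent_of[current]
    return current


def children(entity_id: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the direct children of an entity, sorted for stable output."""
    return sorted(e for e, parent in parent_of.items() if parent == entity_id)


def siblings(entity_id: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the other entities that share this entity's direct parent."""
    parent = parent_of.get(entity_id)
    if parent is None:
        return []
    return [e for e in children(parent, parent_of) if e != entity_id]
```

These lookups are only as trustworthy as the IDs they run on. If matching has left one subsidiary with two IDs, one of them will sit outside the hierarchy and any roll-up to the ultimate parent will be understated.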

The key takeaway is that data quality enhancement projects and data integration projects will achieve success faster if you start remediation at the beginning of the transformation chain. Next month, I will provide a case study to illustrate the point. I’ll use the most terrifying data request of all: “The CEO wants to know our 25 largest customers.”
