In my column last month, I described the data transformation chain. In summary, all systems that transform raw data about business entities into information that can be used for business insight must go through a set of common steps:

  • Sourcing: Collecting and entering raw data about the entity.
  • Matching: Determining which data is truly new and which is an update to an existing entity.
  • Identifying: Creating a permanent identifier that links all known names, addresses, etc. to one entity.
  • Linking: Establishing the relationship between entities, such as branches and subsidiaries.

The quality derived at each step in the transformation is heavily dependent on the quality of the preceding steps. Thus, solving data quality problems, particularly those found when consolidating across databases, requires that you start remediation at the top and work down the chain.
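
Here is one way to see why the order matters. It uses a deliberately simplified model that is an assumption of this sketch, not a measurement. Suppose each step passes along a fixed fraction of records correctly and nothing downstream repairs an upstream error. End-to-end quality is then the product of the step accuracies, so it can never exceed the accuracy of the weakest step, however good the linking step is. The accuracy figures below are invented for illustration.

```python
# Toy model (an assumption, not a measurement): each step in the chain keeps a
# fixed fraction of records correct, and no later step repairs an earlier error.
STEPS = ["sourcing", "matching", "identifying", "linking"]


def end_to_end_accuracy(step_accuracy: dict[str, float]) -> float:
    """Multiply the per-step accuracies in chain order."""
    result = 1.0
    for step in STEPS:
        result *= step_accuracy[step]
    return result


# Illustrative figures only.
uniform = {step: 0.95 for step in STEPS}
poor_sourcing_perfect_linking = {
    "sourcing": 0.80,
    "matching": 0.95,
    "identifying": 0.95,
    "linking": 1.00,
}

print(f"{end_to_end_accuracy(uniform):.3f}")                        # 0.815
print(f"{end_to_end_accuracy(poor_sourcing_perfect_linking):.3f}")  # 0.722
```

Even with a flawless linking step, the second scenario ends up worse than the first, because the shortfall was introduced at sourcing.
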

The example I will use to illustrate this logic is a problem I have encountered at three different companies. In each case, the CEO asked what appeared to be a simple question: Which 10, 25 or 50 companies are our largest customers? The last time I checked, smart and hardworking people at all three companies were still trying to answer it. All three companies made significant investments in state-of-the-art customer data integration (CDI) and/or master data management (MDM) software, but the answer continues to elude the teams working on the question.

Why is it so difficult? Largely because business entities are very complex, for both external and internal reasons. The following factors are examples of each.

External factors include:

  • Companies have multiple legitimate names, such as registered business names, doing-business-as (DBA) names, nicknames and abbreviated names.
  • Subsidiaries and branches frequently have names that bear no resemblance to the parent company’s name.
  • Mergers and acquisitions can completely reshape companies’ names and addresses.

Internal factors include:

  • Different functions within a company may have unique name and address conventions (e.g., bill-to versus ship-to names and addresses).
  • Colleagues’ informal practices may introduce abbreviations, addresses specific to individual customer locations and similar variations.
  • Premerger/acquisition names may continue to be used.
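
Some of these variations can be reduced mechanically before matching. The sketch below is only an illustration. It builds a comparison key by lowercasing, dropping punctuation, stripping trailing legal-form suffixes and applying a governed alias table. The suffix list, the alias table and the function name `normalize_name` are hypothetical. A shared key only makes two records candidates for the matching step. AC Corp. and AC Ltd. may well be a parent and its subsidiary, and no amount of normalization will reveal that AC Enterprises or BD Petroleum belongs to the same family.

```python
import re

# Hypothetical lookup tables; in practice they would be maintained under
# data governance rather than hard-coded.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "gmbh", "plc", "ag", "sa"}
ALIASES = {"a d": "abcd"}  # e.g., "A-D", a sales-team shorthand for ABCD, Inc.


def normalize_name(raw: str) -> str:
    """Return a comparison key for a company name (a candidate key, not an identity)."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower().replace("&", " and "))
    # Strip legal-form suffixes only from the end of the name.
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    key = " ".join(tokens)
    return ALIASES.get(key, key)


for raw in ["AC Corp.", "AC Ltd.", "BD GmbH", "ABCD, Inc.", "A-D", "AC Enterprises"]:
    print(f"{raw!r:18} -> {normalize_name(raw)!r}")
```
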

Whenever you set out to determine the 25 largest customers, focus first on the sourcing, matching and identifying steps of the transformation chain, because linking related entities within corporate family trees is the hardest step. You don’t want to do this heavy lifting with data that is still dirty when it reaches the linking step. The only way to solve the problem of determining the 25 largest customers is to:

  • Start with improved data governance at the sourcing step. The people in your organization who are closest to the customers (e.g., salespeople and order-to-cash teams) generally know a great deal about customers’ family trees. They generally don’t, however, have data collection systems or incentives to store the data systematically. If you take the time to create input opportunities and gain agreement on collection guidelines, you will obtain information that will be very valuable when you get to the linkage step.
  • Use state-of-the-art matching software. Many companies assume that some smart people in a room can look at the data and see the relationships. People will see relationships, but they may not be the right ones.
  • Create permanent IDs once you have uniquely identified each entity, and put a governance structure in place to ensure that people search for an existing entity rather than routinely creating a new one. Monitor the creation of false new entities and establish consequences for compliance and noncompliance.
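
The "find before you create" rule in the last two points can be sketched as a small registry. It searches existing entities before minting a permanent ID and logs every creation so that suspicious patterns can be reviewed. This is a hypothetical illustration, not a description of any CDI or MDM product. Python's standard-library `difflib.SequenceMatcher` stands in for real matching software, and the 0.85 threshold is arbitrary. The class name, the ID format and the choice to match on name plus country alone are also assumptions.

```python
from __future__ import annotations

from collections import Counter
from difflib import SequenceMatcher
from itertools import count


class EntityRegistry:
    """Hypothetical registry that searches before it mints a permanent ID."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold  # arbitrary similarity cut-off
        self._next_id = count(1)
        self.entities: dict[str, dict[str, str]] = {}      # permanent ID -> record
        self.created_log: list[tuple[str, str, str]] = []  # (ID, name, created_by)

    def find(self, name: str, country: str) -> str | None:
        """Return the ID of the most similar entity in the same country, if close enough."""
        best_id, best_score = None, 0.0
        for entity_id, record in self.entities.items():
            if record["country"] != country:
                continue
            score = SequenceMatcher(None, name.lower(), record["name"].lower()).ratio()
            if score > best_score:
                best_id, best_score = entity_id, score
        return best_id if best_score >= self.threshold else None

    def resolve(self, name: str, country: str, user: str) -> str:
        """Reuse an existing ID when one matches; otherwise create and log a new one."""
        existing = self.find(name, country)
        if existing is not None:
            return existing
        entity_id = f"ENT-{next(self._next_id):06d}"
        self.entities[entity_id] = {"name": name, "country": country}
        self.created_log.append((entity_id, name, user))
        return entity_id


registry = EntityRegistry()
first = registry.resolve("BD GmbH", "DE", "sales-emea")
second = registry.resolve("BD GmbH.", "DE", "order-to-cash")     # reuses first's ID
third = registry.resolve("BD Petroleum", "US", "sales-americas")  # gets a new ID
print(first, second, third)
print(Counter(user for _, _, user in registry.created_log))
```

Counting creations per user, as in the last line, is one crude way to spot people who routinely create new entities instead of searching for existing ones.
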

Let’s make this concrete with an example. Suppose you work for a company with multiple databases supporting sales, customer service, order to cash and fulfillment. Your company has information about another company, let’s call it ABCD, Inc. ABCD was created by the merger of AC Corp. and BD Inc. In the various databases from around the world, there might be business entities with names like AC Ltd. and BD GmbH, plus others, such as AC Enterprises and BD Petroleum, whose names appear unrelated. Your sales teams around the globe may use shorthand names like A-D. A salesperson may have a name and contact address for a subsidiary that was acquired last year, while the order-to-cash team has changed to the post-merger name.
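
The payoff of the linking step is the rollup the CEO's question actually requires: revenue summed to each entity's ultimate parent. The sketch below uses invented IDs, parent links and revenue figures (in millions), loosely based on the ABCD example. None of them come from a real data set. It assumes the sourcing, matching and identifying steps have already produced one permanent ID per entity, which is why those steps come first.

```python
from collections import defaultdict

# Invented family tree: child ID -> parent ID.
PARENT = {
    "AC-LTD": "AC-CORP",
    "AC-ENT": "AC-CORP",
    "BD-GMBH": "BD-INC",
    "BD-PET": "BD-INC",
    "AC-CORP": "ABCD",
    "BD-INC": "ABCD",
}

# Invented revenue per identified entity, in millions.
REVENUE = {
    "AC-LTD": 4.0,
    "AC-ENT": 2.5,
    "BD-GMBH": 3.0,
    "BD-PET": 6.0,
    "XYZ": 9.0,
    "QRS": 7.5,
}


def ultimate_parent(entity: str, parent: dict[str, str]) -> str:
    """Follow parent links to the top of the family tree, rejecting cycles."""
    seen = set()
    while entity in parent:
        if entity in seen:
            raise ValueError(f"cycle in family tree at {entity}")
        seen.add(entity)
        entity = parent[entity]
    return entity


def largest_customers(revenue: dict[str, float], parent: dict[str, str], n: int) -> list[tuple[str, float]]:
    """Sum revenue by ultimate parent and return the n largest totals."""
    totals = defaultdict(float)
    for entity, amount in revenue.items():
        totals[ultimate_parent(entity, parent)] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


print(largest_customers(REVENUE, {}, 3))      # [('XYZ', 9.0), ('QRS', 7.5), ('BD-PET', 6.0)]
print(largest_customers(REVENUE, PARENT, 3))  # [('ABCD', 15.5), ('XYZ', 9.0), ('QRS', 7.5)]
```

Without the family-tree links, ABCD does not appear in the top three at all. With them, it is the largest customer. The rollup is only as good as the IDs and links feeding it.
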

The point is that the family trees of multinational corporations are complicated and change regularly. If you want to determine your largest customers and understand how they interact with your company, start at the top of the data transformation chain. Tackling linkages alone will give your CEO a seriously flawed answer.
