We're not on the same page.

That's Greek to me.

We're comparing apples and oranges.

These phrases are familiar refrains to IT professionals working with business customers. We must find ways to bridge the communication gap, especially in the data management and business intelligence space where we aim to unlock the value that is often hidden or neglected in business data.

An abundance of tools aid businesses in collecting, organizing, managing and gaining competitive advantages from their data. However, many businesses have foundational data work that must be tackled before any technology solution can add value.

Perhaps because this work is not considered to be as flashy as the latest technology solution, it is too often neglected or seen as documentation that can be put off for later. Even when businesses agree to tackle data definitions up front, or top down via a data governance initiative, difficulties around doing this foundational work are common. Malcolm Chisholm points this out in his article "Real Definitions versus Nominal Definitions in Data Management":

"We frequently speak of definitions in data management, but they are often taken for granted. In particular, it seems that everyone knows what a definition is and that every one assumes producing definitions is easy. Nobody stops to ask what exactly a definition is and if there are any particular considerations about formulating definitions."

Consider the scenario of a business with data stored in a number of legacy system silos. Since the data is isolated within the silos and used to support independent business units, it is shared sparingly, if at all. Additionally, business users are only familiar with their particular legacy systems. In this case, definitions of data elements are a crucial first step to integrate data across the legacy systems into a consolidated data warehouse. Without data element definitions, the data warehouse could be populated with data that makes sense to one group of users but not to all users who need the data. Good data warehouse design must start with a solid foundation of data element definitions.

Definitions will also be a crucial aid as data is mapped from the legacy sources to the target data warehouse. Without integrated data and commonly understood definitions, the wrong data from the wrong legacy databases could be brought into the data warehouse. Additionally, the business users may assume they will see the same data they have always expected ("These customer IDs make no sense - they're supposed to be in this format").

Testing of the data warehouse solution also hinges on quality data element definitions. Without definitions as a standard, it will not be possible to write unambiguous test scripts. Business customers may find unpleasant surprises during user acceptance tests if they did not agree on definitions of data at the beginning of the project.

To avoid these pitfalls, requirements for populating a data warehouse must include business definitions of each data element going into the data warehouse. These definitions provide a common understanding of what data elements are required, what the data elements mean and the correct source of each data element.
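As a minimal sketch of how such a requirement might be captured in structured form, the Python below records one definition per data element and flags entries that still lack a meaning or an agreed source. The class name, the field names and the sample source paths (such as `billing_legacy.CUST_MASTER.CUST_ID`) are hypothetical illustrations, not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementDefinition:
    """One business definition for a warehouse element (hypothetical structure)."""
    name: str              # business name of the data element
    meaning: str           # what the element means, in business terms
    source: str            # agreed legacy system and field it is loaded from
    required: bool = True  # whether the warehouse requires this element


def incomplete_definitions(definitions):
    """Return the names of elements whose meaning or source is still blank."""
    return [
        d.name
        for d in definitions
        if not d.meaning.strip() or not d.source.strip()
    ]


# Hypothetical sample entries, for illustration only.
definitions = [
    DataElementDefinition(
        name="Order type",
        meaning="The sales category of the order.",
        source="orders_legacy.ORDER_HDR.ORD_TYPE",
    ),
    DataElementDefinition(
        name="Customer ID",
        meaning="",  # not yet agreed with the business
        source="billing_legacy.CUST_MASTER.CUST_ID",
    ),
]

print(incomplete_definitions(definitions))
```

A check like this could run before any mapping work starts, so that elements without an agreed meaning or source are surfaced early rather than during user acceptance testing.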

Given the need for good data element definitions, what is the best way to go about getting them? You will need to engage the business and IT experts who are familiar with the legacy systems where the data is stored. After all, these people use the systems daily and are the most familiar with the data they need, how they use it and how it is represented in their system. This sounds like it will be a breeze, right? Simply get the business users to define each data element and explain how they use it in their daily operations. However, it may not be as easy as it seems.

Business users are often not accustomed to providing definitions at the level of detail required in order to ensure the right data gets into the data warehouse. Additionally, their familiarity with the systems and the data elements may actually make it more challenging for them to explain data elements to an outsider who has not used the systems as extensively as they have.

IT database administrators understand the physical structure of the legacy databases and the relationships between tables, but they may not be as familiar with how the business uses the data. Additionally, IT definitions are not always meaningful to business users who work in other departments or who are familiar with other systems.

Getting a definition from either group in isolation often ends up in confusion, as each group by default will offer limited explanations that make sense to them but not to anyone else in the organization. If business or IT users insist that their definition is good and everyone knows what they mean when in fact that is not the case, the following strategies may help.

1. Provide examples of unclear versus clear definitions. Users who are intimately familiar with their business process and supporting systems may not understand the point of specifying exactly what they need. To them "the ID of the customer" is a perfectly acceptable definition of "Customer ID." Or, the IT representative may give a definition that works for him or her but no one else, such as "the primary key of the customer table." It will help both groups to see examples of what a workable definition needs in order to support data warehouse population and use of the data. Beyond being ambiguous, the "unclear" definition in the example (see box) hides the fact that the Customer ID contains embedded data that might otherwise have been overlooked by data modelers.

2. Understand what the business user wants to do with the data. If you are still stuck with an incomplete definition, ask the primary business users to explain why they need that particular data element and how they use it in their work. Perhaps the data element is not being used, in which case it may be best not to bring it into the data warehouse. If it is being used, understanding how the business uses it will often allow you to refine their definition into something more meaningful and workable.

For example, if your business definition of order type is "what kind of order it is," asking how it is used could yield valuable add-on information: "We use order type to track sales results for our monthly sales reports. John is responsible for software orders, Phil handles hardware orders and Elizabeth owns service orders." Armed with that knowledge of use, you can expand the definition of order type: "Order type: the sales category of the order. Must be one of the following: software, hardware, service. Used to group sales by order type for monthly sales reports."

[Click here to see examples of clear and unclear definitions.]

3. Bring all stakeholders together to review definitions. If you are not fully familiar with the business data you are documenting, you may not be able to catch all vague or poorly articulated data element definitions. And if you take the business users' word that their definitions are clear, you may end up with trouble down the road when you use the definitions to populate the data warehouse and provide the data for business users in different areas of the organization.

The solution here is to have all stakeholders confirm definitions once you draft them with the primary business users. With input from all business and IT users who use the data, you can ensure that the definitions provide enough detail and clarity that all stakeholders understand the meaning and use of each data element.
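To make the expanded order type definition from the example above concrete, here is a minimal sketch of how a definition with an explicit value list could double as a validation rule during warehouse population. The class, the function names and the case-insensitive matching are assumptions made for illustration; only the meaning, the three allowed values and the usage note come from the definition itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedElementDefinition:
    """A data element whose definition lists its allowed values (hypothetical)."""
    name: str
    meaning: str
    allowed_values: frozenset
    usage: str

    def is_valid(self, value: str) -> bool:
        # Assumption: surrounding spaces and letter case are not significant.
        return value.strip().lower() in self.allowed_values


ORDER_TYPE = CodedElementDefinition(
    name="Order type",
    meaning="The sales category of the order.",
    allowed_values=frozenset({"software", "hardware", "service"}),
    usage="Used to group sales by order type for monthly sales reports.",
)


def rejected_values(definition, values):
    """Return the incoming values that the definition does not allow."""
    return [v for v in values if not definition.is_valid(v)]


# Hypothetical incoming legacy values, for illustration only.
sample = ["software", "Hardware", "consulting", "service"]
print(rejected_values(ORDER_TYPE, sample))
```

The vaguer definition, "what kind of order it is," could not be turned into a rule like this, which is one practical way to judge whether a definition is detailed enough for all stakeholders.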

With the abundance of sophisticated technological tools at our disposal, it is easy to sell technology as a silver bullet that will bring organizations from the infant stages of data maturity to full-fledged adulthood. It is our job as IT professionals to emphasize the foundational work that must occur in order to harness the power of the technology that is available.

Data element definitions are a crucial part of the data foundation that every organization must build on in order to realize the benefits of a business intelligence solution. They also provide tactical support to the design, build and testing phases of the project. As an organization matures in managing its data, these definitions can be the starting point for a high-level data dictionary, and they form part of the foundation that allows the business to achieve better processes around data governance and metadata management. This foundation is the solid ground needed to realize the business value inherent, but often unrealized, in business data.

The beginning of wisdom is a definition of terms. -Socrates
