for Information Management Blogs
JUL 7, 2010 6:44am ET


The Flaw of the Data Inventory


Back when I was applying to college, I’d read over college catalogs. Inevitably, each university would mention the number of books it had in its library. When I finally went to college, I realized that this metric was fairly meaningless. A dozen volumes on Grecian pottery did me no good when I was in search of a book on polymers for my mechanical engineering class.

Clients will often ask us to scope a “data inventory” project, inevitably focused on identifying and describing all the data elements contained across their different application systems. Recently a new CIO asked us to head up a “tiger team” to inventory his company’s data. He was surprised at the quantity of information needs that had been sent his way. As expected, he inquired about systems of record and data dictionaries. As you can imagine, he received multiple, conflicting answers, which only added to his confusion.

As a point of reference, well-known ERP systems can have in excess of 50,000 discrete data elements in their databases (never mind that some aren’t in English). As I’ve written in the past, many of these data elements have no use outside of the application itself.

Having terabyte upon terabyte of information is equally irrelevant if that data is unrelated to current business issues. The problem with a data inventory activity is that identifying and counting data elements in different systems and applications won’t necessarily solve any problems. Why? Because data across applications and packages is inconsistent: there are different names, definitions, and values, and there is no practical means of determining which data they actually have in common. This is like going to the hardware store and looking for a specific screw, but all the different screws are in one big barrel, so you end up having to pick through each screw, one at a time. When you find the screw, you just throw all the other screws back into the barrel.

The point of a data inventory isn’t to pick through data because it exists, but to inventory the data people actually need. If you’re going to undertake a data inventory, your output should be structured so that the next person doesn’t have to repeat your work. Identify the data that is moving across various systems, as this indicates key information that’s being shared. Categorize this data by subject area. You’ll inevitably find inconsistent versions of the same data, which is how you identify data disparities. You can then begin to develop a catalog of key corporate data that will form the basis of your data dictionary.
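As a rough illustration of those steps, here is a minimal Python sketch. Everything in it is hypothetical: the system names, field names, subject areas, and definitions are made up, and a real inventory would draw them from interface specifications or ETL metadata rather than a hard-coded list. It groups the fields that move between systems by subject area and business element, then flags any element that arrives with more than one definition.

```python
from collections import defaultdict

# Hypothetical records of data moving between systems:
# (source system, target system, source field name,
#  subject area, business element, definition used by the source)
FLOWS = [
    ("CRM", "Billing", "cust_nm", "Customer", "customer_name", "Full legal name"),
    ("Billing", "Warehouse", "CUSTOMER_NAME", "Customer", "customer_name", "Billing contact name"),
    ("ERP", "Warehouse", "ITEM_NO", "Product", "product_id", "Stock-keeping identifier"),
]


def build_catalog(flows):
    """Group shared fields by subject area, then by business element."""
    catalog = defaultdict(lambda: defaultdict(set))
    for source, _target, field, subject, element, definition in flows:
        catalog[subject][element].add((source, field, definition))
    return catalog


def find_disparities(catalog):
    """Return elements whose source systems disagree on the definition."""
    disparities = {}
    for subject, elements in catalog.items():
        for element, versions in elements.items():
            definitions = {definition for _, _, definition in versions}
            if len(definitions) > 1:
                disparities[(subject, element)] = sorted(definitions)
    return disparities


catalog = build_catalog(FLOWS)
disparities = find_disparities(catalog)
```

In this made-up data, `customer_name` would be flagged because the CRM and Billing systems describe it differently, while `product_id` would not. The catalog keyed by subject area is the kind of structured output the next person can pick up without repeating the work.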

Inventorying the data that moves between systems accomplishes two things: it identifies the most valuable data elements in use, and it will also help identify data that’s not high-value, as it’s not being shared or used. This approach also provides a way to tackle initial data quality efforts by identifying the most “active” data used by the business. It ultimately helps the data management team understand where to focus its efforts, and prioritize accordingly.
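One way to picture this prioritization is the short Python sketch below. It assumes a hypothetical log of transfers between systems and a hypothetical list of all known elements; none of these names come from a real system. It counts how often each element moves, which stands in for how "active" it is, and lists the elements that never move at all.

```python
from collections import Counter

# Hypothetical interface log: (source system, target system, data element)
TRANSFERS = [
    ("CRM", "Billing", "customer_name"),
    ("CRM", "Warehouse", "customer_name"),
    ("Billing", "Warehouse", "invoice_total"),
]

# Hypothetical list of every element found in the applications
ALL_ELEMENTS = ["customer_name", "invoice_total", "screen_layout_code"]


def rank_by_activity(transfers):
    """Rank elements by how many transfers carry them, most active first."""
    counts = Counter(element for _, _, element in transfers)
    return counts.most_common()


def unshared(all_elements, transfers):
    """List elements that never move between systems."""
    moved = {element for _, _, element in transfers}
    return [element for element in all_elements if element not in moved]


ranking = rank_by_activity(TRANSFERS)
low_value_candidates = unshared(ALL_ELEMENTS, TRANSFERS)
```

The ranking gives a starting order for data quality work, and the unshared list holds candidates for lower priority. Transfer counts are only a proxy for business value, so the result is a starting point for discussion, not a verdict.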

So next time someone suggests a data inventory without context or objectives, consider sending them to college to study Grecian urns.

Evan also blogs at evanjlevy.com/.


Comments (6)
Evan! You rock!

That's the same advice I give folks as it avoids an expensive "boil the ocean" project!

Let me tell you that the analogy I *thought* you were gonna use is going to the HW store looking for a particular screw, and then proceeding to look inside every box in the store (regardless of what the item was), looking for the screw! That's FUNNY and ABSURD!!!

Yet, that's what folks want to do - open every app looking for all the data, even though perhaps only 1% or less is of interest to us!

Thanks for keeping us thinking and laughing!!!

Marty

Posted by Marty M | Wednesday, July 07 2010 at 2:17PM ET
Hi Evan,

I might misunderstand. Why is it not important to know that we have terabyte upon terabyte of information that is irrelevant and unrelated to current business issues? And perhaps if the business knew about it we might find out it is valuable and want to share it or get rid of it. And how can we say we are managing data well without even having a sense of how much we have and where it is?

I would use the HW analogy differently. If we are trying to manage inventory, don't we need to know everything in the store at some point ... at least once a year, not just the things being sold? First inventory, then categorize so you know where to look for the screws.

Effective data governance would be enhanced by an automated, relatively painless way to produce an inventory of data moving and at rest. That doesn't mean we have to answer data dictionary or system of record type questions for all the data, but at least we know what we don't know.

Posted by Ed U | Wednesday, July 07 2010 at 2:41PM ET