JAN 3, 2008 12:37pm ET

Single View of the Truth

One of the bedrock goals of data warehouse projects is a single version of the truth. Yet truth is rarely so simple. In the classic example, a “customer” is one thing to a salesperson, another to the shipping department and something else to accounts receivable. Good data warehouse designers recognize this and build different definitions into their systems, so users can access whichever version they need whenever they need it.

 

Yet the notion of a single version of the truth persists - and warehouse teams invest huge resources negotiating shared business models to define it. Why?

 

Dueling Spreadsheets

 

The problem is often described as “dueling spreadsheets,” where managers argue over whose data is correct. This is apparently something to avoid at all costs.

 

Personally, I love a good debate over data. But if you want to prevent those arguments, you have to understand what causes them. Just providing a single version of the truth won’t do the trick. Any data warehouse rich enough to be useful will contain enough variations of the truth that it, too, can produce conflicting results.

 

Let’s start at the beginning. Managers rely on whatever data sources they have available. In the absence of a warehouse, these are usually their local operational systems. Managers use these systems not just because they are handy, but also because they understand their contents. Because learning about a data set is often the hardest part of an analytical project, it is perfectly reasonable for managers to rely on the data they know.

 

Dueling spreadsheets happen because each manager’s local data set is an incomplete view of the entire problem. A call center manager can see call center information and might do an analysis that shows how to minimize call center costs. But the service manager will see the service costs that result from poor call center treatments, such as dispatching repair people for problems that could have been resolved over the phone. Each manager can analyze her own data correctly, and the two can still reach opposite conclusions about the best course of action.

 

Putting all that data into one warehouse wouldn’t solve the problem by itself. The single version of the truth (that is, a shared data model) will include both call center data and service data. But if each manager simply extracts her own department’s information, the two will still end up with conflicting results.

 

The only thing that will change this is if both managers pull both departments’ data. Indeed, each manager really needs all the relevant data, which probably comes from many departments. Here is where the central data warehouse truly adds value: it makes all that data accessible in an integrated format.
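The call center example can be made concrete with a small sketch. Every figure and policy name below is hypothetical and chosen only for illustration; nothing here comes from real data. It shows how each manager, analyzing her own department's costs correctly, can pick a different policy, while the combined data points to a third.

```python
# Hypothetical monthly costs under three call-handling policies.
# All names and numbers are invented for illustration only.
policies = {
    "dispatch_quickly": {"call_center_cost": 30_000, "service_cost": 150_000},
    "thorough_calls": {"call_center_cost": 60_000, "service_cost": 50_000},
    "exhaustive_calls": {"call_center_cost": 100_000, "service_cost": 45_000},
}


def best_policy(cost_of):
    """Return the policy name with the lowest cost under the given view."""
    return min(policies, key=lambda name: cost_of(policies[name]))


# Each manager looks only at her own department's data.
call_center_view = best_policy(lambda p: p["call_center_cost"])
service_view = best_policy(lambda p: p["service_cost"])

# Pulling both departments' data gives the full picture.
combined_view = best_policy(
    lambda p: p["call_center_cost"] + p["service_cost"]
)

print("Call center manager prefers:", call_center_view)
print("Service manager prefers:", service_view)
print("Combined data prefers:", combined_view)
```

In this made-up case neither local answer matches the combined one, which is the point of pulling both departments' data before arguing over whose spreadsheet is right.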

 

But this brings us back to the original problem. Managers will use the data they find most familiar. Even if they have access to a comprehensive warehouse, pulling data for all the different departments requires understanding where to find that data and how to combine it. Managers are unlikely to take the time to learn how to do this. Instead, they’ll either go back to their familiar local sources or pull the equivalent data from the central warehouse. Either way, they get the same incomplete result.

 

This problem can be mitigated but not really solved. It can’t be solved because managers don’t have the time or inclination to learn about proper warehouse procedures. Mitigation means making it easy for managers to see the data they really need, even if they didn’t think to look for it. It also means making it at least as easy to get that data from the warehouse as from local systems.

 

Waiting for the IT department to build a new data cube definitely does not count as easy, particularly if the manager could already pull the data from the local system herself. It is tempting to “solve” the problem by mandating use of warehouse information, but that probably won’t work. Many managers will just do without the information rather than invest major time in learning a new system. Or they’ll look at data from the local system and not show it to anyone else. Or they’ll ask an analyst to do the work for them, but only when it is worth the extra time, cost and trouble. Remember, we’re talking here about managers who have considerable discretion in how they do their jobs.