for Information Management Blogs
APR 3, 2009 2:40am ET

Inmon’s Vitriolic Slap At “Virtual Data Warehousing” Does Not Withstand Scrutiny

In a recent article, Bill Inmon incinerates a straw man concept that he refers to as “virtual data warehousing (DW).” For those unfamiliar with Inmon, he is generally considered the founder of DW as a data management discipline, has been at it since the 70s, and has more published books and articles to his name than most mortals. So he clearly may be considered an authority on the topic of DW.

But methinks Mr. Inmon doth protest too much on this “virtual DW” bugaboo, however defined (we’ll get to that in a moment). Also, he attacks this concocted notion with such emotional vehemence that it’s clear he considers it a threat to the centralized EDW paradigm upon which he has built his career and reputation.

For starters, his definition of this concept is oddly vague and questionably narrow: “a virtual data warehouse occurs when a query runs around to a lot of databases and does a distributed query.” Essentially, Inmon defines “virtual DW” as the ability to a) farm out a query to be serviced in parallel by two or more distributed databases, b) aggregate and join results from those databases, and c) deliver a unified result set to the requester.
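To make that three-step pattern concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular EII product works: the two in-memory SQLite databases, the table names, and the functions `query_orders`, `query_customers`, and `federated_query` are all hypothetical stand-ins for separate source systems.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def query_orders():
    """Hypothetical source 1: an order system, queried for totals per customer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 120.0), (2, 75.5), (1, 30.0)],
    )
    rows = con.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    con.close()
    return dict(rows)


def query_customers():
    """Hypothetical source 2: a customer system, queried for names by id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    con.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    rows = con.execute("SELECT id, name FROM customers").fetchall()
    con.close()
    return dict(rows)


def federated_query():
    # (a) Farm the query out to both sources in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        orders_future = pool.submit(query_orders)
        customers_future = pool.submit(query_customers)
        totals = orders_future.result()
        names = customers_future.result()
    # (b) Join the partial results on the shared customer id.
    joined = [(names[cid], total) for cid, total in totals.items() if cid in names]
    # (c) Deliver one unified result set to the requester.
    return sorted(joined)


if __name__ == "__main__":
    print(federated_query())
```

Each source opens its own connection inside its worker thread, so no data is copied into a central store before the query runs; the join happens on demand in the federating layer.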

That’s an important query pattern, but not the only one that should be supported under (pick your quasi-synonym) data federation, data virtualization, or enterprise information integration (EII) architectures. Inmon’s definition excludes the many federated queries that hit only a single database, with no joins or result aggregation, and with the EII fabric handling the necessary on-demand transformation from that source’s schema to an abstract semantic model.

Per my data federation report from last fall, Forrester has a broader perspective on the topic than does Mr. Inmon. Data federation is any on-demand approach that queries information objects from one or more sources; applies various integration functions to the results; maps the results to a source-agnostic semantic-abstraction model; and delivers the results to requesters. Nothing in the scoping of data federation necessarily requires the multi-source aggregation and joining that Inmon puts at the heart of “virtual DW.”
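A rough sketch of that broader definition, again in Python and again under stated assumptions: the source descriptor `CRM_SOURCE`, its cryptic column names, the `clean` integration function, and the semantic field names `customer_name` and `region` are invented for illustration. The point is only that the same on-demand pipeline works when the list of sources has exactly one entry and no join takes place.

```python
import sqlite3


def open_crm():
    """Hypothetical source with its own, source-specific schema."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t_cust (cust_nm TEXT, rgn_cd TEXT)")
    con.executemany(
        "INSERT INTO t_cust VALUES (?, ?)",
        [(" acme ", "emea"), ("Globex", "amer")],
    )
    return con


# Assumed descriptor: how to reach the source, what to ask it, and how its
# columns map onto the source-agnostic semantic model.
CRM_SOURCE = {
    "connect": open_crm,
    "sql": "SELECT cust_nm, rgn_cd FROM t_cust ORDER BY rowid",
    "mapping": {"cust_nm": "customer_name", "rgn_cd": "region"},
}


def clean(value):
    """A stand-in integration function: trim stray whitespace from strings."""
    return value.strip() if isinstance(value, str) else value


def federate(sources):
    """Query one or more sources on demand and return semantic-model records."""
    results = []
    for source in sources:
        con = source["connect"]()
        try:
            cursor = con.execute(source["sql"])
            columns = [description[0] for description in cursor.description]
            for row in cursor:
                # Apply the integration function, then rename each column
                # to its semantic-model field.
                record = {
                    source["mapping"][column]: clean(value)
                    for column, value in zip(columns, row)
                }
                results.append(record)
        finally:
            con.close()
    return results


records = federate([CRM_SOURCE])
```

The requester sees only `customer_name` and `region`, never `cust_nm` or `rgn_cd`, which is the schema-to-semantic-model mapping the definition describes.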

Putting Inmon’s narrow scoping of “virtual DW” behind us for the moment, let’s consider his chief objections to this approach. First, it requires the “analyst to integrate data” (as if that’s something analysts are ill-suited for or regard as some inordinate burden). Second, it consumes resources, experiences suboptimal performance, and “shuffles a lot of data around the system that otherwise would not need to be moved” (as if centralized DWs don’t consume resources, experience performance bottlenecks, and move data). Third, it is “limited to the [historical] data found in the [source] databases.” Fourth, it suffers from “no reconcilability of data...[hence] no single version of the truth for the corporation.”

It’s a fairly straightforward matter to dispatch these objections:

First, data integration--through ETL, EII, and other approaches--is a core job function for DW professionals, not some alien function outside their core competency.

Second, data federation is often the optimal approach for low-latency BI (just check out the case studies in my data federation and really urgent analytics reports). Federated environments can be tuned to provide top-notch performance and minimize source-system impacts when “shuffling” data around in a decentralized fabric.

Third, the source databases in a federation environment often include DWs, which, per their core function, usually manage a considerable amount of historical data. Once again, see my data federation report with discussion of case studies for a) Federation of Local DWs via Centralized EII Infrastructure and b) Federation of Dispersed EDW and ODS Data Into Siloed BI Environments.

Fourth, data federation is not totally incompatible with data reconciliation. In fact, federation environments can be architected for single version of the truth, data governance, and master data management. However, it can indeed be tricky to manage data quality in federated environments (see Rob Karel’s coverage of MDM and DQ for a deep dive on that issue).

My basic objection to Inmon’s line of discussion is that he treats data federation and the enterprise DW (EDW) as mutually exclusive, when in fact they are highly complementary approaches, not just in theory but in real-world deployments. Yes, data federation can be deployed as an alternative to traditional EDWs, providing direct interactive access to online transactional processing (OLTP) data stores. However, data federation can also coexist with, extend, virtualize, and enrich EDWs, as well as other data-persistence nodes such as operational data stores (ODS) and online analytical processing (OLAP) data marts. The case studies in the cited reports bear that out.

Inmon’s arguments are worth consideration. The centralized EDW model he touts is useful for illuminating some traditional best practices. But by no means can it do justice to the stubbornly heterogeneous, distributed, mixed-latency BI and DW requirements of most enterprises.

 James Kobielus' blog can also be found at http://blogs.forrester.com/business_process/.

Comments (31)
Wow, this is like watching two people who speak two different languages argue. I don't consider either point of view, Mr. Kobielus's or Mr. Inmon's, to be overly representative of reality. What we are fighting over here, like two lions with a piece of meat, is an attempt at a practical view of the DW concept. I suggest that anyone who has lived their work-lives in the DW/BI trenches not get too worked up over things like this. We stay true to the overarching concepts and philosophies that shape our profession (thank you Mr. Inmon and Mr. Kimball) and apply interesting and new techniques where it makes sense - which at times can appear to be counterintuitive. The points made by Mr. Kobielus are helpful, informative and most likely in some cases bang-on, but not in all cases. This argument is not black and white. Like most things in our profession - conceptual or otherwise - it needs to be taken with a grain of salt.
Posted by Michael M | Tuesday, April 07 2009 at 11:15AM ET
James makes some good points. However, I've spoken with Bill Inmon on this matter, and what isn't made clear is that some of Bill's concerns center on the fact that the conventional EDW and its associated ETL represent consistent, vetted processes for aggregating and managing data in a manner that applies to all BI use cases. Federation is typically purpose-oriented, meaning that the same aggregations and associations of data may happen multiple times for different purposes, creating the potential for multiple versions of the truth. While governance can address such concerns, in reality it rarely does. Despite Bill's distaste for federation, it is here and it will likely stay, but people should listen carefully to his concerns. I know Bill's opinions transcend his personal interests and are very much oriented toward genuine concern for companies that invest heavily in DW and BI. He is sincere about helping DW consumers realize value.
Posted by | Tuesday, April 07 2009 at 11:33AM ET
