Photographed by Ray Ng
Bill Inmon's Data Warehouse 2.0 tackles industry trends, unstructured data and the data lifecycle
Of all the mainstream business technologies, classic data warehousing might well be considered the least evolved in terms of practice and approach. And while data warehousing continues to spread as a foundational technology, the availability of information and the speed of change have led businesses to add operational data strategies to the data warehouse mix - something the textbook writers didn't originally envision. So it might well be left to one of the fathers of the industry to update the definition of the data warehouse. That person is Bill Inmon, president of Inmon Data Systems. His model - called Data Warehouse 2.0 - is delivered with a complete architecture, outright enthusiasm - and a little ambivalence. You'll find the technical details in Inmon's recent articles in DM Review and on his Web site, www.inmoncif.com. For a more philosophical take on DW 2.0, DM Review Editorial Director Jim Ericson spoke with Bill Inmon.
DMR: Why do we need Data Warehouse 2.0?
Bill Inmon: There are two reasons for DW 2.0. The first is to protect the integrity of the definition, because I feel there are too many definitions floating around. The second is the need for a vision for the future of data warehousing, which I believe a lot of people in the industry have wrong. That came from confusion and from vendors trying to sell products. There were people building transactional systems they were calling a data warehouse; people building federated versions of a data warehouse; people building data marts that they were calling a data warehouse. Those are just some of the renditions.
DMR: What are the main distinctions between DW 2.0 and DW 1.0?
BI: The first major distinction is that DW 1.0 never recognized the lifecycle of data within the corporation. DW 1.0 said, "Here's some data." DW 2.0 says, "Here's the data; it has a lifecycle, and each of the different portions of the lifecycle has unique characteristics." The second major difference between DW 1.0 and 2.0 is the recognition that unstructured data and structured data should both contribute to the data warehouse. There is a wealth of information in the world of unstructured technology, but it has to be built properly for the data warehouse.

DMR: We'll get to unstructured data in a moment. First, your DW 2.0 model adds an "interactive" zone to address systems that don't meet the definition of a data warehouse. Is this a concession to the need to leverage operational data?
BI: Well, DW 1.0 was never meant to do transactional processing. Yet certain vendors have something called an "active data warehouse," and they insist on doing transaction processing in the data warehouse. So, if we're going to be doing transaction processing in the data warehouse, let's at least do it with recognition of the architectural principles that are needed for both data warehousing and transaction processing.
DMR: But you're not sold on the idea of collecting near real-time operational information and aligning it with historical information?
BI: This was never the intent of a data warehouse. We've always recognized the need for operational reporting and operational analytical processing; it's just that it was in a different bucket. Some vendors are trying to say there should be one source of data for all reporting. That has never been true. But, if you're going to do transaction processing in the data warehouse, do it in the interactive sector where you can meet the architectural necessities.
DMR: How does this new interactive zone work in the context of the classic "integrated" data warehouse?
BI: It eliminates the confusion of technology. In the interactive sector, there is a whole art to getting good, high-performance processing in a transaction environment. You've got to mold transactions a certain way. You need transaction and data integrity; you need queue management. In the integrated technology, you're able to store a lot more data. You don't have to worry about transaction workload in terms of uniformity; you don't have to worry as much about queue management. By having the sectors separate, you're able to apply different technologies that are optimized on different things. The notion of the system of record in DW 2.0 is of data that is spread over different sectors. Part of the system of record is in one sector, another part is in another sector and so on. It's important, but in the grand scheme of things, it's rather minor; for all practical purposes the integrated sector is the old classical data warehouse.

DMR: Getting to unstructured data, the holy grail of DW 2.0 seems to be the idea of "structuring" text and making it available in the interactive zone of the data warehouse.
BI: That is correct, and we have been working on this for about three years. Right now, technology is divided into camps. In structured technology, you've got products like Business Objects, Oracle and DB2; in the unstructured world, you have products like ClearForest, Convera and Documentum. The idea is to bring unstructured data to where you can leverage an analytical technology that's already in place. But if you just proceed with the idea of bringing unstructured data to the structured world, you're building a data junkyard. You need to integrate textual data before you bring it into the structured environment. So we're definitely not talking about a search engine; we're talking about a textual integration engine.