Anyone who has been awake and alert in our profession for the past decade or so has watched data warehousing mature. Data warehousing is a mature (or at least a reasonably mature) technology today, and it provides the foundation for many important corporate functions such as customer relationship management, business intelligence, ERP analytical processing, exploration processing, data mart analysis, and so forth.
It will be interesting to see how well data warehousing will stand up to the test of time. Will we (or those who follow us) wake up in 2050 and still find data warehouses around? What is required to pass the test of time? Some of the elements are:
Conceptual Foundation. Does the idea have a sound conceptual basis? Is that basis easily understood? Is it easily applied? Is there a solid motivation behind the conceptual foundation?
Products. Are there products that embody or at least align themselves with the idea? In some cases, there will be a product that completely encompasses the idea. In other cases, there will be a product that tangentially addresses the idea.
Practicality. Is the idea, once implemented, useful? Is it affordable? Does the idea advance the goals and aspirations of the corporation?
Widespread Applicability. How many people and corporations can use the idea? Does it apply horizontally across industries? Is the idea applicable across international boundaries? Is the idea relevant to people at both the top and the bottom of the corporate ladder? What about people outside the corporation altogether?
This simple set of elements is a good place to start examining which ideas have and have not passed the test of time, and which will and will not.
Let's consider some ideas from the past and, with the advantage of hindsight, subject them to the test of time.
Distributed Database. There once was a thought that transaction processing databases should be implemented across multiple independent servers. The technology involved such things as two-phase commit. It is safe to say that the distributed database clearly did not pass the test of time. Where did it fail? The distributed database certainly had products; IBM and others went out of their way to build and promote them. It also had a reasonable conceptual foundation. Where the distributed database failed was in its practicality. Distributed databases never really worked in implementation. There were tremendous problems with performance, data integrity and transaction processing that were never resolved.
Centralized Repository. In the days of the mainframe, there was a recognized need for meta data management. The mainframe environment was clearly a centralized environment. The result was the centralized repository. Where are centralized repositories today? A precious few are still hanging on for dear life, but across the industry the momentum behind the centralized repository is dead. Where did the centralized repository fail its test of time? It failed the practicality and the widespread applicability tests. The architecture of today and the future is not a centralized architecture; it is a distributed architecture. Trying to fit a centralized meta data management framework into a distributed architecture is simply the wrong thing to do.
Artificial Intelligence (AI). Artificial intelligence came from the world of academia, but where is it today? Dead as a doornail. Certainly artificial intelligence has a conceptual foundation. Look at the bookshelves of academia, and you still see artificial intelligence tomes today. Certainly there were products. Ask all the California venture capitalists about the AI companies they poured money into that went belly up. AI passed those tests of time but failed the practicality test. AI never understood that an infrastructure of information was needed. The emphasis was always on internal algorithms and the elegance of end-user presentation. There never was an understanding that a body of data (primarily historical data) was needed to support the goals of AI. The data that was needed was dirty, hard to come by and required integration. This "dirty work" was simply beneath the proponents of AI. For lack of an infrastructure, AI failed.
Client/Server Technology. Client/server technology did not really fail. Instead, client/server technology was subsumed into existing technology. Certainly client/server technology has left a positive mark on the world, but is client/server technology a clear, distinct surviving technology today? No, it is not. Client/server technology died when IBM convinced the world that the mainframe was just another server. The vendors of the world mutated the original thoughts behind client/server technology to fit their own needs. Since there was no conceptual foundation behind client/server technology, the mutation caused the death of client/server technology.
Formal Language Theory. Formal language theory deals with the parsing of structures to determine their validity and content. There is a conceptual body of thought behind formal language theory. Is formal language theory dead? Not at all. It is used every time someone uses a parser or a natural language interface to the computer. But how many people are anxiously awaiting the next breaking news in formal language theory? Formal language theory is so arcane, and its audience so select, that it may as well be dead. Where did formal language theory fail its test of time? It failed the widespread applicability test. Only a few computer scientists and programmers ever need to know the intricacies of using a stack to parse a recursively defined structure.
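For readers who have never had to do it, here is a minimal sketch of what "using a stack to parse a recursively defined structure" looks like. It assumes a toy structure chosen purely for illustration (parenthesized lists of whitespace-separated atoms, such as `(a (b c) d)`), and the function name `parse_nested` is hypothetical. The stack both checks validity (every "(" has a matching ")") and recovers content (the nesting).

```python
def parse_nested(text: str) -> list:
    """Parse a parenthesized structure such as "(a (b c) d)" into nested lists.

    Toy grammar assumed for illustration: atoms are whitespace-separated
    tokens, and "(" ... ")" groups them into a nested list.
    An explicit stack replaces recursion: each "(" pushes a new list,
    and each ")" pops the finished list and appends it to its parent.
    Raises ValueError if the structure is not well formed.
    """
    # Pad the parentheses with spaces so split() yields one token per item.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack: list[list] = [[]]  # bottom frame collects top-level items
    for token in tokens:
        if token == "(":
            stack.append([])            # open a new nested level
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            finished = stack.pop()      # close the current level
            stack[-1].append(finished)  # attach it to the enclosing level
        else:
            stack[-1].append(token)     # an atom belongs to the current level
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return stack[0]
```

For example, `parse_nested("(a (b c) d)")` returns `[['a', ['b', 'c'], 'd']]`, while `parse_nested("(a (b c)")` raises `ValueError`. Exactly this kind of bookkeeping sits inside every parser, which is why the theory is used everywhere yet studied by few.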
Surround Technology. Occasionally, the idea appears that the legacy environment is so overwhelming that we should not worry about it. Instead, we should "surround" the legacy environment with technology that allows us to go into the legacy environment and retrieve data, all based on the premise that we should not have to do anything to the legacy environment. This is a very appealing idea to those who have a large and complex legacy environment. There are, however, a whole host of reasons why the surround approach to information systems does not work and will never work. In short, the surround approach does not even come close to passing the practicality test.
The test of time is difficult to pass. A lot of ideas have come and gone, leaving only a faint impression on the landscape of information technology. In fairness, the test of time never ends for any technology or concept, and data warehousing is no exception. In a sense, the jury is always out when it comes to the test of time. However, let's see how data warehousing has withstood the test of time to date.
What about the conceptual foundation for the data warehouse environment? Certainly there are differences of opinion, but the number of books and conferences devoted to the subject testifies to a firm conceptual foundation for the data warehouse. When it comes to a conceptual foundation, data warehousing passes the test of time with flying colors.
What about products? Depending on how you look at data warehousing, there are literally hundreds of products. There are the basic DBMSs. There are ETL products. There are data management products. There are OLAP and MOLAP products, statistical analysis products and end-user interface products. Go to a trade show sometime, and you will very quickly be convinced that data warehousing has products. Data warehousing passes this test of time, again with flying colors.
Let's examine practicality. Is a data warehouse useful? Is a data warehouse cost-effective? If you believe the success stories of data warehousing (and there are literally thousands of them), data warehousing is very useful. Prior to data warehousing, there was precious little integration of data and no appreciation for the historical value of data. Data was stored in technologies optimal for everything but end-user access and analysis. With data warehousing, integrated data, historical data and easy-to-get data have become the norm. That's useful to the corporation in a hundred ways. Data warehousing passes the usefulness test. Easily.
What about the cost-effectiveness test? As long as data warehouses are of a reasonable size, cost does not enter the equation. A fifty-gigabyte data warehouse is affordable to almost everyone. However, when data warehousing starts to approach the terabyte range, the cost of data warehousing becomes an issue. Fortunately, with advances made in alternative storage and near-line storage, even petabyte data warehouses can be contemplated. As long as the consumer is willing to depart from the classical path of building systems on high-performance disk storage, data warehousing passes the test of cost-effectiveness.
Now what about widespread applicability? Who needs integrated, historical data that is easy to get to? Practically everybody. This includes telecommunications firms, retailers, banks, financial institutions, insurance companies, airlines, universities, utilities, government agencies and manufacturers. In short, the list includes everyone who has a significant amount of data, a significant volume of transaction processing, or both. In this regard, data warehousing certainly passes the widespread applicability test.
The world has a lot of practical experience with data warehousing now. At least part of the jury is in, and as best we can tell, data warehousing has only begun.