A recent article by Quentin Hardy (Forbes 05/10/04, "Data of Reckoning") described all the advantages of what can be termed the virtual data warehouse (or the federated data warehouse). The article described how access to data is greatly accelerated by a virtual data warehouse.
Appearing in Forbes Magazine, the article is targeted to the businessperson, not the technician. The promise is that if you align with a virtual data warehouse, your problems of processing information are over.
The issues relating to the inadequacy of the virtual/federated approach to data warehousing are well known and have been discussed at conferences, in articles and in white papers for a number of years; yet the article in Forbes does not even mention any of the well-documented limitations of this approach.
The first limitation of the approach is the lack of semantic integrity of the data. As an example, suppose that in one database you have the term "GM" to represent General Motors. In another database you have the same term; but in this case, "GM" stands for General Mills. The federated system sees everything as the same, and the result is garbled and meaningless data.
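The "GM" collision can be sketched in a few lines of Python. This is only an illustration: the two source tables, the revenue figures and the `entity_map` lookup are all hypothetical, and no particular federation product works exactly this way. The sketch shows what happens when a query matches on the raw code alone, and what changes once a shared definition of the code exists.

```python
# Hypothetical rows from two source systems; names and values are illustrative only.
automotive_db = [{"code": "GM", "revenue": 100}]
food_db = [{"code": "GM", "revenue": 40}]


def federated_total(code, *sources):
    """Naive federation: match on the raw code with no shared definition."""
    return sum(
        row["revenue"]
        for source in sources
        for row in source
        if row["code"] == code
    )


# Hypothetical mapping that an integrated warehouse would maintain:
# the same code means different things depending on the source system.
entity_map = {
    ("automotive_db", "GM"): "General Motors",
    ("food_db", "GM"): "General Mills",
}


def integrated_totals(named_sources):
    """Resolve each (source, code) pair to a business entity before summing."""
    totals = {}
    for source_name, rows in named_sources.items():
        for row in rows:
            entity = entity_map[(source_name, row["code"])]
            totals[entity] = totals.get(entity, 0) + row["revenue"]
    return totals


naive = federated_total("GM", automotive_db, food_db)
resolved = integrated_totals(
    {"automotive_db": automotive_db, "food_db": food_db}
)
```

In the naive case, the revenue of two unrelated companies is added into a single number. In the resolved case, each company keeps its own total, but only because someone did the integration work of defining what "GM" means in each source.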
Another major limitation of federating data is performance. When data from different technologies is federated, one must ask: what if one of the federated databases is being recovered? What if one of the federated databases is down? How long does a query take when a database is down for a day?
Another major limitation of the federated/virtual approach is having to combine and integrate the data every time you want to do a query. Instead of having a single database that can be accessed as needed, entire databases must be "merged" or otherwise concatenated. This concatenation requires considerable system overhead and it must be done repeatedly.
Suppose an analyst recognizes the difficulties caused by the lack of integration in federated data. The analyst integrates the data before analyzing it. Then another analyst interested in the same data also recognizes the problems inherent in federation. This analyst also integrates the data prior to analysis. The problem is that the two analysts do not integrate the data the same way. The analysts obtain very different results even though they started with the same data.
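A small Python sketch makes the two-analyst problem concrete. The source names, customer keys and balances below are invented for illustration; the only point is that two reasonable conflict rules applied to identical inputs produce different answers.

```python
# Hypothetical customer balances held in two source systems.
# Customer "C1" appears in both, with conflicting values.
crm = {"C1": 500, "C2": 300}
billing = {"C1": 450, "C3": 200}


def analyst_a_total(crm_rows, billing_rows):
    """Analyst A's rule: on conflict, trust the CRM system."""
    merged = dict(billing_rows)
    merged.update(crm_rows)  # CRM values overwrite billing values
    return sum(merged.values())


def analyst_b_total(crm_rows, billing_rows):
    """Analyst B's rule: on conflict, trust the billing system."""
    merged = dict(crm_rows)
    merged.update(billing_rows)  # billing values overwrite CRM values
    return sum(merged.values())


total_a = analyst_a_total(crm, billing)
total_b = analyst_b_total(crm, billing)
```

Neither analyst is wrong by his or her own rule, yet the totals disagree. When the integration rule is decided once and applied in one place, that disagreement has nowhere to arise.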
Another major limitation of the federated approach concerns historical data. By definition, a federated/virtual database is limited to the historical data found in the underlying federated databases. In other words, if a database participating in federation contains data from only the previous month, then the larger federated database can go back in time only one month. The problem is that many transactional databases jettison historical data as fast as they can in order to enhance performance. Therefore, it is uncommon for databases that support operational systems to contain very much historical data. When those databases become federated, the federated database does not support historical data to any great extent either.
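The arithmetic behind this limit is simple enough to sketch in Python. The source names and dates are hypothetical, and the sketch assumes a query that needs data from every participating source: such a query can reach back only as far as the source that keeps the least history.

```python
from datetime import date

# Hypothetical earliest date for which each source still holds data.
earliest_available = {
    "orders": date(2004, 4, 1),     # operational system, keeps about a month
    "shipments": date(2003, 1, 1),
    "billing": date(2002, 6, 1),
}


def common_history_start(earliest_by_source):
    """Earliest date on which every source can still contribute data."""
    return max(earliest_by_source.values())


history_start = common_history_start(earliest_available)
```

Here the orders system, which purges its history most aggressively, sets the limit for the whole federation, however much history the other sources retain.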
Perhaps the most compelling reason why the federated/virtual approach is inadequate is the inability to reuse the data. Suppose that the federated approach is used for an application. When a second application comes along, the second application must repeat in whole or in part the work done by the first application. When the third application comes along, the third application must repeat in whole or in part the work done by the first and second applications. With the federated/virtual approach to data warehousing, there is no reusability of the infrastructure.
The article in Forbes might have qualified the pitch it made for federated/virtual data warehousing by saying that this approach works well if:
- You don't care about the integrity of the data and the results of a query against data that is completely unintegrated;
- You don't care how many machine resources you use;
- You don't care how long the query takes;
- You don't care that there may still be inconsistency in the data when it is integrated;
- You don't need historical data to any great extent; and
- You don't care that there is no reusability of data and infrastructure.
One wonders how appealing federated/virtual databases would be if the businessperson had been told of these limitations. In this regard, Quentin Hardy and Forbes have painted a very distorted picture of the truth.
With all of the real drawbacks of the federated approach, why does the federated/virtual approach hold such allure? Two very obvious reasons follow.
Users have been very frustrated in getting to their data in recent years. Any promise of allowing the user direct control (or even a little more direct control) is immensely appealing. The user sees access of data as the problem. In truth, the users' issues only start with access of data. Once you access the data, there are the issues of data quality, data integration, data performance and so forth. The end user has been so starved for data for so long that he/she is not sophisticated enough to see that access of data is only the tip of the iceberg. Once access has been achieved, the other major issues arise.
Organizations will do anything to get around the issues relating to the integration of data. Old legacy systems are out of date, undocumented and fragile to the point that organizations are scared to go into the old systems and do anything significant to them. Witness the Y2K phenomenon. Rather than try to go back into old systems and make what amounted to a rather simple change, organizations opted to lay the problem of applications into the lap of the enterprise resource planning (ERP) vendor. Changing old legacy systems strikes terror in the hearts of IT professionals. With the federated/virtual data warehouse approach, conveniently there is no need to go back in and face those dreaded systems.
The opposite of the federated/virtual data warehouse approach is what can be termed the "single version of the truth" approach. This approach requires that old legacy data be integrated into a granular, historical foundation. Thousands of organizations have found that building the foundation of a data warehouse meets their needs. IBM's DB2, Teradata and Oracle all have legions of users who have found that once the data warehouse foundation is built, an entirely new world of information processing is opened to them. Federated/virtual users never make that discovery because they don't have a firm foundation to build upon.