The two-part series regarding federated data warehouse (DW) architectures that appeared in the December 1999 and January 2000 issues set a record for e-mail responses from readers. (The columns are available in the archives at www.dmreview.com and www.egltd.com.) The feedback was overwhelmingly positive, and there were many requests for further information. In an effort to answer all the questions most efficiently, I've combined and condensed them into this month's column.
What is a federated data warehouse (DW) architecture? A federated DW architecture is an overall system architecture that accommodates multiple DW/data mart (DM) systems, operational data stores (ODSs), amorphous reporting systems, analytical applications (AAs), etc. As the Internet is a network of networks, a federated DW architecture is an architecture of architectures. It provides a framework for the integration, to the greatest extent possible, of disparate DW, DM and analytical application systems.
Isn't a single, central DW system better? Central data warehouses offer several powerful positive attributes; and if your site has the high-level sustainable pain and political will required to be successful, then you should closely examine this option. (To help you decide if this approach is suitable to your site, check out the free build-approach automated assessment in the resource library at www.egltd.com.)
How does a federated DW architecture work? A federated DW architecture shares as much core information among the various systems as possible. This is accomplished by sharing critical master files or dimensions, common metrics and measures, and other high-impact data across all systems that can make use of the information. It is usually accomplished via an enterprise-class ETL tool, which provides a common meta data repository, and the use of common data staging areas.
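As a rough illustration of what sharing a dimension buys you, here is a minimal Python sketch. Everything in it is an assumption made for the example: the customer keys, segment names, dollar figures and the `profit_by_segment` function are hypothetical, and a real site would do this work in its ETL tool and database rather than in in-memory dictionaries. The point is only that two otherwise separate systems can be combined once they agree on a common customer key.

```python
from collections import defaultdict

# Shared (conformed) customer dimension, loaded once from a common staging
# area and used by every component system. All values are hypothetical.
customer_dim = {
    101: {"name": "Acme", "segment": "Enterprise"},
    102: {"name": "Bolt", "segment": "SMB"},
    103: {"name": "Cray", "segment": "SMB"},
}

# Facts held by a marketing data mart, keyed on the shared customer key.
marketing_facts = [
    {"customer_key": 101, "campaign_cost": 500},
    {"customer_key": 102, "campaign_cost": 200},
    {"customer_key": 103, "campaign_cost": 100},
]

# Facts held by a separate finance system, keyed on the same customer key.
finance_facts = [
    {"customer_key": 101, "revenue": 2000},
    {"customer_key": 102, "revenue": 300},
    {"customer_key": 103, "revenue": 50},
]


def profit_by_segment(dim, marketing, finance):
    """Combine facts from both systems through the shared dimension."""
    totals = defaultdict(int)
    for row in finance:
        segment = dim[row["customer_key"]]["segment"]
        totals[segment] += row["revenue"]
    for row in marketing:
        segment = dim[row["customer_key"]]["segment"]
        totals[segment] -= row["campaign_cost"]
    return dict(totals)


result = profit_by_segment(customer_dim, marketing_facts, finance_facts)
print(result)
```

Neither system has to give up its own private feeds for this to work. Only the customer dimension and its keys need to be common.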
Isn't this the same as a bottom-up, conformed dimension/bus approach? The sharing of common data points is the same, but in a federated DW architecture you must resign yourself to the fact that individual components of the system will retain unique feeds from various internal and external data sources that are not shared with any other component. This is less than elegant, less than perfect, but a political and practical reality. Also, data sharing in a federated DW architecture is usually not as cleanly implemented as in a straight bottom-up scenario. For instance, in a federated DW architecture, shared data is often extracted from the mid-level of a component system rather than from the source extraction process. As with comparisons with a top-down, centralized system, it is important to remember that a federated DW architecture is not the ideal theoretical vision, but merely the most pragmatic means to reach the goal within some diverse, heterogeneous environments.
How do we go about designing and building a federated DW architecture?
1. Document your existing DW/DM systems via a high-level enterprise data warehouse architecture (EDWA). The highest level is an entity-level diagram showing the various systems and any existing cross data flow and the meta data exchange between them.
2. Document each of the existing DW/DM systems at the data flow level. This level includes data flow from each data source, any transformation and integration steps, and meta data repositories. Rate each major data element in terms of quality, availability and ease of access.
3. In conjunction with your users, determine what data offers added value and impact across multiple systems. For instance, adding financial information to marketing information yields profitability by customer and demographic segment.
4. Collect the various build-phase candidates that derive from step 3 and analyze them to determine impact and viability. (To help with this analysis, there is a free build-phase candidate automated assessment in the resource library at www.egltd.com.) Pick the candidate that provides the best balance between business impact and risk.
5. Build a small, focused iteration of the federated DW architecture based on the winning candidate from step 4. Document and publicize success to establish and sustain the political will required for future iterations.
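The selection of a build-phase candidate can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the automated assessment mentioned above: the `Candidate` class, the 1-to-5 scales, the candidate names and the simple "impact minus risk" score are all hypothetical stand-ins for whatever weighting your site and users agree on.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A build-phase candidate; both scales are hypothetical (1 = low, 5 = high)."""
    name: str
    business_impact: int
    risk: int


def pick_candidate(candidates):
    """Return the candidate with the best impact-versus-risk balance.

    Assumption: balance is scored as impact minus risk. A real assessment
    would likely weigh many more factors.
    """
    return max(candidates, key=lambda c: c.business_impact - c.risk)


candidates = [
    Candidate("Profitability by customer segment", business_impact=5, risk=2),
    Candidate("Enterprise-wide product hierarchy", business_impact=4, risk=4),
    Candidate("Shared calendar dimension", business_impact=2, risk=1),
]

winner = pick_candidate(candidates)
print(winner.name)
```

A simple score like this is mainly useful as a conversation starter with users and sponsors. The numbers matter less than agreeing on why one candidate outranks another.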
From the in-the-trenches perspective, the federated DW architecture is, in my opinion, the best alternative to achieve the maximum level of architecture possible in today's heterogeneous world of custom and packaged DWs, DMs and low-cost, turnkey analytical applications.
Douglas Hackney is the president of Enterprise Group Ltd., a consulting and knowledge-transfer company specializing in designing and implementing data warehouses and associated information delivery systems. He can be reached at www.egltd.com.