A data warehousing architect at a large insurance company told Forrester: "When it comes to architecture, keep it simple. A few forms using plain geometry should be enough. The idea is to see through the complexity to the underlying pattern. Of course, the real world will be messier; but the value of architecture is to see through the intricacies to the underlying simplicity to empower planning, design, and implementation." Inspired by these and many similar remarks, Forrester surveyed 213 practitioners at the Data Warehousing Institute San Diego Conference in August 2004 (see Figure 1). The most common approaches to data warehousing architecture are:
- Centralized with hub and spokes. The data warehousing architecture reported most frequently is the data warehouse with attached data marts (42% of respondents chose this option).
- Centralized, pure and simple. A special case of the centralized architecture is one that implements the central data warehouse only, an option chosen by 18% of the practitioners.
- Decentralized. Independent data marts without consistent design are reported by 19%. Independent data marts often form a distributed ("decentralized") architecture.
- Federated. Conformed data marts are reported by 15% of practitioners; the key term "conformed" signals dimensional modeling.
- Virtual data warehousing. The real loser is virtual data warehousing, which has been superseded by enterprise information integration (EII) and registers with barely 1% of respondents.
Figure 1: Survey Results
In the real world, firms that are highly centralized in geography and governance should pursue a centralized data warehouse architecture to reap the greatest operational efficiencies and business benefits. In practice, the hub and spokes are implemented on different platforms and database instances, but there is no reason that this must be so. In some cases, both could be on the same platform - though it would have to be a large one if the number of combinations is also large. In contrast, those firms that are highly decentralized will prefer a distributed architecture, and those with a mixed organizational pattern should implement a federated one.
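The rule of thumb in the paragraph above can be written down as a simple lookup. This is only a minimal sketch: the `Governance` enum, the `RECOMMENDED` table, and the `recommend_architecture` function are hypothetical names introduced for illustration, and the mapping simply restates the text rather than any formal Forrester method.

```python
from enum import Enum


class Governance(Enum):
    """How an enterprise is organized (hypothetical labels for illustration)."""
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"
    MIXED = "mixed"


# Mapping restated from the article text; not an official decision procedure.
RECOMMENDED = {
    Governance.CENTRALIZED: "centralized data warehouse",
    Governance.DECENTRALIZED: "distributed (independent data marts)",
    Governance.MIXED: "federated (conformed data marts)",
}


def recommend_architecture(governance: Governance) -> str:
    """Return the architecture the article associates with a governance pattern."""
    return RECOMMENDED[governance]


for pattern in Governance:
    print(f"{pattern.value}: {recommend_architecture(pattern)}")
```

A real decision would weigh more than one variable, which is why the article treats this alignment as a tendency rather than a rule.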
It should be noted that the survey did not ask about data modeling philosophy, and this survey is perfectly consistent with practitioners implementing dimensional models in different architectures - centralized, hub-and-spoke, as well as "conformed" designs. When the definitions of key structures such as customer, product and related data entities are specified by a consistent, centralized design but implemented according to the local priorities of the individual lines of business, the result is a federated architecture. These are also described as pure, conformed data marts. However, note that a careful reading of Ralph Kimball reveals that he emphasizes a centralized design and sees no requirement for a persisting, centralized database (source: Ralph Kimball, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons, 1996).
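To make the idea of a centrally specified but locally implemented design concrete, here is a small sketch using Python's built-in `sqlite3` module. It assumes two independent data marts, each in its own in-memory database, that both adopt one shared definition of a customer dimension. The table names, columns, and sample figures are invented for illustration and do not come from the survey.

```python
import sqlite3

# Centrally specified definition of the customer dimension (hypothetical columns).
CUSTOMER_DIM_DDL = """
CREATE TABLE customer_dim (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    region        TEXT NOT NULL
)
"""
CUSTOMERS = [(1, "Acme", "West"), (2, "Globex", "East")]


def build_mart(fact_table, fact_ddl, fact_rows):
    """Create an independent mart that reuses the shared dimension definition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CUSTOMER_DIM_DDL)
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)", CUSTOMERS)
    conn.execute(fact_ddl)
    conn.executemany(f"INSERT INTO {fact_table} VALUES (?, ?)", fact_rows)
    return conn


# Each line of business owns its mart and its facts.
claims_mart = build_mart(
    "claims_fact",
    "CREATE TABLE claims_fact (customer_key INTEGER, claim_amount REAL)",
    [(1, 500.0), (1, 250.0), (2, 100.0)],
)
premiums_mart = build_mart(
    "premiums_fact",
    "CREATE TABLE premiums_fact (customer_key INTEGER, premium_amount REAL)",
    [(1, 1200.0), (2, 900.0)],
)


def totals_by_region(conn, fact_table, measure):
    """Sum one measure per region by joining the fact table to the dimension."""
    sql = f"""
        SELECT d.region, SUM(f.{measure})
        FROM {fact_table} AS f
        JOIN customer_dim AS d ON d.customer_key = f.customer_key
        GROUP BY d.region
    """
    return dict(conn.execute(sql).fetchall())


claims = totals_by_region(claims_mart, "claims_fact", "claim_amount")
premiums = totals_by_region(premiums_mart, "premiums_fact", "premium_amount")

# Because both marts conform to the same dimension, their results line up on region
# even though no central database holds the combined data.
loss_ratio = {region: claims[region] / premiums[region] for region in premiums}
print(loss_ratio)
```

The point of the sketch is that the shared design, not a shared database, is what allows results from separately built marts to be compared.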
What Does it All Mean?
Data warehousing systems distribute control over information for decision making in the enterprise. Therefore, the form of the data warehouse architecture aligns with how the modern corporation is governed, with centralized decision making radiating from the center to the periphery. This means:
- No data warehousing architecture is right or wrong in itself. Enterprises have succeeded with all the alternative architectures surveyed, and individual cases of failure have also occurred with each. The data warehousing architecture will often mirror the form of the enterprise that implements it. Thus, enterprises in highly centralized industries such as financial services, telecommunications and airline transportation will find that a centralized data warehouse is the path of least resistance. Those enterprises with distributed operations will get the best results with distributed data warehouses, while those with a mixed pattern of governance will do best with a federated approach.
- Data marts are a useful but limiting compromise. The proliferation of data marts in an otherwise centralized architecture means that enterprises are planning centrally but end up making compromises. Data marts often represent a compromise forced on a centralized design by the need for an interim deliverable or incremental result, by a powerful political constituency that wants its own system, or by performance considerations.
- Virtual data warehousing is dead, long live EII. In spite of being an interesting idea, virtual data warehousing is not getting traction in the market. This is because on-the-fly data integration is computationally complex, requiring significant bandwidth for data movement and horsepower to perform JOIN operations. Except for small volumes of data, a strictly limited number of data stores, or an exceptional workaround, real-world use of the virtual data warehouse is limited by complexity, the unmet need for schema integration, and performance constraints. Enterprise information integration has replaced virtual data warehousing.
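The cost of on-the-fly integration can be illustrated with a small sketch. It assumes two separate operational stores, modeled here as two in-memory `sqlite3` databases with invented schemas and data, and a hypothetical mediator function that answers a query by pulling rows from both sources at request time and joining them itself. It is a toy illustration of the general idea, not a description of any particular virtual data warehousing or EII product.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create one independent source system as an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two independent operational stores (hypothetical schemas and data).
policy_db = make_source(
    "CREATE TABLE policy (policy_id INTEGER, customer_id INTEGER)",
    "INSERT INTO policy VALUES (?, ?)",
    [(10, 1), (11, 2), (12, 1)],
)
claims_db = make_source(
    "CREATE TABLE claim (policy_id INTEGER, amount REAL)",
    "INSERT INTO claim VALUES (?, ?)",
    [(10, 500.0), (12, 250.0), (11, 100.0)],
)


def virtual_claims_by_customer():
    """Answer one query by integrating both sources at request time."""
    # Data movement: every row needed for the join crosses to the mediator.
    policies = policy_db.execute(
        "SELECT policy_id, customer_id FROM policy"
    ).fetchall()
    claims = claims_db.execute("SELECT policy_id, amount FROM claim").fetchall()
    rows_moved = len(policies) + len(claims)

    # Hash join in the mediator: build on policies, probe with claims.
    customer_of = {policy_id: customer_id for policy_id, customer_id in policies}
    totals = {}
    for policy_id, amount in claims:
        customer_id = customer_of.get(policy_id)
        if customer_id is not None:
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals, rows_moved


totals, rows_moved = virtual_claims_by_customer()
print(totals, rows_moved)
```

Nothing is persisted between calls, so the same rows are moved and joined again for every query. With a handful of rows that is harmless; with production volumes and many sources it is the bandwidth and JOIN cost the article describes.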