Your CIO has asked you to design a data warehouse to support the analytical needs of your organization. At first this seems like a straightforward request. But after much due diligence, talking to different vendors and browsing Web sites and knowledge bases on the subject, you find yourself more than a little confused. Organizations have adopted many alternative architectures and implementation approaches. There is much disagreement on the pros and cons of one approach versus another, and industry experts often disagree with one another. Does this mean that any approach is OK? Are there pitfalls if you choose the wrong approach, and what are the implications for your organization of choosing one approach over another? This article attempts to put some context around the confusion of picking the best architecture and approach, and it offers several considerations to help you make the best choice for your organization.
Recap on Alternative BI Architectures
If you look at the different approaches that organizations have adopted when architecting and building business intelligence (BI) infrastructure, the solutions typically contain one or more of the following components:
- A distributed architecture where the integration of data is maintained separately (either logically or physically) from the analytical architecture components.
- A single database solution that supports both the data integration and analytical requirements for the organization.
- Staging databases. This term is often used inconsistently to identify either temporary data structures or database layers used to feed analytical databases. Staging databases can be persistent or temporary in nature.
- Normalized, denormalized or combinations of different modeling approaches across or within the different layers of the architecture.
- Single versus multiple database solutions and heterogeneous versus homogenous database solutions.
- Lots of terms that mean different things depending on who you are talking to: data warehouse, data mart, operational data store, staging database and so on.
Without a clearly defined architecture and a consistent set of terminology, it should be no surprise that there is much confusion regarding this subject, as well as differing opinions on how you should implement a solution for your organization. So with all this information at your fingertips, where do you start? Does it matter if you choose one path versus another? Choosing the incorrect architecture could have long-term implications for your organization. As with any IT solution, a BI architecture should be driven by the business requirements, but for a BI architecture to be truly successful, you must look beyond the short-term needs of your organization and focus also on the long-term goals.
Choosing the Best Architecture - Key Considerations
When choosing the best architecture for your BI solution, you must focus on both the short-term requirements and long-term goals of the organization. With this in mind, what are the key considerations for choosing the best architecture and approach?
- Has the scope of the effort been clearly defined and thought out? That is, are you building a solution to support the requirements of a department, or the foundation for an enterprise effort that must extend to support future requirements for other parts of the organization? If your focus is on a departmental solution, then a simpler architecture and approach may often suffice. But if you are striving toward a longer-term enterprise solution, then the architecture should be designed to accommodate this. The latter choice should consider the architectural options as well as alignment with the organization's current IT strategies and future vision.
Scope and complexity. Enterprise solutions typically have multiple database layers that separate the integration layer of the data from the analytical layer. The former layers are often called data warehouses or staging areas and may be temporary or persistent in nature. However, the implementation approach (as opposed to the choice of words) is what matters most here.
When developing enterprise solutions, i.e., solutions that will be leveraged to support both long- and short-term organizational objectives, it is important to separate the data integration components from the analytical components. Why?
Source data can come from within and outside of the organization. Data may exist today, could be sourced tomorrow or may have to be manufactured. Data integration rules may be complex. Data quality issues may not be clearly understood.
The integration layer of the data should be designed to support the acquisition, integration and maintenance of the data. A normalized modeling approach is best suited for these requirements because it makes no assumptions regarding the underlying data and quality of the data. When data quality is important from an enterprise perspective, a normalized approach is the best option. The reporting requirements for the organization may not be fully understood when starting to build an enterprise solution, so it is not advisable to model data structures based on unknown or vague reporting requirements. Separating the integration requirements from the reporting requirements enables each component of the data architecture to be modeled and maintained based upon its own unique set of requirements without jeopardizing the future flexibility of the overall solution.
To support reporting requirements, data is modeled around the organization's analytical requirements. These requirements are best met with a denormalized modeling technique, using star or snowflake designs or a combination of the two. This type of modeling does not support data maintenance operations well, may not provide visibility into data quality issues unless it is designed with data quality in mind and, most importantly, may not be flexible enough or contain the appropriate information to support future reporting requirements without changes to the data models themselves.
Separating the data layers for integration and reporting allows each layer to be designed appropriately based on its usage and provides more flexibility as business requirements are added or change over time.
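The separation of layers can be illustrated with a minimal sketch using Python's built-in sqlite3 module. It is only one possible shape, not a prescribed design: the table names (int_customer, int_product, int_sale, dim_customer, dim_product, fact_sales), the sample rows and the helper functions are all hypothetical, and a real solution would normally place the layers in separate schemas or databases and load them with an ETL tool.

```python
import sqlite3


def build_layers() -> sqlite3.Connection:
    """Create a tiny integration layer and an analytical layer in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Integration layer (hypothetical): normalized, keyed, no reporting assumptions.
        CREATE TABLE int_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE int_product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE int_sale (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES int_customer(customer_id),
            product_id  INTEGER REFERENCES int_product(product_id),
            sale_date   TEXT,
            amount      REAL
        );

        INSERT INTO int_customer VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
        INSERT INTO int_product  VALUES (10, 'Widget', 'Hardware'), (11, 'Gadget', 'Hardware');
        INSERT INTO int_sale VALUES
            (100, 1, 10, '2024-01-05', 250.0),
            (101, 1, 11, '2024-01-06', 100.0),
            (102, 2, 10, '2024-02-01', 400.0);

        -- Analytical layer (hypothetical): a star schema derived from the integration layer.
        CREATE TABLE dim_customer AS SELECT customer_id, name, region FROM int_customer;
        CREATE TABLE dim_product  AS SELECT product_id, name, category FROM int_product;
        CREATE TABLE fact_sales   AS
            SELECT customer_id, product_id, substr(sale_date, 1, 7) AS sale_month, amount
            FROM int_sale;
    """)
    return conn


def sales_by_region(conn: sqlite3.Connection) -> dict:
    """Answer a reporting question from the star schema only."""
    rows = conn.execute("""
        SELECT d.region, SUM(f.amount)
        FROM fact_sales f JOIN dim_customer d USING (customer_id)
        GROUP BY d.region
        ORDER BY d.region
    """).fetchall()
    return dict(rows)
```

Because the star schema is derived from the integration tables, a new reporting requirement can be met by building another analytical structure from the same integrated data rather than by reworking the acquisition logic.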
Organizational focus on data quality. When data quality is not an important consideration and the reporting requirements are departmental rather than enterprise-focused, a simple reporting solution may be the best choice for your organization. If the focus is on developing an enterprise solution and on improving data quality within the organization, then the architecture described above, with separate integration and analytical layers, is the best option. An architectural layer dedicated to data acquisition and integration can be extended to support an enterprise-focused data quality effort as well. This type of architecture allows a robust data quality solution to be designed that identifies data quality issues, improves data quality upstream and better supports future reporting and analytical needs.
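As a rough illustration of how an integration layer can surface data quality issues instead of hiding them, the following Python sketch runs a set of rules over incoming records and reports every failure. The rule names, the record fields (id, customer_id, amount) and the Issue type are assumptions made up for this example, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    record_id: int
    rule: str


# Hypothetical rules: each name maps to a predicate that returns True when a record passes.
RULES = {
    "customer_id_present": lambda r: r.get("customer_id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def check_quality(records, rules=RULES):
    """Return one Issue per failed rule so problems can be reported and fixed upstream."""
    issues = []
    for rec in records:
        for name, passes in rules.items():
            if not passes(rec):
                issues.append(Issue(rec["id"], name))
    return issues
```

Recording failures rather than silently dropping or correcting rows keeps the issues visible, which is what allows them to be traced back to the source systems.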
Flexibility for future growth. It is common for organizations to build data mart solutions where the acquisition and integration of data as well as the data mart are built in a single database. The staging areas are typically temporary in nature and used to support the current reporting requirements set forth prior to the project starting. As requirements change and new requirements are added the data marts are extended. But often these types of solutions cannot extend to an enterprise effort without considerable rework. If new requirements require data that is only available in an existing data mart, then that data mart may not support the new business requirements easily. When historical data is required that is not available in the required formats, then the data mart may be useless. Data marts are good when supporting known business requirements but may not be flexible enough when business requirements change.
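The limitation can be shown with a small Python sketch, assuming a mart that was built at the grain of the original report (month and region). The sample rows and function names are hypothetical. Once only the aggregated mart is kept, a later request for daily totals cannot be answered from it; the detail rows held in a persistent layer are needed.

```python
from collections import defaultdict

# Hypothetical detail rows kept in a persistent layer: (sale_date, region, amount).
DETAIL = [
    ("2024-01-05", "East", 250.0),
    ("2024-01-06", "East", 100.0),
    ("2024-02-01", "West", 400.0),
]


def build_monthly_mart(detail):
    """Aggregate detail to the grain the original report asked for: month and region."""
    mart = defaultdict(float)
    for sale_date, region, amount in detail:
        mart[(sale_date[:7], region)] += amount
    return dict(mart)


def daily_totals(detail):
    """A later requirement: totals per day. This needs the detail rows, not the mart."""
    totals = defaultdict(float)
    for sale_date, _region, amount in detail:
        totals[sale_date] += amount
    return dict(totals)
```

The monthly mart contains no dates below the month level, so daily_totals has to be computed from DETAIL. If the detail had been held only in a temporary staging area, the new requirement would mean going back to the source systems.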
Budget. Enterprise projects can be expensive. They typically have more components than one-off departmental data mart solutions, so they take longer to develop and require more resources to build and maintain. However, these projects can be delivered successfully when architected, designed and developed correctly. Delivering them iteratively, based on a structured architecture and methodology, allows an enterprise architecture to be built quickly while providing business value at each iteration.
Enterprise projects are best developed using best-of-breed extract, transform and load (ETL) and reporting tools. ETL tools should be platform and database independent and be able to operate in distributed environments. Reporting tools should provide an array of functionality, including batch, ad hoc and online analytical processing (OLAP) capabilities. The expense of these tools is outweighed by the functionality they provide.
- Projects will only be successful when staffed with the correct resources and skills. The organization should ensure it has the best skills available when undertaking an enterprise effort. Departmental solutions can be developed with a smaller set of skills if that is the focus.
Choosing the Best Architecture
When designing the best architecture for your organization, understanding both the short-term requirements as well as long-term goals is important. Even if the focus is on the short-term, it is important to understand the long-term goals and implications if the focus changes at some point.
Developing an enterprise design up front may be more complex, but it has the advantage that it both supports the immediate analytical needs of the organization and can scale to support future requirements that may be unknown. Enterprise architectures are flexible. Departmental solutions may support the immediate reporting requirements for a department but may not extend well to support future requirements.