Last month, we discussed the importance of adhering to product and technology standards to increase both the ROI of your IT investments and the value of your IT organization. In this issue, we continue the architecture discussion to include the architecture of your entire warehouse project.
Architecture standards form the blueprint of what you are going to build and how you are going to build it. They are not limited to the data warehouse itself. These blueprints help determine the people, skills and resources needed. They also help in planning the project tasks, timeline, costs and risks.
Data Warehouse Architecture as Framework
Just as you need a blueprint in building a house, you need a blueprint in building data warehousing (DW) and business intelligence (BI) solutions. This framework needs sub-architectures for data, information, technology and products.
- The data sub-architecture includes the sourcing of the data, data quality and data management.
- Information sub-architecture is how the information flows and is transformed from information-producer to information-consumer and its business use.
- The technical sub-architecture involves the technologies used in the construction and maintenance of the data warehouse environment. It is important to separate this from the products to clearly distinguish your needs (prior to hearing the vendor's marketing pitch).
- The product sub-architecture is the selection of the products that meet the requirements of the other three sub-architectures. Too often, people are heavily influenced by vendor pitches and fail to select products that they have considered in the context of their own needs.
You should have a data warehouse for storing enterprise-wide data, including historical, detailed transactional data and dimensions or reference data such as customers, products and employees. There will also be data marts, built from the data warehouse, that contain domain-specific data oriented toward business users in a specific business function, division, geography, product line, et cetera. The DW is generally normalized, while the data marts are in a star (or snowflake) schema or other format linked to the BI tools being used.
It is important to note that the data warehouse and data marts may actually be schemas in one database instance, separate databases or even distributed networked databases. The hub-and-spoke architecture is a logical design, while the actual physical implementation should be governed by your data warehouse architecture standards and procedures.
Further extension of the hub-and-spoke best practice would involve creating additional spokes from the original spokes (like branches on a tree) to accommodate the use of cubes associated with online analytical processing (OLAP) tools or pivot tables with spreadsheets. The data should flow from the DW to the data marts to the cubes, with the IT group being able to show the workflow and lineage of the data. This is how you obtain consistent numbers across your business user reports or analysis, provide transparency and traceability for the business, and improve the IT group's productivity.
The use of an extract, transform and load (ETL) tool provides several significant benefits. First, it will be your point of documentation and version control for your data sourcing and data transformation processes. Very few IT groups document their custom-coded extracts, and even fewer maintain that documentation. The current major ETL tools on the market will be able to generate your documentation.
Second, workflow, with its dependencies and error or exception handling, is built into major ETL tools. If you "roll your own" ETL workflow, you will have to use or build tools to create the workflow, schedule the processes, manage job scheduling and processing, and control code management. "Roll-your-own" solutions typically fall short of recreating what is built into robust ETL tools.
Finally, it's simply more productive to generate and maintain extracts with ETL tools than it is with custom coding. Just make sure that your people are trained and experienced in the ETL tool and that the scope of the project justifies the investment.
Data Modeling Tool
The data modeling tool is another point of documentation and version control. In addition, data modeling tools have interfaces that feed your data models (schemas) into the ETL tool, database and BI tools, thus improving productivity, increasing consistency and reducing errors or misinterpretations between members of the team working with different tools.
Your BI and DW environments will only be as solid as their architecture. If the foundation is poorly designed, it will only encourage the creation of more information silos.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access