Most would agree that a high-quality design process is critical to a successful data warehouse (DW) project. The main deliverables of a good data warehouse design are: A) a dimensional model that meets the business' analytical needs, and B) a warehouse schema that achieves the desired performance requirements for the target physical environment. Given that this isn't a mean task--a good design requires careful analysis and much iteration-- a strong DW design tool would be a relief to the industry as a whole.
For several years, data warehouse architects have had a variety of general-purpose entity modeling tools available to assist in general database design. These tools allow us to construct logical models out of boxes and arrows, fill out entity and attribute information and, ultimately, generate a physical schema for a variety of target databases.
While these tools are excellent choices for general data modeling, they aren't adequate for the more specialized requirements of DW design. General data modeling tools view a data warehouse as a collection of tables and columns, and the architect must independently manipulate these to achieve data warehouse- specific design or performance objectives. There is no DW intelligence or automation provided. In effect, today's architect ends up painstakingly designing a warehouse column by column.
Raising the Level of Design Abstraction
What architects really need are tools that raise the level of abstraction of the DW design process. That is, instead of dealing only in table-and-column terms, architects need tools that speak to them directly in DW terms such as facts, dimensions, hierarchies, attributes and so forth. Instead of manually manipulating tables and columns, architects should simply ask these tools to denormalize, composite, surrogate, star, snowflake or aggregate. In turn, these intelligent tools would perform the corresponding table-and-column manipulations that achieve these requests. In processing the architect's design changes, these tools should perform automatic impact analysis across the entire design, warning of inconsistencies and performing all necessary schema or model alterations.
"Fantasy" Design Tools
In this vein, let me conjure up a "fantasy design assistant." My fantasy tool would know that:
- Fact tables tend to be large; dimension tables are usually much smaller.
- Dimensions can have hierarchies, which can either be represented in one table (star) or several linked tables (snowflake) or something in between.
- Dimensions can also have attributes or characteristics that are usually linked to dimension members at specific levels of the hierarchy.
- Different facts may relate to different sets of dimensions; sometimes groups of facts share the same dimensionality.
- Facts can be aggregated along the dimension hierarchy levels; selected aggregates can be explicitly generated and stored in new tables to provide improved performance at query time.
While I'm dreaming, I'd also like my tool to allow me to experiment with different configurations of dimension tables and aggregates before I produce the final physical schema. The tool should, of course, support all the major relational database engines. And at the press of a button, I should be able to generate comprehensive documentation for the entire logical and physical design.
While the ROLAP technology vendors include limited design components within their product suites, none of these provide much assistance in the construction of the physical warehouse schema. Until now, most DW architects have learned either to make do with general purpose entity-modeling tools and a lot of manual checking or to give up completely, draw some diagrams in Visio and build the warehouse schema by hand.
Constructa--It Works the Way You Think
Until now. To my wallet's detriment, Anubis, a San Francisco-based DW consulting/software firm, went to market with their Constructa product before I managed to get around to developing my fantasy tool. Constructa is an integrated data warehouse design tool created to simplify and accelerate the design process. Constructa was clearly developed by data warehouse designers who shared many of the same frustrations that I (and probably many of you) have experienced. Constructa has "design intelligence"--it understands what goes into a quality DW design, and it helps to create it quickly. The tool knows the physical side as well as it does the logical. There are also the seeds of an understanding of enterprise data warehouse architecture, allowing multiple warehouses or marts to be produced from a consistent enterprise design.
Constructa lets you build a logical warehouse model independent of the physical implementation. Dimension hierarchies can be designed graphically. In parallel with the model, Constructa generates a physical schema which is guaranteed to be valid at all times. You can adjust the physical implementation either at a model-default level or down to the level of individual database tables or columns. Some of the key features:
- Automatically generates star schema, snowflakes and all variants. You can specify global model conventions and then adjust the results for each dimension level.
- Automatic change propagation. Any design changes are cross-checked for impact analysis across the entire design.
- Full control over physical model optimization, including denormalization of dimensions tables and creation of aggregate fact schema at any desired level of summarization.
- Global data type conventions for all table columns (identifiers and descriptions).
- Supports all major relational databases; associates platform-native physical database creation. hints (e.g., Oracle tablespaces and extents) with the model; can retain hints for multiple target databases.
- Global schema naming conventions (e.g., LKP_% attribute_name% for lookup tables) can be modified and are enforced across the physical schema.
In short, these features help an architect achieve quality designs in compressed time frames. A key benefit of using Constructa is that it enables someone who is inexperienced in data warehousing methodology to produce a design that is guaranteed to adhere to DW design principles. This removes some of the more clerical tasks from senior warehouse designers and should contribute to a measurable acceleration of the design process. Constructa can generate thorough graphical design documentation and provides soft-copy output as unformatted text.
Component-Wise Warehouse Design
Constructa provides an excellent first step toward comprehensive management of an enterprise data warehouse architecture. The tool can "import" design components (dimensions, attributes, facts, etc.) from one warehouse into another. This module gives you the ability to construct your data warehouses "component-wise" for the following benefits:
- Consistency. Enterprises can support warehouse/mart "hub-and-spoke" architectures, where departmental data marts are built component-wise from a single enterprise-wide design.
- Standardization. You can import naming conventions, data type rules and other standards from one design to another.
The import feature allows reuse of design components across DW efforts and, thus, leads to the potential availability of design templates: expert-designed, industry-specific, customizable models to jump- start your development.
Using the same facility, Constructa can read in a design from previously prepared meta data tables or views. Once industry meta data standards are defined, Constructa is likely to support them. Constructa is purely a warehouse design tool. Warehouse management features such as generation and scheduling of data summarization queries, management of data loads into the warehouse and user administration are not provided.
If you need a design tool for a new warehouse effort, you should consider Constructa. For certain situations, Constructa can significantly speed up or improve your data warehouse design process. It is the closest thing to that fantasy "design assistant" data warehouse architects are looking for.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access