Claudia would like to thank John Ladley for his assistance in writing this month's column.
In this article, the first of two, we investigate the need for alternative database designs to support differing types of analyses. In the second article, we'll illustrate how to determine the schema "style" most useful to meet your DW needs.
The demands of data warehousing have resulted in tremendous technological advances. Over the past few years, we've experienced great strides in improving our ability to capture, manage and store massive amounts of data in new and creative ways. New technologies and analytical capabilities require new paradigms in the minds of implementers especially when it comes to database design. Consider the paradigm shift from the now prevalent star schema.
Star Schema Versus Other Database Designs
Without doubt, the most popular designs include the star schema and its variations. There is good reason for this. The star schema is great for supporting multidimensional analysis in data marts with known, stable requirements; fairly predictable queries with reasonable response times; recurring reports; etc. Not surprisingly, multidimensional tools have earned a strong position in the marketplace as well.
The techniques for building star schemas are perhaps the most elegant, well-explained and useful techniques for determining business requirements, dimensions, keys and, when necessary, physical database schemas. Numerous projects have successfully applied these techniques and concepts.
However, there are other issues that relate to DW environments besides dimensionality:
- Cost of ownership is an issue now confronting many DW shops. Does the ongoing return on the data warehouse environment exceed the cost of maintaining that environment?
- A "stock" (SQL-based) query environment that does not open the data for exploration may not generate adequate return to offset the ongoing costs of maintaining that environment.
It seems reasonable to conclude that unique combinations of processing requirements and types of data require different structures. In other words, one approach/structure isn't always best.
Experience indicates that the "universe" of data warehouse processing types is diverse. It includes data mining, statistical analysis, data exploration and more. The decision support users performing these types of analyses often search for unexpected data relationships. To maxi-mize your data warehousing ROI, you need to embrace and implement data warehouse architectures that enable the full spectrum of analysis.
If your entire data warehouse environment consists of a single structure, then your ability to analyze your data is limited to the subset of analyses that the structure best supports. To maximize your data warehousing ROI, you need to embrace and implement data warehouse architectures that enable the full spectrum of analysis.
Why Alternative Database Designs?
For many statistical, data mining or exploratory analyses, there must be no hint of bias or arbitrary establishment of data relationships. The value of these exploratory algorithms depends on the absence of predefined relationships. By definition, the star schema must have predetermined, physicalized relationships, thus maximizing the straightforward, multidimensional queries but hampering any ability to look for unknown or unexpected data relationships.
For a low latency data warehouse (i.e., one with rapid update cycles), the star schema offers restrictions in update speed and insertion capability. Many of the mission- critical, large data volume data warehouses now appearing require rapid insertion capability.
The truth is that database structures vary across a spectrum of normalized, denormalized and flat files of transactions. The ideal situation is to craft the schemas after the requirements are established. Unfor-tunately, the database structure/solution is often selected before the specific business needs are known. Those of us in the data warehouse consulting business have witnessed development teams debating star versus normalized before starting business analysis. For whatever reason, architects and data modelers latch onto a particular design technique perhaps through comfort with a particular technique or ignorance of other techniques and force all data marts to have that one type of design. This is similar to the person who is an expert with a hammer everything they see resembles a nail.
Database schemas should be based on the usage of the data and the type of information requested. There are no absolutes, of course. In examining data relationships, the best design will be one that does not pre-establish or predetermine these relationships. Examples of such solutions would be flat files or floating point files used for data mining, normalized files for a low latency data warehouse. Multiple layer stars (e.g., multiple fact tables, elaborate snow- flakes or large summary tables) may work best for exotic financial reporting marts and the like.
Choosing a physical database solution without first understanding the business problem may result in a mismatch. Star schemas best address a subset of the wide range of possible business needs. Like hammers, their use is limited. By including a variety of physical database solutions in your toolbox, you'll have the flexibility to efficiently satisfy a wider variety of business needs.
Part 2 in this series will review the primary factors that dictate the best physical schema business problem, database size, access paths, latency of data, periodicity, etc. We will also examine some of the popular technologies supporting alternative database designs.
Note: The authors would like to thank the following people for their input into this series: Mike Lin, Tim Klooster and Dan Page (DISC); Bud Pacotti (WhiteCross); Bill Prentice (SAS Institute); Steve Eberly and Kevin Horigan (Sand Technologies); and Sue Osterfelt (Bank of America).
John Ladley is the president of Knowledge InterSpace Inc. and is a META Group research fellow. He can be reached at firstname.lastname@example.org.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access