Many companies began building data warehouses 10 or more years ago. Through considerable trial and error, these veteran companies have established best practices for building, deploying, managing and growing data warehouses and business intelligence applications. Most of these organizations with "mature" data warehousing environments have learned the hard way – they've made similar mistakes and rebounded from them. They have much to teach us about the "dos" and "don'ts" of data warehousing. Last month's column shared two of these mistakes, and two more are featured this month.

Mistake: Not Having Encompassing Architecture with a High Degree of Sharing. Placement of the data warehouse and the potential myriad of other databases (staging, data marts, ODS, etc.) in the data warehouse architecture is always a hot issue. Depending on the scalability of the chosen data warehouse technology, it is usually recommended that the various data warehouse databases be physically independent in separate partitions or machines.

You may decide to have a physically separate staging area (landing ground and transformation area for raw operational, pre-warehouse data), but you may not be able to take raw operational data and move it directly into a warehouse schema (especially a dimensional one) without some interim tables. These tables can simply be non- user-accessible tables in the warehouse, but it's an opportunity to take transformation cycles off the warehouse machine by using a physically separate staging area for larger warehouses.

Similarly, data marts can reside in the data warehouse instance. Here's an opportunity to take cycles – this time, access cycles – off the warehouse and use the warehouse database for the accessible, large-scale atomic data and as a distribution point for the data marts.

Regardless of the number of databases, enterprise-adjudicated versions of subject areas, built once for multiple uses, create enormous efficiencies of operation. Independent views of the customer mean the profitability analysis application will have a different customer view than the targeted marketing application. Enterprise subject areas also deliver internal operational efficiencies and can react to market conditions faster due to less internal focus on data gathering and reconciliation.

There are many pockets of opportunity in an unarchitected environment. Expanding this to the enterprise should yield an impressive value equation for a data warehouse program. The architectural key for a mature data warehouse program is a high degree of sharing of these corporate subject areas.

Mistake: No Ongoing, Proactive Management of Performance and Capacity Planning. Companies with mature data warehouse programs have learned to avoid surprises to the user community. However, it's bad when the data warehouse is not loaded one night or the quality in a certain field is apparently suspect. However, it's much worse when those who alert the organization to the problem are the users themselves, not the data warehouse builders. One of the prime areas where this is true is in query performance.

Many programs attempt to establish service level agreements before warehouse development for query performance. If used as a guideline only, this can help mobilize the team's attention to performance issues. It is impossible to predict with any accuracy the performance of a query prior to production; and, once implemented, performance is apt to change over time. Mature programs establish guidelines for query types (point queries, queries against summary tables or cubes, full table scans), but they need a way to understand how the performance of the data warehouse is improving or deteriorating over time.

With hundreds – potentially thousands – of queries run against a mature data warehouse daily, the barometer of performance must be representative of queries only. By continually capturing the performance of the most frequently run queries or those run by the most important users, mature programs can reduce the chance of a surprise phone call from an unhappy user.

The representative performance metrics can come from capturing actual runtime statistics from user- run queries or from queries run by the data warehouse team to get the baseline. Because mature programs realize that performance can vary by day, day part and season, representative performance is measured at many different intervals.

Performance improvement will be an ongoing process, tightly coupled with user query patterns.

These mistakes are excerpted from the Third Quarter 2003 Data Warehousing Institute (TDWI) publication, "Ten Mistakes to Avoid for Mature Data Warehouses" by William McKnight. The entire publication is available to TDWI members at: www.tdwi.org/research/display.asp?id=5000.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access