In a last month's column, I reported of a client complaining to Giga that their data warehouse lacked data. It could not access the upstream sources of operational, enterprise data. This month, a client reports a subtly different but intimately related compliant: " ...(in our installation) the data warehouse contains the 'raw' data mostly extracted from operational systems. There was very limited design; little effort was spent on rolling up, or summarizing, the data. It is very complicated to collect all the needed information and to integrate and aggregate it. As a result, every new source takes a lot of effort which has prevented maximum use and business benefit from the warehouse." The client continues, offering a perceptive diagnosis of their predicament: "The data warehouse was built, not designed. We keep way too much detail and do not summarize enough data into prebuilt tables. Too many of our queries are ad hoc and take a long time to run because of detail level in the tables."
The folktale of the three little pigs and the wolf comes to mind. Only one of the pigs designed his house. When trouble arose which in this analogy might be all users simultaneously clicking on the submit icon for a complex ad hoc query a designed solution was able to handle the stressful situation that was not necessarily envisioned in advance. Giga's assessment: The data warehouse discussed in my January column lacked data; this one lacks information. Unlike the first example, there is plenty of data but not enough information. The irony is that not enough effort was expended at the front end to design aggregates so extra effort must be incurred at the back end to collect, integrate and aggregate all the information needed. The good news is that if the detailed data is available, designing and building consistent and unified representations of the customer, product and essential data dimensions is feasible. Some rework will be needed but the expended effort is not a total loss. In addition, given the detailed data, building aggregates ought to be relatively easy. Indeed, the 20 percent of the data that gets used 80 percent of the time is often not obvious until an installation gets some experience under its belt. Although this is not a discussion about tools, getting technology to monitor database usage can be helpful at this point. Aggregates are one of the chief performance enhancers for any data warehouse; therefore, justifying their creation and maintenance is relatively easy. Once the same business analyst has submitted an ad hoc query three times, it is not really ad hoc anymore. It is predictable and definable in advance, even if parameters such as customer number vary from day to day. Such a query should be captured, tuned, optimized and made part of a regularly scheduled process. In any case, adding business views and summaries (e.g., by time period, business unit, product family, etc.) should be a priority. This too is a part of the design effort.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access