In order to implement aggregate awareness, what are the precautions to be taken?

  • May 25 2006, 1:00am EDT

Tom Haughey's Answer:

First, understand the capabilities of your technology, whether that is your RDBMS or your OLAP tools. While almost all of them purport to support aggregate awareness, not all support it equally well.

How does your technology maintain the aggregates and achieve its aggregate awareness? Generally there are two choices for keeping an aggregate up to date: immediate or deferred. Immediate maintenance updates the aggregates each time the underlying data changes, which can have ongoing performance implications. Deferred maintenance updates the aggregates in batch at a scheduled time, and usually requires the entire aggregate table to be recalculated and reloaded. Ask: does an aggregate update change only the affected rows, or do you have to reload the entire aggregate table? Does the DBMS restrict you from joining the aggregate data to data not contained in the aggregate, or can you join the result set from aggregate queries to other data?
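As an illustration only (every vendor's syntax and mechanism differ), the two maintenance styles can be sketched with a small SQLite script. The table names, the trigger, and the rebuild routine below are hypothetical, not any particular product's aggregate-awareness feature:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical detail table and its aggregate (illustrative names).
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.execute("CREATE TABLE sales_agg (region TEXT PRIMARY KEY, total REAL)")

# Immediate maintenance: a trigger adjusts the aggregate on every insert,
# so the aggregate is always current but every write pays the extra cost.
cur.execute("""
    CREATE TRIGGER sales_ai AFTER INSERT ON sales
    BEGIN
        INSERT OR IGNORE INTO sales_agg (region, total) VALUES (NEW.region, 0);
        UPDATE sales_agg SET total = total + NEW.amount
         WHERE region = NEW.region;
    END
""")

cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("East", 100.0), ("East", 50.0), ("West", 75.0)])

# Deferred maintenance: recalculate and reload the whole aggregate in batch,
# the full-table rebuild the answer describes.
def rebuild_aggregate(cur):
    cur.execute("DELETE FROM sales_agg")
    cur.execute("""
        INSERT INTO sales_agg (region, total)
        SELECT region, SUM(amount) FROM sales GROUP BY region
    """)

rebuild_aggregate(cur)
totals = dict(cur.execute("SELECT region, total FROM sales_agg"))
print(totals)  # East: 150.0, West: 75.0
```

Either path yields the same totals here; the trade-off is where the cost lands, on each write (immediate) or on the scheduled batch window (deferred).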

Second, one precaution I strongly recommend is this: don't take at face value everything the vendor tells you. They will almost all tell you they can do everything you want. The solution?

  1. Do your homework and research the vendor.
  2. Get the vendor to demo it using examples you provide. Their own examples will all be stacked to work perfectly.
  3. Test it on your own.

Third, evaluate your need for aggregates, regardless of aggregate awareness. Here are some criteria.

There are two main reasons to aggregate, namely, to:

  1. Improve performance or
  2. Provide consistent numbers for further usage.

Given this, aggregate where:

  1. There is a large volume of original data. If the volume is low, why aggregate?
  2. You can achieve a reduction of the data of at least 90% (a 10-to-1 ratio). Even smaller reduction ratios will often help, but this is a good guideline.
  3. Aggregate for queries that are run often (collect a sample day's queries and do predicate analysis).
  4. Aggregate data that is stable. This is the problem of restatement. If the underlying data is not stable and would require restatement, then the aggregates would have to be redone. This is very costly. For example, a territory is moved to a new branch office. Your business rule is that the last two years of sales for the territory are moved to the new branch office. If you have aggregates at the Branch level, you will have to revise them. This can be hugely costly because you have to go back over two years of data to do so. This could easily be more costly than creating a weekly aggregate in the first place.
  5. Aggregate for data that is reused frequently. Say you have a data mart that supports Activity Based Costing. Your costing routines regularly use the following values for each customer: number of orders, number of shipments, number of contacts, etc. It is pointless not to pre-calculate these. They should be calculated and stored so that each algorithm can simply refer to them.
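The volume and reduction criteria above lend themselves to a quick back-of-the-envelope check before any aggregate is built. A minimal sketch in Python, with made-up row counts standing in for your own grain analysis:

```python
# Hypothetical detail grain: 1,000 stores x 365 days of daily sales rows.
detail_rows = 1000 * 365       # 365,000 detail rows

# Candidate aggregate grain: store x month.
aggregate_rows = 1000 * 12     # 12,000 aggregate rows

# Fraction of rows eliminated by moving to the aggregate grain.
reduction = 1 - aggregate_rows / detail_rows
print(f"reduction: {reduction:.1%}")  # about 96.7%, clears the 90% guideline
```

If the same arithmetic on your data comes out well under 90%, the other criteria (query frequency, stability, reuse) have to carry the case for the aggregate on their own.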
