Q:

We are developing a VLDB that will probably use aggregate tables as well as database partitioning. Oracle is the RDBMS and Business Objects is the BI tool. BI is aggregate aware but I believe only on Fact tables, i.e., I cannot have a dimension table and then a separate aggregate dimension table tied to a aggregate Fact (e.g., I may have a product dimension and a fact with product sales. I may then have a fact table for product category sales). If I create a Product Dimension and a Product Category Dimension, the Product Category would be defined twice and confusing to the users, i.e., Kimball’s solution to create separate summarized facts would not work because the database is not aggregate aware? Does this that every dimension tied to an aggregate table would need to be snowflaked so all detail and aggregates share the same dimension? Which databases are aggregate aware?

A:

Les’s Barbusinski’s Answer: First off, only fact tables can be aggregated because only fact tables contain metrics. If you are storing quantifiable attributes in your dimension tables (e.g., quantities, counts, dollar amounts, percentages, averages, rankings, etc.), you need to move them to a fact table. For example, it’s okay to store a customer’s birth date on the Customer dimension table, but his/her current age should be stored in a fact table (so that you can compute min, max and average ages, or group people by age ranges). As for creating duplicate dimension tables, this shouldn’t be necessary as long as you avoid using “consolidated dimensions.” Let’s back up a minute. Most dimensions consist of multiple hierarchical levels (e.g., an Organization dimension that consists of Regions, Districts and Stores). You can choose to implement this hierarchy as either discrete tables with FK constraints (i.e., separate Region, District and Store dimension tables) or as a single “consolidated dimension” table where every row represents the lowest level in the hierarchy and attributes from higher levels in the hierarchy are repeated (i.e. each row represents a Store and contains all of the corresponding attributes for the Store’s parent District and Region). If you implement the hierarchy using discrete dimension tables, then you can have a variety of base SALES, COST and/or PROFITABILITY fact tables tied to the Store dimension, and multiple aggregate SALES, COST and/or PROFITABILITY fact tables tied to the Region dimension. In your example, the Product and Product Category dimensions are in the same hierarchy. Hence, metrics defined at the lower (i.e., Product) level can either be rolled up dynamically or pre- aggregated to the Product Category dimension with no need for duplicating the dimension tables. Using discrete dimension tables is a more flexible approach, but may be more expensive because of the need for more joins. Conversely, using consolidated dimension tables yields faster queries, but may require multiple tables to handle aggregates. You have to decide which to use on a case-by-case basis. Finally, I am not aware of any RDBMS that is “agg aware” (i.e., that automatically routes queries to aggregate tables when they exist). Scott Howard’s Answer: Yes Brian, you’ve entered the world of theory vs. fact. What you’re describing reeks of the need for a dimensional hierarchy or a snowflake. I don’t have a good solution for you either because these hierarchies and snowflakes they produce are known to negate the star optimization for some RDBMSs including your choice. Aggregate aware is also misleading. Does suggesting that something as aggregate aware mean that the RDBMS is hiding the existence of the aggregate tables, automatically making them available to any tool, such as Business Objects, which requests the aggregate via standard SQL? Does aggregate aware mean that the tool must explicitly take advantage of the objects created by and managed by the RDBMS via some proprietary means? I’ve seen vendors twist these definitions, providing function that looks great on paper and in benchmarks but is useless for me and my client’s business needs. What I’m saying is what you may not want to hear. You’ll need to benchmark it both ways. Try it with the aggregate tables and then the old fashion way, without. The details about when the table would be used and not, and their effect on optimization is too confusing and voluminous to include here and often doesn't help because your benchmark may prove otherwise.

Chuck Kelley’s Answer: Business Objects does a great job with aggregate awareness. We use it extensively at one of my customer’s site. Your description here is not correct. What you do is create separate dimensions (and you can use some of the existing dimensions as well) with a separate summary fact table and then build the aggregate awareness into the business objects semantic layer (universe). It really works well and we have seen astronomical increases in performance with the summary tables. The users see no difference (other than performance) in the universe and, in fact, have no idea there is a summary table. Red Brick and (in some way) Oracle are aggregate aware. Aggregate aware means that the product you are using will rewrite a query to use aggregates if it can. In databases, that functionality will be in the optimizer. You query the base table and it determines an aggregate table(s) will be better and rewrites the query to access the aggregate table(s). Tools, such as Business Objects, do this during the generation of the query. I believe it should be at the database level, and the leading vendors are certainly moving in that direction.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access