This is a simple data mart question. What is summary data? Most books and white papers state that the star schema contains aggregated and summarized data. I understand aggregated data (e.g., monthly sales), but what is summarized data? Is it a subset of the detail data, or is some type of transformation applied to the data to make it summarized?


Les Barbusinski’s Answer: In my experience, the two terms are used interchangeably most of the time. However, I have heard some people make the following distinction between the two: aggregate tables simply sum up transactional metrics against one or more dimensions, while summary tables store "synthesized" metrics. An example of this would be an accounting data warehouse where historical general ledger transactions are stored. An aggregate table would simply summarize these transactions by account, department and month. A summary table, on the other hand, would provide P&L metrics (revenue, gross margin, net income and other KPIs) by department and month.
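
To make that distinction concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the account names, the account types ("revenue", "cogs", "opex"), the sample amounts and the simplified P&L formulas (gross margin as revenue minus cost of goods sold, net income as gross margin minus operating expenses) are hypothetical, not taken from any real ledger.

```python
from collections import defaultdict

# Hypothetical general ledger transactions:
# (account, account_type, department, month, amount)
GL_TRANSACTIONS = [
    ("4000-Sales", "revenue", "Retail", "2024-01", 1000.0),
    ("4000-Sales", "revenue", "Retail", "2024-01", 500.0),
    ("5000-COGS", "cogs", "Retail", "2024-01", 600.0),
    ("6000-Rent", "opex", "Retail", "2024-01", 300.0),
    ("4000-Sales", "revenue", "Online", "2024-01", 800.0),
    ("5000-COGS", "cogs", "Online", "2024-01", 350.0),
]


def build_aggregate(transactions):
    """Aggregate table: sum the transaction amount by account, department and month."""
    totals = defaultdict(float)
    for account, _acct_type, dept, month, amount in transactions:
        totals[(account, dept, month)] += amount
    return dict(totals)


def build_summary(transactions):
    """Summary table: derive simplified P&L metrics by department and month."""
    by_type = defaultdict(lambda: defaultdict(float))
    for _account, acct_type, dept, month, amount in transactions:
        by_type[(dept, month)][acct_type] += amount

    summary = {}
    for key, totals in by_type.items():
        revenue = totals["revenue"]
        # Simplified, assumed formulas; a real P&L has many more line items.
        gross_margin = revenue - totals["cogs"]
        net_income = gross_margin - totals["opex"]
        summary[key] = {
            "revenue": revenue,
            "gross_margin": gross_margin,
            "net_income": net_income,
        }
    return summary


aggregate = build_aggregate(GL_TRANSACTIONS)
summary = build_summary(GL_TRANSACTIONS)
```

The aggregate rows are still the original metric (amount) at a coarser grain, whereas the summary rows carry metrics that do not exist on any single transaction.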

Scott Howard’s Answer: I also have problems with this phrasing. It's something that someone penned many years ago, and it seems to have stuck. Yes, summarization is a form of aggregation, hence the confusion. I think what was intended was an attempt to compare a typical application-oriented or OLTP model to a typical DW or data mart model. OLTP models are generally very specific, containing current detailed data. DW models, on the other hand, contain highly summarized historical information, materialized and maintained in a way not possible in OLTP models. Now, how do we get from one model to the other? Aggregation: average, sum, etc.

Chuck Kelley’s Answer: I believe that aggregate and summarized are the same thing. A synonym for aggregate is summative (according to the Thesaurus in Microsoft Word). Some people use the term summarized and others use aggregate (including me!). Some of us try to be a bit more precise and use the word "or" instead of "and" (i.e., "… contains aggregated or summarized data …"), but it doesn’t always happen. Sorry for the confusion!

David Marco’s Answer: Summarization and aggregation are the same thing.

Clay Rehm’s Answer: In its simplest form, summary data is data that has been "summarized," or aggregated. This means that some form of detailed data has been rolled up to a lower level of detail. How this is physically stored depends on the preference of your data analysts and DBAs. Summary data can reside in a star schema, or it can reside in a single table that has been "flattened"; that is, the key elements of each dimension and the fact table are built into one "flat" table.

Summarized can mean that the data was simply rolled up (added up) or that complex transformation rules were applied to summarize it. Additionally, summary tables can be at whatever level of detail the user needs, as long as the data is in one easy-to-access place.
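
As a rough sketch of the "flattened" table and simple roll-up described above, the following Python joins a small star schema into one flat table and then adds it up at a chosen level of detail. The dimension tables, keys, column names and sales figures are all hypothetical, made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by surrogate key.
DIM_PRODUCT = {
    1: {"product": "Widget", "category": "Hardware"},
    2: {"product": "Gadget", "category": "Hardware"},
}
DIM_STORE = {
    10: {"store": "Downtown", "region": "East"},
    20: {"store": "Airport", "region": "West"},
}

# Hypothetical fact rows: (product_key, store_key, month, sales_amount)
FACT_SALES = [
    (1, 10, "2024-01", 100.0),
    (2, 10, "2024-01", 50.0),
    (1, 20, "2024-01", 75.0),
    (1, 10, "2024-02", 120.0),
]


def flatten(facts, dim_product, dim_store):
    """Build one flat row per fact, copying in the attributes of each dimension."""
    rows = []
    for product_key, store_key, month, amount in facts:
        row = {"month": month, "sales_amount": amount}
        row.update(dim_product[product_key])
        row.update(dim_store[store_key])
        rows.append(row)
    return rows


def roll_up(flat_rows, level):
    """Add up sales_amount at the level of detail given by the column names in `level`."""
    totals = defaultdict(float)
    for row in flat_rows:
        key = tuple(row[col] for col in level)
        totals[key] += row["sales_amount"]
    return dict(totals)


flat = flatten(FACT_SALES, DIM_PRODUCT, DIM_STORE)
by_region_month = roll_up(flat, ("region", "month"))
by_category = roll_up(flat, ("category",))
```

Changing the `level` argument is all it takes to produce a summary at a different grain, which is the sense in which a summary table can sit at whatever level of detail the user needs.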