Aggregation, Summarization and Abstraction Demystified

Design Challenge

Information Management Magazine, November 2006

Steve Hoberman

A great way to sharpen our analysis and modeling skills is to continuously address real-world scenarios. A modeling scenario along with suggested solutions appears each month in this Design Challenge column. The scenario is emailed to more than 1,000 modelers up to the challenge. Many of the responses, including my own, are then consolidated into this column. If you would like to become a Design Challenger and have the opportunity to submit modeling solutions, please add your email address at www.stevehoberman.com/designchallenge.htm. If you have a challenge you would like our group to tackle, please email me a description of the scenario at me@stevehoberman.com.


The Response

Abstraction, aggregation and summarization are modeling techniques used to improve the stability or performance of the application built on the model. Abstraction is a logical data modeling technique that increases application stability by accommodating unknown data requirements, whereas aggregation and summarization are physical data modeling techniques whose primary purpose is to reduce data retrieval time. These terms can be confusing because abstraction and aggregation can lead to the same data structure, and because aggregation and summarization are often incorrectly used as synonyms.

Abstraction

Abstraction is a technique for redefining data elements, relationships and entities into more generic structures. For example, Figure 1 contains the entities Customer and Order and their business rule that a Customer can place many Orders, and that an Order must be placed by one Customer. Figure 2 contains an abstraction of these entities.


Figure 1: A Logical Data Model Before Abstraction

Customer has been abstracted into a Person/Role structure. A Person can play many Roles and a Role can be played by many Persons. Flexibility is achieved because the model can support Bob as a Customer and also Bob in a different role as Employee or Vendor. Order has been abstracted into Transaction, and the relationship between Customer and Order now exists between Person Role and Transaction.


Figure 2: A Logical Data Model After Abstraction

The semicircle represents subtyping, which is often used when abstracting. Diana Wild, data administration group leader, states, "The more general data is stored together (in the supertype) and referenced by the individual members of the set (the subtypes). A supertype entity contains attributes and relationships to other data that all the subtypes share."
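The abstracted structure can be sketched in code. The following is only an illustration, assuming Python dataclasses stand in for entities; the attribute names (person_id, role_code, credit_limit, order_total and so on) are hypothetical and are not taken from Figure 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int  # hypothetical attribute
    name: str


@dataclass(frozen=True)
class Role:
    role_code: str  # hypothetical codes such as "C" or "E"
    name: str


@dataclass(frozen=True)
class PersonRole:
    """Supertype: one Person playing one Role (resolves the many-to-many)."""
    person: Person
    role: Role


@dataclass(frozen=True)
class Customer(PersonRole):
    """Subtype: inherits the shared attributes and adds its own."""
    credit_limit: float = 0.0  # hypothetical subtype-only attribute


@dataclass(frozen=True)
class Transaction:
    """Supertype: placed by a Person Role rather than by a Customer."""
    transaction_id: int
    placed_by: PersonRole


@dataclass(frozen=True)
class Order(Transaction):
    """Subtype of Transaction."""
    order_total: float = 0.0  # hypothetical subtype-only attribute


bob = Person(1, "Bob")
bob_as_customer = Customer(bob, Role("C", "Customer"), credit_limit=500.0)
bob_as_employee = PersonRole(bob, Role("E", "Employee"))
order = Order(100, placed_by=bob_as_customer, order_total=25.0)
```

Bob appears once as a Person but twice as a Person Role, which is the flexibility the abstraction is meant to provide.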

Some prefer the term "generalization" over "abstraction." Gordon Everest, professor emeritus, uses the term abstraction when something is left out. He explains, "If you have a detailed data model diagram, for example, you do not need to present it to a user all at one time and in all its detail. With generalization, we recognize commonalities and form a higher-level construct."

Aggregation

Aggregation is a physical data modeling technique where structures are combined without losing granularity and without increasing redundancy. When one-to-one relationships are combined into a single entity, the same level of detail still exists, and you do not have the data redundancy that occurs when denormalizing a one-to-many relationship. Figure 3 shows what the model in Figure 2 might look like after aggregation.


Figure 3: A Physical Data Model After Aggregation

There is a one-to-one relationship between supertype and subtype and, therefore, we aggregated Customer into Person Role and Order into Transaction. Most likely, there will be a type code data element in Person Role which can have the value "C" for Customer and a type code in Transaction which can have the value "O" for Order.
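As a rough sketch of this step, the function below folds a subtype row into its one-to-one supertype row and stamps the result with a type code. The dictionary layout, the column names and the aggregate helper are assumptions made for illustration; only the "C" type code comes from the text above.

```python
def aggregate(supertype_row: dict, subtype_row: dict, type_code: str) -> dict:
    """Combine two rows that share a one-to-one key into a single row.

    One row goes in on each side and one row comes out, so no detail is
    lost and no value is repeated.
    """
    if supertype_row["id"] != subtype_row["id"]:
        raise ValueError("rows do not describe the same instance")
    overlap = (supertype_row.keys() & subtype_row.keys()) - {"id"}
    if overlap:
        raise ValueError(f"columns defined on both sides: {sorted(overlap)}")
    return {**supertype_row, **subtype_row, "type_code": type_code}


# Hypothetical rows for illustration only.
person_role_row = {"id": 7, "person": "Bob", "role": "Customer"}
customer_row = {"id": 7, "credit_limit": 500.0}

merged = aggregate(person_role_row, customer_row, type_code="C")
```

Because the relationship is one-to-one, the merged row carries every attribute exactly once, in contrast to denormalizing a one-to-many relationship, where parent values would repeat on every child row.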

Summarization

Summarization combines like things and stores them at a coarser level of granularity. Jeff Pekrul, data architect, defines summarization as, "The process of reducing a number of records to a single record by adding the value of one or more fields that have a common key. The process of summarization does not retain the base data." For example, in Figure 4 we summarized Order details from Figure 1 into the entity Monthly Sales.


Figure 4: A Physical Data Model After Summarization

Although we cannot look at Bob's order from April 1, we can report on how much Bob generated in monthly sales during April.
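A minimal sketch of this kind of summarization follows, assuming each order is a (customer, order date, amount) tuple. The sample orders and amounts are invented for illustration, and the function name summarize_monthly_sales is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Invented sample orders: (customer, order date, amount).
orders = [
    ("Bob", date(2006, 4, 1), 120.0),
    ("Bob", date(2006, 4, 15), 80.0),
    ("Bob", date(2006, 5, 2), 50.0),
    ("Ann", date(2006, 4, 9), 200.0),
]


def summarize_monthly_sales(orders):
    """Add up order amounts that share the key (customer, year, month)."""
    totals = defaultdict(float)
    for customer, order_date, amount in orders:
        totals[(customer, order_date.year, order_date.month)] += amount
    return dict(totals)


monthly_sales = summarize_monthly_sales(orders)
print(monthly_sales[("Bob", 2006, 4)])  # Bob's April total
```

If only monthly_sales is stored, the individual April 1 order can no longer be recovered, which matches the point that summarization does not retain the base data.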

Steve Hoberman is one of the world's best-known data modeling gurus. He taught his first data modeling class in 1992 and has educated more than 10,000 people about data modeling and business intelligence techniques since then. Steve is known for his entertaining, interactive teaching and lecture style (watch out for flying candy!), and organizations around the globe have brought Steve in to teach his Data Modeling Master Class, which is recognized as the most comprehensive data modeling course in the industry. Steve is the author of "Data Modeling Made Simple," "Data Modeler’s Workbench" and "Data Modeling for the Business" (Technics Publications). He is the founder of the Design Challenges group and inventor of the Data Model Scorecard.
