A great way to sharpen our analysis and modeling skills is to continuously address real-world scenarios. A modeling scenario along with suggested solutions appears each month in this Design Challenge column. The scenario is emailed to more than 1,000 modelers up for the challenge. Many of the responses, including my own, are then consolidated into this column. If you would like to become a Design Challenger and have the opportunity to submit modeling solutions, please add your email address at www.stevehoberman.com/designchallenge.htm. If you have a challenge you would like our group to tackle, please email me a description of the scenario at me@stevehoberman.com.


The Response

Abstraction, aggregation and summarization are all modeling techniques used to improve the stability and performance of the overlying application. Abstraction is a logical data modeling technique that increases application stability by accommodating unknown data requirements; aggregation and summarization are both physical data modeling techniques whose primary purpose is to reduce data retrieval time. These terms can be confusing because abstraction and aggregation can lead to the same data structure, and aggregation and summarization are often incorrectly used as synonyms.

Abstraction

Abstraction is a technique for redefining data elements, relationships and entities into more generic structures. For example, Figure 1 contains the entities Customer and Order along with the business rules that a Customer can place many Orders and that an Order must be placed by one Customer. Figure 2 contains an abstraction of these entities.

Figure 1: A Logical Data Model Before Abstraction


Customer has been abstracted into a Person/Role structure. A Person can play many Roles and a Role can be played by many Persons. Flexibility is achieved because the model can support Bob as a Customer and also Bob in a different role as Employee or Vendor. Order has been abstracted into Transaction, and the relationship between Customer and Order now exists between Person Role and Transaction.
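To make the abstracted structure concrete, here is a minimal Python sketch of the Person, Role, Person Role and Transaction entities as in-memory records. Because the figure itself is not reproduced here, every attribute name, the role codes ("CUST", "EMP", "VEND") and the helper `roles_played` are hypothetical illustrations, not the actual columns of Figure 2.

```python
from dataclasses import dataclass

# Hypothetical attributes: Figure 2's actual columns are not reproduced here.


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str


@dataclass(frozen=True)
class Role:
    role_code: str  # hypothetical codes such as "CUST", "EMP", "VEND"
    description: str


@dataclass(frozen=True)
class PersonRole:
    # Resolves the many-to-many: one row per (person, role) pairing
    person_id: int
    role_code: str


@dataclass(frozen=True)
class Transaction:
    # Generic replacement for Order; it points at a Person Role, not a Customer
    transaction_id: int
    person_id: int
    role_code: str


people = [Person(1, "Bob")]
roles = [Role("CUST", "Customer"), Role("EMP", "Employee"), Role("VEND", "Vendor")]
# Bob plays two roles without any change to the structure
person_roles = [PersonRole(1, "CUST"), PersonRole(1, "EMP")]
transactions = [Transaction(100, 1, "CUST")]  # an order Bob placed as a Customer


def roles_played(person_id):
    """Return the role descriptions a person currently plays."""
    descriptions = {role.role_code: role.description for role in roles}
    return [descriptions[pr.role_code] for pr in person_roles if pr.person_id == person_id]
```

Adding Bob as a Vendor is just one more `PersonRole` row; no new entity is needed, which is the flexibility the abstraction buys.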

Figure 2: A Logical Data Model After Abstraction


The semicircle represents subtyping, which is often used when abstracting. Diana Wild, data administration group leader, states, "The more general data is stored together (in the supertype) and referenced by the individual members of the set (the subtypes). A supertype entity contains attributes and relationships to other data that all the subtypes share."
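The supertype/subtype split Wild describes can be sketched as separate record types that share a key: shared columns live once in the supertype, and each subtype holds only what is specific to it. This is an illustrative sketch only; the columns `start_date`, `credit_limit` and `job_title`, the Employee subtype, and the helper `full_view` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRoleRow:
    # Supertype: data shared by every role a person can play (hypothetical columns)
    person_role_id: int
    person_id: int
    start_date: str


@dataclass(frozen=True)
class CustomerRow:
    # Subtype: only customer-specific data; shares the supertype's key (1:1)
    person_role_id: int
    credit_limit: int


@dataclass(frozen=True)
class EmployeeRow:
    # Subtype: only employee-specific data; shares the supertype's key (1:1)
    person_role_id: int
    job_title: str


person_roles = {
    10: PersonRoleRow(10, 1, "2008-01-15"),  # Bob as a Customer (sample data)
    11: PersonRoleRow(11, 1, "2008-03-01"),  # Bob as an Employee (sample data)
}
customers = {10: CustomerRow(10, 5000)}
employees = {11: EmployeeRow(11, "Analyst")}


def full_view(person_role_id):
    """Merge the shared supertype columns with whichever subtype row exists."""
    base = person_roles[person_role_id]
    view = {
        "person_role_id": base.person_role_id,
        "person_id": base.person_id,
        "start_date": base.start_date,
    }
    if person_role_id in customers:
        view["credit_limit"] = customers[person_role_id].credit_limit
    if person_role_id in employees:
        view["job_title"] = employees[person_role_id].job_title
    return view
```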

Some prefer the term "generalization" over "abstraction." Gordon Everest, professor emeritus, uses the term abstraction when something is left out. He explains, "If you have a detailed data model diagram, for example, you do not need to present it to a user all at one time and in all its detail. With generalization, we recognize commonalities and form a higher-level construct."

Aggregation

Aggregation is a physical data modeling technique in which structures are combined without losing granularity and without increasing redundancy. When two entities in a one-to-one relationship are combined into a single entity, the same level of detail still exists, and you avoid the data redundancy that occurs when denormalizing a one-to-many relationship. Figure 3 shows what the model in Figure 2 might look like after aggregation.
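The difference between combining a one-to-one relationship and denormalizing a one-to-many relationship can be checked with a small left join. In this sketch, which uses hypothetical rows and a hypothetical `left_join` helper, merging Person Role with Customer keeps exactly one row per Person Role, while folding Customer into its Orders stores the customer's `credit_limit` once per order.

```python
from collections import Counter

# Hypothetical rows. Person Role to Customer is one-to-one.
person_role = [
    {"person_role_id": 10, "person_id": 1},
    {"person_role_id": 12, "person_id": 2},
]
customer = [
    {"person_role_id": 10, "credit_limit": 5000},
    {"person_role_id": 12, "credit_limit": 2500},
]
# Customer to Order is one-to-many.
orders = [
    {"order_id": 100, "person_role_id": 10},
    {"order_id": 101, "person_role_id": 10},
    {"order_id": 102, "person_role_id": 12},
]


def left_join(left, right, key):
    """Keep every left row, merged with each right row sharing the same key."""
    result = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[key] == l_row[key]]
        if matches:
            result.extend({**l_row, **r_row} for r_row in matches)
        else:
            result.append(dict(l_row))
    return result


aggregated = left_join(person_role, customer, "person_role_id")
denormalized = left_join(customer, orders, "person_role_id")

# How many times each customer's credit_limit is stored in each structure
print(Counter(row["person_role_id"] for row in aggregated))    # Counter({10: 1, 12: 1})
print(Counter(row["person_role_id"] for row in denormalized))  # Counter({10: 2, 12: 1})
```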

Figure 3: A Physical Data Model After Aggregation


There is a one-to-one relationship between supertype and subtype and, therefore, we aggregated Customer into Person Role and Order into Transaction. Most likely, there will be a type code data element in Person Role which can have the value "C" for Customer and a type code in Transaction which can have the value "O" for Order.
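Rolling subtypes into their supertype with a type code can be sketched in the same spirit. The function `roll_up` and all sample columns are hypothetical; only the codes "C" for Customer and "O" for Order come from the text, and "E" for Employee is an assumed extra code. Each supertype row produces exactly one aggregated row, and columns belonging to other subtypes are left as `None`, mirroring nullable columns in a physical table.

```python
def roll_up(supertype_rows, subtypes, key):
    """Collapse one-to-one subtype rows into their supertype rows.

    subtypes maps a type code (e.g. "C") to that subtype's rows.
    """
    # Every column any subtype contributes becomes a nullable column
    subtype_columns = sorted(
        {col for rows in subtypes.values() for row in rows for col in row} - {key}
    )
    index = {code: {row[key]: row for row in rows} for code, rows in subtypes.items()}
    aggregated = []
    for base in supertype_rows:
        # Defaults first so the supertype's own columns are never overwritten by None
        merged = {**{col: None for col in subtype_columns}, **base, "type_code": None}
        for code, rows_by_key in index.items():
            extra = rows_by_key.get(base[key])
            if extra is not None:
                merged.update(extra)
                merged["type_code"] = code
        aggregated.append(merged)
    return aggregated


# Hypothetical Person Role rows with Customer ("C") and Employee ("E") subtypes
person_role = [{"person_role_id": 10, "person_id": 1}, {"person_role_id": 11, "person_id": 1}]
role_subtypes = {
    "C": [{"person_role_id": 10, "credit_limit": 5000}],
    "E": [{"person_role_id": 11, "job_title": "Analyst"}],
}
person_role_aggregated = roll_up(person_role, role_subtypes, "person_role_id")

# Hypothetical Transaction rows with an Order ("O") subtype
transaction = [{"transaction_id": 100, "person_role_id": 10}]
transaction_aggregated = roll_up(
    transaction, {"O": [{"transaction_id": 100, "ship_to": "12 Main St"}]}, "transaction_id"
)
```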

Summarization

Summarization combines like things and stores them at a higher (less detailed) level of granularity. Jeff Pekrul, data architect, defines summarization as, "The process of reducing a number of records to a single record by adding the value of one or more fields that have a common key. The process of summarization does not retain the base data." For example, in Figure 4 we summarized Order details from Figure 1 into the entity Monthly Sales.

Figure 4: A Physical Data Model After Summarization


Although we cannot look at Bob's order from April 1, we can report on how much Bob generated in monthly sales during April.
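Pekrul's definition maps directly onto a group-and-add over a common key. This minimal sketch assumes hypothetical order detail rows of (customer, order date, amount) with whole-dollar amounts and arbitrary sample dates; the key of the hypothetical Monthly Sales record is (customer, year, month).

```python
from collections import defaultdict
from datetime import date

# Hypothetical order-level detail rows; the year is arbitrary sample data
order_details = [
    ("Bob", date(2008, 4, 1), 120),
    ("Bob", date(2008, 4, 18), 80),
    ("Bob", date(2008, 5, 2), 50),
    ("Sue", date(2008, 4, 9), 200),
]


def summarize_monthly_sales(details):
    """Add amounts that share the key (customer, year, month) into one record."""
    totals = defaultdict(int)
    for customer, order_date, amount in details:
        totals[(customer, order_date.year, order_date.month)] += amount
    return dict(totals)


monthly_sales = summarize_monthly_sales(order_details)
```

Once only `monthly_sales` is kept, the April 1 order can no longer be recovered, but Bob's April total can, which is the trade-off described above.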

All Information Management content is archived after seven days.
