Continue in 2 seconds

Spam Model

  • June 20 2008, 10:57am EDT
More in

Imagine you are in this situation: as the senior data modeler in your organization, you review data models built by project teams to ensure they follow modeling best practices. There is a new reporting system currently being modeled called “Spam.” Spam will initially produce weekly spreadsheets for business users capturing spam email quantity by keyword on a given day of the week. Figure 1 is a subset of what might get produced, where the columns represent common spam words and the rows represent the days of the week.

The project team could not agree on one single model for this project, so they present you with three different physical data models (see Figure 2). Note that in option 3, a Spam Reporting Factor Type Name could be “Day of Week” and Spam Reporting Factor Value could be “Monday.” Instead of selecting the best model from these three, you decide to play it safe and list the situations where each model would be ideal.

The Challenge

In what situations would you use each of these models?

The Response

Modifying a logical data model into an efficient physical design always involves tradeoffs. For example, “Do I want it fast, or do I want it flexible?” The main situations for when to choose each of these options follow.

Option 1

  • Names can change. For example, if the Spam Keyword of “Rolex” needs to change to “Watch,” this can be accomplished by updating a single Spam Keyword Name value.
  • Integration is a strong possibility. Chris Bielinski, architect, suggests option 1 is a good choice when integration is a requirement, such as fitting this model within a data warehouse design. “The Day of Week table could be used with other data values for expanded reporting,” Chris says.
  • Reporting requirements are stable. This model meets the requirements as long as spam quantity is always viewed by Day of Week and Spam Keyword. Both Peter Deotto, data architect, and Javier Mazzurco, BI architect, raise this factor. Javier also says that the reporting tool must support cross-tabulation of data.

Option 2

  • Data retrieval performance and user-friendliness are priorities. Marcin Kulakowski, data analyst, mentions that data in this structure is formatted for quick response time. Alefiya Sabuwala, data modeler, says, “This denormalized representation of the data makes it easy for users to run queries against.” This structure can also be user-friendly for the development team. Bob Mosscrop, enterprise data architect, suggests this structure might be a good choice if there is limited database SQL knowledge among team members.
  • Reporting requirements including report format are stable. Option 2 would require the most rework if requirements change. Georgia Prothero, data modeler, says this model can be used when “Spam Keyword Name is unique and not expected to change and keywords will only ever be required to be counted by days of the week.” Marcin recommends asking the question, “Do we assume that we will only hold one week’s data?”
  • Reporting tool has limited functionality. Javier says that this structure is beneficial when the reporting tool does not support cross-tabulation of data or the cross-tabulation consumes lots of resources.

Option 3

  • Option 3 should be chosen when flexibility is the priority. Alefiya explains: “If the requirements for the measurement of the spam events are very loosely defined and the likelihood of changing the measurement from days to any other factor over time is possible, I would select this model for the flexibility it provides.” Wade Baskin, senior database architect, and Patrick McMullen, data architect, both suggest carefully weighing the flexibility gained with this design against the challenge of explaining such an abstract, complex design to others.

If you would like to become a Design Challenger and have the opportunity to submit modeling solutions, please add your email address at There is also an overview on how to read a data model at my Web site.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access