Spam Model
Design Challenge
Information Management Magazine, July 2008
Imagine you are in this situation: as the senior data modeler in your organization, you review data models built by project teams to ensure they follow modeling best practices. There is a new reporting system currently being modeled called Spam. Spam will initially produce weekly spreadsheets for business users capturing spam email quantity by keyword on a given day of the week. Figure 1 is a subset of what might get produced, where the columns represent common spam words and the rows represent the days of the week. Advertisement
The project team could not agree on one single model for this project, so they present you with three different physical data models (see Figure 2). Note that in option 3, a Spam Reporting Factor Type Name could be Day of Week and Spam Reporting Factor Value could be Monday. Instead of selecting the best model from these three, you decide to play it safe and list the situations where each model would be ideal.

The Challenge
In what situations would you use each of these models?
The Response
Modifying a logical data model into an efficient physical design always involves tradeoffs. For example, Do I want it fast, or do I want it flexible? The main situations for when to choose each of these options follow.
Option 1
- Names can change. For example, if the Spam Keyword of Rolex needs to change to Watch, this can be accomplished by updating a single Spam Keyword Name value.
- Integration is a strong possibility. Chris Bielinski, architect, suggests option 1 is a good choice when integration is a requirement, such as fitting this model within a data warehouse design. The Day of Week table could be used with other data values for expanded reporting, Chris says.
- Reporting requirements are stable. This model meets the requirements as long as spam quantity is always viewed by Day of Week and Spam Keyword. Both Peter Deotto, data architect, and Javier Mazzurco, BI architect, raise this factor. Javier also says that the reporting tool must support cross-tabulation of data.
Option 2
- Data retrieval performance and user-friendliness are priorities. Marcin Kulakowski, data analyst, mentions that data in this structure is formatted for quick response time. Alefiya Sabuwala, data modeler, says, This denormalized representation of the data makes it easy for users to run queries against. This structure can also be user-friendly for the development team. Bob Mosscrop, enterprise data architect, suggests this structure might be a good choice if there is limited database SQL knowledge among team members.
- Reporting requirements including report format are stable. Option 2 would require the most rework if requirements change. Georgia Prothero, data modeler, says this model can be used when Spam Keyword Name is unique and not expected to change and keywords will only ever be required to be counted by days of the week. Marcin recommends asking the question, Do we assume that we will only hold one weeks data?
- Reporting tool has limited functionality. Javier says that this structure is beneficial when the reporting tool does not support cross-tabulation of data or the cross-tabulation consumes lots of resources.
Option 3
- Option 3 should be chosen when flexibility is the priority. Alefiya explains: If the requirements for the measurement of the spam events are very loosely defined and the likelihood of changing the measurement from days to any other factor over time is possible, I would select this model for the flexibility it provides. Wade Baskin, senior database architect, and Patrick McMullen, data architect, both suggest carefully weighing the flexibility gained with this design against the challenge of explaining such an abstract, complex design to others.
If you would like to become a Design Challenger and have the opportunity to submit modeling solutions, please add your email address at http://www.stevehoberman.com/. There is also an overview on how to read a data model at my Web site.
Steve Hoberman is one of the world's most well-known data modeling gurus. He understands the human side of data modeling and has evangelized next generation techniques. Steve taught his first data modeling class in 1992 and has educated more than 10,000 people about data modeling and business intelligence techniques since then. He has presented at more than 50 international conferences, and his third book, Data Modeling for the Business, is hot off the press.
For more information on related topics, visit the following channels:





