A great way to sharpen our analysis and modeling skills is to continuously address real-world scenarios. A modeling scenario with suggested solutions appears each month in this Design Challenge column. The scenario is emailed to more than 1,000 modelers. The responses, including my own, are then consolidated into this column. If you would like to become a Design Challenger and have the opportunity to submit modeling solutions, please add your email address at www.stevehoberman.com/designchallenge.htm. If you have a challenge you would like our group to tackle, please email me a description of the scenario at me@stevehoberman.com .

The Challenge

A global direct marketing company has hired you to be their data warehouse data modeler. You need to model one of their most important subject areas: address. You know the following:

  • Some countries have no postal codes.
  • The scope is mailing address. Therefore, concepts such as longitude/latitude, geographic information systems, etc. are outside the scope.
  • You would like to avoid or minimize the use of any abstract structures, such as Location or Location Component.

What would your model look like?

The Response

Figure 1 contains an address logical data model. Address is a fun subject to model for two reasons:

  1. Address appears often. The concept of "where?" appears in every reporting environment. For example, "Show me these metrics by city."
  2. Address is complex. Business rules vary within countries and across countries. Because of this complexity, we could easily complete this model with two boxes (Location and Location Component) and a few recursive lines. But imagine actually developing such an abstract structure. Not easy!

Figure 1: Address Logical Data Model

There are three concepts in this address model:

Rules. Each Country can have many rules for a given Address Component Type. Address Component Type includes all of the subtypes under Address Component (Country, City, etc.) and also any other address component types we haven't yet considered. Examples of other address components types are Building, Park and Suburb. Rules include capturing which countries have which component types, whether it is optional or mandatory and the sequence in which it should appear on a given mailing address label.

Geography. The geography section of the model links the actual geographic area values to each other using a containment relationship. For example, U.S. contains New Jersey; New Jersey contains Newark, etc. On the model this includes Address Component and all of its subtypes, and Address Component Connection which connects any two Address Components to each other in a containment relationship.

Addresses. This is the actual mailing address. The Address entity includes the Address Number (e.g., 5, 10, 500) and Address Number Additional Information Text (e.g., Apt 3B, Suite 105, Lobby, Basement, Wing) and Post Office Box Indicator (e.g., yes this is a post office box, or no it is not a post office box). The Mailing Address entity connects an Address to its components.

Let's try to break up the model by walking through the following complex address:

6/123 Great South Rd
Auckland 1003
New Zealand

The Address entity would have one instance for this address (note that "6/" means "Apartment 6" in New Zealand):

Address Identifier (surrogate key): 1
Address Number: 123
Additional Information Text: Apartment 6
PO Box Indicator: No

The following are the Address Components tied to this address:

Country: New Zealand
Suburb: Owairaka (Note this is not one of the defined subtypes, so there will be a new Address Component Type called "Suburb" and the value "Owairaka" will appear in Address Component Name)
State: Auckland
Postal Zone: 1003
Street: Great South Road

I validated this model against six other complex addresses. As a homework assignment, walk through this model with your own address!

This model includes only the minimal amount of information to illustrate the business rules. There are pieces intentionally missing, for example, the person or organization we are addressing, address types and usages to ensure whether these addresses are residential or business addresses and whether they are for billing or shipping purposes. Also missing are street aliases when the same road has multiple names (e.g., Route 195 is also known as the Data Modeler's Highway) and street direction to indicate north, south, etc.

Because address is a complicated subject, it is easy to make it an overly complex model. To see very complex address models with more than 100 address entities, Google "address data model." Most of the time, these complex models are unnecessary. Focus on practicality and flexibility, and you will make the right design choices.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access