A couple of months ago we wrote about the concept of householding. At that time we discussed the major concepts at a high level. This month, we'd like to expand on the first step of the householding process: defining the business rules that determine what makes up a "home." While the answer seems obvious, if you don't take the time to define the rules adequately and plan the process carefully, you will end up with records grouped incorrectly, unique records purged or no records matched at all. None of these outcomes will satisfy your project sponsor.

Before you can define the rules, you have to analyze the purpose of grouping records into a "household." Note that there is no reason a given record cannot belong to multiple households. For example, John Smith is married with two children. One household could be made up of all four people. There may also be a reason to group John Smith with his "immediate extended" family: John, his wife, his parents and his wife's parents. Further, households are not limited to the physical homes we live in; they may include the places where we work, shop or congregate. Step one, then, is to define the scope of the householding effort: the Why.
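Because a record may belong to several households at once, household membership is naturally a many-to-many relationship rather than a single household ID stored on each record. The following is a minimal Python sketch of that idea; the record and household IDs (`john`, `smith_family`, `acme_office` and so on) are hypothetical labels for the John Smith example, not part of any particular system.

```python
from collections import defaultdict

# Hypothetical (record_id, household_id) pairs for the John Smith example.
# A record may appear with several household IDs; nothing forces one-to-one.
MEMBERSHIPS = [
    ("john", "smith_family"),
    ("jane", "smith_family"),
    ("child_1", "smith_family"),
    ("child_2", "smith_family"),
    ("john", "smith_extended"),
    ("jane", "smith_extended"),
    ("john_mother", "smith_extended"),
    ("john_father", "smith_extended"),
    ("jane_mother", "smith_extended"),
    ("jane_father", "smith_extended"),
    ("john", "acme_office"),  # a workplace "household"
]


def households_by_record(pairs):
    """Map each record ID to the set of households it belongs to."""
    result = defaultdict(set)
    for record_id, household_id in pairs:
        result[record_id].add(household_id)
    return dict(result)


def members_by_household(pairs):
    """Map each household ID to the set of records grouped into it."""
    result = defaultdict(set)
    for record_id, household_id in pairs:
        result[household_id].add(record_id)
    return dict(result)


if __name__ == "__main__":
    print(sorted(households_by_record(MEMBERSHIPS)["john"]))
    # ['acme_office', 'smith_extended', 'smith_family']
```

Storing membership as pairs, rather than in a single household column, lets the family, extended-family and workplace views coexist without duplicating John's record.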

Once the team understands the scope of the householding task, two groups need to be organized. The first is the core group of technology and business users who will be directly responsible for defining and implementing the householding rules. The second is an extended group made up mainly of business users and other subject matter experts from the organization, who will be responsible for validating the business rules, verifying the resulting household assignments and supporting the core team as issues arise. Summarized below are some of the types of rules that should be considered regardless of the project scope; how each rule is defined, however, depends directly on that scope.

Rule 1: Duplicate Records

Identifying and eliminating duplicates (merge/purge) is the first step in any householding implementation. But there are multiple levels of duplicates. For personal (as opposed to business) householding, the most obvious rule for identifying duplicates is First Name|Last Name|Address. This rule identifies many duplicates and allows you to consolidate the information. However, there are times when it is not sufficient, such as when a person moves and the addresses no longer match. Keep in mind that even in the best of circumstances, address change notification and standardization processes may lag by several months. Therefore, John Smith residing at 123 Main Street, New York, NY and John Smith residing at 4101 West 75th Street, New York, NY will not match as the same individual. Adding a second rule based on another slowly changing data element, such as e-mail address (First Name|Last Name|E-mail), can increase accuracy. In this case, John Smith residing at 123 Main Street, New York, NY with the e-mail address jsmith@stockbroker.com and John Smith residing at 4101 West 75th Street, New York, NY with the same e-mail address will match as the same individual. Rules defining duplicates can be made increasingly complex to yield greater accuracy, but processing time increases as well, so the trade-off becomes processing time vs. percent accuracy.
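The two duplicate rules above can be sketched as match keys: records that share any key are treated as the same individual. This Python sketch is illustrative only. The `Record` type, the crude lowercase-and-trim normalization (a stand-in for real address standardization) and the use of union-find to make merges transitive are assumptions introduced here, not part of any specific merge/purge product.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    record_id: int
    first: str
    last: str
    address: str
    email: Optional[str] = None


def norm(value: str) -> str:
    """Crude normalization: lowercase and collapse whitespace (not real address standardization)."""
    return " ".join(value.lower().split())


def match_keys(rec: Record, use_email: bool) -> Iterator[Tuple[str, ...]]:
    """Yield every match key this record takes part in."""
    name = (norm(rec.first), norm(rec.last))
    yield ("addr",) + name + (norm(rec.address),)  # First Name|Last Name|Address
    if use_email and rec.email:
        yield ("email",) + name + (norm(rec.email),)  # First Name|Last Name|E-mail


def find_duplicates(records: List[Record], use_email: bool = False) -> List[List[int]]:
    """Return groups of record IDs judged to be the same individual.

    Records sharing any match key are merged; union-find makes the merge transitive.
    Assumes record IDs are unique.
    """
    parent = {r.record_id: r.record_id for r in records}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_seen = {}
    for rec in records:
        for key in match_keys(rec, use_email):
            if key in first_seen:
                parent[find(rec.record_id)] = find(first_seen[key])
            else:
                first_seen[key] = rec.record_id

    groups = {}
    for rec in records:
        groups.setdefault(find(rec.record_id), []).append(rec.record_id)
    return sorted(groups.values())


RECORDS = [
    Record(1, "John", "Smith", "123 Main Street, New York, NY", "jsmith@stockbroker.com"),
    Record(2, "John", "Smith", "4101 West 75th Street, New York, NY", "jsmith@stockbroker.com"),
    Record(3, "Jane", "Smith", "4101 West 75th Street, New York, NY"),
]

if __name__ == "__main__":
    print(find_duplicates(RECORDS))                  # [[1], [2], [3]]
    print(find_duplicates(RECORDS, use_email=True))  # [[1, 2], [3]]
```

Each extra key adds lookups per record, which is the processing-time side of the trade-off; the accuracy side shows up as John's two records collapsing into one only when the e-mail rule is enabled.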

Rule 2: Acceptable Duplicate Match Rate

Determine ahead of time what level of accuracy your householding project requires. The previous example shows how the level of accuracy can be increased, and that it depends directly on the quality of your data and the sophistication of your matching rules. The additional accuracy gained in the example above by adding the e-mail address may not be required in your circumstances. It is therefore incumbent on the business users on the core team to set the acceptable unmatched rate. This requirement can be based on a cost estimate (for example, savings achieved by reducing duplicate mailings) or on a desired response percentage, which is driven by the percentage of individuals mailed.
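To make the cost-based option concrete, the sketch below compares candidate duplicate rules by net savings: duplicates removed times the cost of a mail piece, minus the cost of running the rule. The rule names, record counts and costs are invented for illustration, and net savings is only one possible decision criterion; a response-rate target would call for a different formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RuleResult:
    name: str
    records_in: int         # records before merge/purge
    records_out: int        # distinct individuals after merge/purge
    processing_cost: float  # cost of running the rule (hypothetical units)


def duplicate_rate(result: RuleResult) -> float:
    """Fraction of input records identified as duplicates."""
    return (result.records_in - result.records_out) / result.records_in


def net_savings(result: RuleResult, cost_per_piece: float) -> float:
    """Mailing cost avoided by dropping duplicates, minus the cost of finding them."""
    duplicates_removed = result.records_in - result.records_out
    return duplicates_removed * cost_per_piece - result.processing_cost


def choose_rule(results: List[RuleResult], cost_per_piece: float) -> RuleResult:
    """Pick the rule with the highest net savings (ties go to the rule listed first)."""
    return max(results, key=lambda r: net_savings(r, cost_per_piece))


# Illustrative figures only; substitute measurements from your own test runs.
CANDIDATES = [
    RuleResult("name+address", 100_000, 94_000, 1_000.0),
    RuleResult("name+address or name+email", 100_000, 93_000, 2_500.0),
]

if __name__ == "__main__":
    print(choose_rule(CANDIDATES, cost_per_piece=0.50).name)  # name+address
    print(choose_rule(CANDIDATES, cost_per_piece=2.00).name)  # name+address or name+email
```

With cheap mail pieces the simpler rule wins; as the cost per piece rises, the extra duplicates caught by the e-mail rule pay for its additional processing.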

Rule 3: Grouping Records

Once the records can be assumed to be unique, the next step in the process is to define what makes up a household, keeping in mind that a household is simply a grouping of related records. It may be a family unit, unrelated people living at the same address, members of a club, employees at a company or subsidiaries of a large corporation. For the family unit case, an obvious rule would be Last Name|Address. This rule is valid in the vast majority of circumstances: grouping on Smith residing at 4101 West 75th Street, New York, NY will place both John and his wife, Jane, in the same household. But what if Jane's mother, Janet Doe, lives with her daughter and son-in-law? Because last name is part of the match rule, Janet Doe would not be grouped into the Smith household. Data elements may need to be added to or removed from the rule based on your unique business requirements.
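The effect of the Last Name|Address rule on the Smith household can be checked with a small grouping function. This Python sketch assumes records have already been through merge/purge and that addresses are standardized; the `Person` type and the two key functions are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, NamedTuple


class Person(NamedTuple):
    first: str
    last: str
    address: str  # assumed already cleansed and standardized


def group(people: List[Person], key: Callable[[Person], Hashable]) -> List[List[str]]:
    """Group people whose key values are equal; return sorted lists of full names."""
    households = defaultdict(list)
    for p in people:
        households[key(p)].append(f"{p.first} {p.last}")
    return sorted(sorted(members) for members in households.values())


def last_name_and_address(p: Person) -> Hashable:
    return (p.last.lower(), p.address.lower())  # Last Name|Address


def address_only(p: Person) -> Hashable:
    return p.address.lower()  # Address


PEOPLE = [
    Person("John", "Smith", "4101 West 75th Street, New York, NY"),
    Person("Jane", "Smith", "4101 West 75th Street, New York, NY"),
    Person("Janet", "Doe", "4101 West 75th Street, New York, NY"),
]

if __name__ == "__main__":
    print(group(PEOPLE, last_name_and_address))
    # [['Jane Smith', 'John Smith'], ['Janet Doe']]
    print(group(PEOPLE, address_only))
    # [['Jane Smith', 'Janet Doe', 'John Smith']]
```

Dropping the last name from the key is the "subtract a data element" case: Janet Doe joins the Smith household, at the cost of also grouping unrelated people who share an address.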

Rule 4: Acceptable Grouping Rate

As with duplicate record processing, the level of accuracy achievable when grouping related records will vary with the sophistication of the grouping rules and the quality of your data. In the example above, you may not want to group Janet Doe with the Smiths; that decision belongs to the business owners and depends on the reasons for the household groupings. Here too, it is incumbent on the business users on the core team to set the acceptable unmatched rate. If the purpose of householding is to reduce the number of mail pieces for a marketing campaign, you may want to drop the Last Name qualification: mail pieces are expensive, and the return may not justify the expense of two mailings to the same address when the incoming mail is likely seen by all residents. However, if the mailing is for a political campaign, it may be desirable to send mail to each unique name at an address.
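Because the right rule depends on the campaign, one option is to keep the grouping key configurable per purpose and compare the resulting mail-piece counts before committing. In this Python sketch the purpose names and the `MAIL_KEY_BY_PURPOSE` table are illustrative assumptions; the three keys correspond to one piece per address, one per Last Name|Address group and one per unique name.

```python
from typing import Callable, Dict, Hashable, List, NamedTuple


class Person(NamedTuple):
    first: str
    last: str
    address: str  # assumed already standardized


# Hypothetical mapping from campaign purpose to the key that defines one mail piece.
MAIL_KEY_BY_PURPOSE: Dict[str, Callable[[Person], Hashable]] = {
    # Marketing: one piece per address, since all residents likely see the mail.
    "marketing": lambda p: p.address.lower(),
    # Family: one piece per Last Name|Address group.
    "family": lambda p: (p.last.lower(), p.address.lower()),
    # Political: one piece per unique name at an address.
    "political": lambda p: (p.first.lower(), p.last.lower(), p.address.lower()),
}


def mail_pieces(people: List[Person], purpose: str) -> int:
    """Number of mail pieces needed when one piece is sent per distinct key."""
    key = MAIL_KEY_BY_PURPOSE[purpose]
    return len({key(p) for p in people})


PEOPLE = [
    Person("John", "Smith", "4101 West 75th Street, New York, NY"),
    Person("Jane", "Smith", "4101 West 75th Street, New York, NY"),
    Person("Janet", "Doe", "4101 West 75th Street, New York, NY"),
]

if __name__ == "__main__":
    for purpose in ("marketing", "family", "political"):
        print(purpose, mail_pieces(PEOPLE, purpose))
    # marketing 1
    # family 2
    # political 3
```

Multiplying each count by the cost of a mail piece gives the business owners a direct basis for choosing between the rules.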

Rule 5: Data Confidence Factors

This is probably the most nebulous of the rule categories, and it will likely prove the most difficult to define, if not to implement. Data confidence is based on the concept that any data element used in the householding process may be invalid. To define it, you first have to consider the source of the data record. Names and addresses can often be cleansed and standardized. However, the accuracy of other types of data, such as demographic data, is not as easily guaranteed. The sources of data must be considered and a confidence factor assigned to each data record. For example, one of the best-known householders of business information is Dun & Bradstreet. If an organization is registered with D&B, it is given a unique identifier, the DUNS number, at the headquarters level. This DUNS number is provided only after D&B has contacted the company and verified the data provided. Each company location is then linked to that headquarters number, so all office locations are known and matchable through a unique identifier. Further, subsidiaries are assigned their own unique DUNS numbers, but they are also linked to their parent organization and its DUNS number. This is useful when a householding process needs to group employees of different subsidiaries under a single master company, because the parent company's DUNS data provides the link. Business processes like these ensure accuracy and allow you to be very confident in the grouping logic.
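One way to put source-based confidence factors to work is to discount a match score by the confidence of the less reliable record. In the Python sketch below, the source names, confidence values, field weights and the 0.6 threshold are all hypothetical placeholders a team would set during rule definition; the weighting scheme itself is one simple option, not a standard method.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical confidence factors (0.0 to 1.0) assigned per data source.
SOURCE_CONFIDENCE: Dict[str, float] = {
    "dnb_verified": 0.95,  # externally verified business data
    "customer_entered": 0.70,
    "purchased_demographics": 0.40,
}

# Hypothetical weights for how much agreement on each field contributes to a match.
FIELD_WEIGHTS: Dict[str, float] = {"name": 0.4, "address": 0.4, "income_band": 0.2}


@dataclass(frozen=True)
class SourcedRecord:
    source: str
    name: str
    address: str
    income_band: Optional[str] = None  # values assumed already cleansed


def match_score(a: SourcedRecord, b: SourcedRecord) -> float:
    """Weighted field agreement, discounted by the less trustworthy record's source."""
    agreement = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        va, vb = getattr(a, field), getattr(b, field)
        if va is not None and va == vb:  # missing values never count as agreement
            agreement += weight
    return agreement * min(SOURCE_CONFIDENCE[a.source], SOURCE_CONFIDENCE[b.source])


def should_group(a: SourcedRecord, b: SourcedRecord, threshold: float = 0.6) -> bool:
    """Group two records only if the confidence-adjusted score clears the threshold."""
    return match_score(a, b) >= threshold


VERIFIED_A = SourcedRecord("dnb_verified", "Acme Corp", "1 Main Street, New York, NY")
VERIFIED_B = SourcedRecord("dnb_verified", "Acme Corp", "1 Main Street, New York, NY")
PURCHASED = SourcedRecord("purchased_demographics", "Acme Corp", "1 Main Street, New York, NY")

if __name__ == "__main__":
    print(should_group(VERIFIED_A, VERIFIED_B))  # True
    print(should_group(VERIFIED_A, PURCHASED))   # False
```

Here identical name and address values from two verified sources clear the threshold, while the same values paired with a record from a purchased demographic list do not, which reflects the point that the source of a record matters as much as its contents.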

While there are other categories of rules that need to be considered when planning your householding process, this column has focused on the initial rules of duplicate record identification and related record grouping. In future columns we will look into the more sophisticated rules that need to be defined to complete the process. Clearly defined rules, together with the ability to verify, validate and use the grouped information, are the key to successful householding logic.

All Information Management content is archived after seven days.