When we create a one-to-many relationship between two entities, we copy the primary key from the entity on the one side (the parent entity) over as a foreign key to the entity on the many side (the child entity). We traditionally copy over all of the metadata associated with the primary key such as name, format and definition. The one exception where a foreign key can have a different name than its primary key is when there is more than one relationship from the same entity. To avoid having two or more data elements with the same name in the same entity, we "role name," meaning giving the foreign key a different name than its primary key.
For example, in the data model below, in Employee we have a foreign key back to Employee Type which has the same name as its primary key (Employee Type Code), and we have two foreign keys in Customer that point back to Employee. These two foreign keys are role-named to avoid having Employee ID twice in the Customer entity and to provide additional meaning as to what the foreign key represents. Primary Contact Employee ID points back to the Employee who is the primary contact for this Customer, and Initial Contact Employee ID points back to the Employee who initially made contact with this Customer.
In this data model, the definition for Employee Type Code is as follows:
Employee Type Code is a numeric value assigned to each organization-wide understood category for an Employee. These codes have business significance and are for human resources internal use only. Examples:
01 = Full time
02 = Part time
03 = Retired
And the definition for Employee ID is:
Employee ID is the unique, mandatory and stable business key for each Employee. It is assigned by human resources and used throughout the organization. Example: Bob Jones is assigned the Employee ID 123-AB-872123
Definitions also copy over from primary key to foreign key, so the Employee Type Code foreign key in Employee has the same definition as Employee Type Code in Employee Type, and Primary Contact Employee ID and Initial Contact Employee ID have the same definition as Employee ID in Employee.
Should we at times modify the foreign key definitions so they are more relevant to the relationship they represent? After all, if I just see the Employee ID definition in Primary Contact Employee ID, it is not very descriptive.
What guidelines would you apply in deciding whether a foreign key should have a different definition than its primary key?
Whenever we role name a foreign key, whether it is because of having two or more relationships from the same entity or for another reason, such as to better describe the foreign key, we need to revisit the foreign key’s definition. Sometimes we replace the default primary key definition with a completely new definition, and sometimes it is only necessary to add to the existing primary key definition to make it more precise and context-dependent.
Thus, we should modify the foreign key definition when we modify the foreign key name. In addition, there are times when we should modify the foreign key definition even when the name does not change. As Walt Cook, DW/BI section chief, says, we should modify the foreign key definition “ when the definition migrated with the foreign attribute does not clearly describe what the attribute values mean for the child entity type to which the foreign attribute belongs.”
Other Data Design Challengers submitted comments on redefining foreign keys that have been role-named:
- Liz Fanning, senior data architect: “The foreign key definition should always be contextually accurate, [therefore] we often change the automatically migrated definition so its meaning is defined from the perspective of the table in which it resides.”
- Brad Lindsey, business intelligence engineer: “ the decision goes back to the same decision to use more than one role for the primary key in the child table. If the business rules call for using roles in the child table, it would seem to be consistent to also give a specific definition to the roles.”
- Gary Whitney, information architect: “The two foreign keys from Employee to Customer should each have their definitions in the Customer entity modified to describe the Role that they play in the relationship.”
- Rich Kier, data architect: “ in my opinion, the definition in the child [table] should differ more often than not, or any time there is the potential for multiple relationships.”
- Craig Mangum, data architect: “I always change the definition if the foreign key has been role-named.”
- Kevin Heinsey, data architect: “At a minimum, it is useful to have the same definition cascade from the parent to the child. But this is not ideal. There is almost always benefit in describing the role that the foreign key plays within the new record. But when there is more than one foreign key to the same parent, it is absolutely necessary to change the definition. Relying on the unique column name is simply not good enough.”
- Fred Cohen, data governance: “If you need’ to change the name in the foreign key to clarify meaning, then the definition should be different from that of the primary key to clarify the definition.”
- Mike Harris, retired data modeler: “The definition should be different if the interpretation of the elements’ meaning is different. The initial contact employee id’ is NOT simply an employee ID, but that ID that belongs/belonged to the employee who initially made contact with the customer.”
- Joyce Norris-Montanari, consultant: “I would change the definitions when I changed the role name.”
- Vicky Li, data modeler: “When a role name is used, the definition definitely needs to be updated. For example, Primary Contact Employee ID and Initial Contact Employee ID should have different definitions. But in the case of Employee Type Code, I think it’s okay to keep the same definition as in its parent table.”
A number of practitioners agree when updating the foreign key definition, you can build it upon the existing primary key definition:
- Ray Doggendorf, database architect: “Whenever role naming a column, it is usually best to do two things: 1. Retain the original column definition wherever it currently exists. 2. Prefix the original column definition with additional information that explains the role wherever that new role-named column will exist.”
- Michael Brackett, consulting data architect: “The data definition solution is to inherit the existing definitions and further qualify them, but not repeat the existing definitions.”
- Jane Chatterley, senior consultant, Arch CoE: “My guideline is that context should be added to foreign key definitions. Leave the inherited definition in place and then append In this context...’ "
- Kenneth Hansen, data architect: “The Inherited Definition should always be elaborated to identify the context.”
A number of practitioners always modify the foreign key definition, even if role naming is not performed:
- Monte Montesano, data modeler: “My personal inclination is to always provide an enhanced definition for foreign key attributes. Rationale: the original definition applies to the attribute as a primary key, describing the attribute’s use as an identifier. In the child table, the use is always different, identifying a characteristic that applies to the child row. For example, I would enhance even the Employee Type Code definition within the Employee table to be something like Employee Type Code identifies the organization-wide category for this Employee.’"
- Graham Witt, associate director: “I never use exactly the same definition for a foreign key as for the corresponding primary key; the meanings are quite different when context is taken into account.”
- Thijs van der Feltz, information architect: “Even if there is a single relationship between the same two entities, I believe that a foreign key should always have a definition that describes its relationship to the parent entity, e.g. The ID of the EMPLOYEE that is the primary contact of a CUSTOMER.’"
- Dave Hay, guru: “Since the foreign key is, in fact, the implementation of a relationship, the definition should mostly be about the relationship. Thus, the Primary Contact Employee’ should be something like the unique identifier of the Employee who is the most important person for the customer to call or visit if there is a problem.’"
- Madhu Sumkarpalli, business intelligence consultant: “One should modify the foreign key definition so that the foreign key has meaningful information within the context of the child table. As one of the biggest consumers of data models, I build logical models in MicroStrategy (BI tool) based on the information provided in the data model (both LDM and PDM) and the business requirements in addition to many other details available. Many times, we fall into a trap of not fully understanding the meaning of a foreign key if it has the same definition as that of the primary key in the parent table.”
- Mary Komorowski, enterprise data modeler: “Yes, we should modify foreign key definitions so that they are more relevant to the relationship they represent. In fact, I believe all definitions for attributes within an entity should have some context within them (except perhaps those that are obvious, such as last modified date).”
As part of your corporate standards policy, have a best practice regarding when to change a foreign key definition. At a minimum, update the foreign key’s definition when the foreign key’s name changes (role-naming). Optionally, you may find it valuable to update every foreign key definition, especially when the foreign key means something different as a foreign key than its originating primary key.
If you’d like to join the more than 4,000 data modelers in the Data Design Challenges group, sign up at www.stevehoberman.com.