Surrogate Keys versus Natural Keys
Surrogate keys are often touted to be a means to database (DB) performance. A surrogate key is a single immutable, non-intelligent identifier used to substitute for the natural key, especially (but not only) a compound natural key.
Surrogate keys should not be confused with artificial logical keys. It is very difficult to find natural keys for primary entities such as Customer, Employee and Vendor. Therefore, we create artificial keys to make the entity unique, such as Customer ID, Employee ID and Vendor ID. These IDs are not surrogate keys because they have a business usage. "What is your customer ID?" "Please enter you Employee ID followed by the # sign." "Where is the payment for Vendor ID 12345?" In addition, these artificial keys may not be immutable over the lifetime of the entity. Surrogate keys are totally artificial and hidden, and are for system use.
For primary entities such as Customer, surrogate keys allow us to retain history. The Customer data would include Customer SK, Customer ID, Customer Name, etc.
Let us look at Order Line Item as an associative entity example. The natural compound key for Order Item could be Order Number, Product ID. A surrogate key for Order Line Item could be Order Key (integer) used to substitute for the compound key of Order Number, Product ID (see Figure 11). Surrogate keys can be used in any physical model. Database management system (DBMS) optimizers generally treat a surrogate key more efficiently than a compound natural key. Retrieving data and joining tables is more efficient with a surrogate key than with a compound natural key. However, a simple numeric natural key will perform equally to a surrogate key.

Figure 11: Natural Order Item Key and Surrogate Order Item Key
Surrogate keys introduce several complexities not found in natural keys. For example, whereas a natural key will ensure the integrity of the data, a surrogate key requires that integrity be separately enforced. In our example, the natural key of Order Number, Product ID will inherently ensure that a product appears in an order only once. If we use Order Key as the primary key, we have to use an integrity routine to enforce that rule. Other questions arise: Should the affected table include both the surrogate key and the natural key, thereby increasing storage requirements? If you remove the natural keys, do you add a mapping table? How much more complex is it if you query by the natural key but join via the surrogate key?
In a data warehouse (DW) environment, the use of surrogate keys is not a simple decision. It affects the full spectrum of DW processes, namely, the gather, store and deliver processes. The gather process is affected because the keys must be generated and controlled. The store process is affected because the DB must store the surrogate keys and then ensure the integrity of the data via integrity modules. The deliver process is affected because the reporting and analysis must recognize the occurrence of histories and know how to use them.
In summary, logical data models use natural compound keys to help reflect the identity of the data and to ensure integrity. The model itself and its keys inherently ensure this. In a physical data model, this is often replaced with a surrogate key for various reasons.
The main reasons to use surrogate keys are to:
- Save storage by replacing a compound natural key,
- Insulate the table from operational vagaries and
- Allow infinite granularity.
Pros and Cons of Surrogate Keys
Pros:
- Reduced space requirements by replacing a large combined primary key either in parent or child entities.
- Isolation of the data from operational changes.
- Ability to have multiple instances against a given entity.
- Speed of access when the optimizer uses a simple, numeric index.
Cons:
- Possible extra storage requirements due to an extra column (if both natural and surrogate keys are retained).
- Integrity must be enforced some other way.
- Still need access to the data using natural keys.
- May have to supplement surrogate keys with a mapping table correlating surrogate key and its equivalent natural key.
In the analytical environment, it is quite common that the natural keys of facts and of historical dimensions become very large. Surrogate keys can be used to address this. The surrogate key allows the natural key, consisting of many parts, to be replaced by a single identifier. This identifier could be used as a foreign key in other relationships. For example, say we need a cost of goods sold (COGS), and that COGS is determined by Product ID, Location ID and Manufacture Method, and that it varies over time. The natural key is a combination of these attributes: Product ID, Location ID, Manufacture Method Code, Date. We keep these attributes in the table as non-primary-key-foreign keys, but use the surrogate key as the table key, and thereby as a foreign key in other relationships. In Figure 12, the model on the left shows the extra space consumed by natural keys; the model on the right shows how less space is consumed by the surrogate key.










Be the first to comment on this post using the section below.