If you're a business intelligence developer, you've likely had a customer ask why there are null values in their report -- and what the values mean.
Unknown values in a data warehouse solution are common. A number of reasons can cause of them. Sometimes developers fix or hide these unknowns, but sometimes developers decide it’s okay to just leave them alone. It all depends if the developer understands what's triggering this null value.
With that as a backdrop, it is important that the developer consider a few rudimentary questions before making a decision on the next course of action.
Forget about the design, the first action item is to always check your data quality. Often, the source data is invalid for the field type or there is a domain constraint that we didn’t capture correctly. These are easy mistakes to make, but it takes time and effort to discover since it’s a very detailed process. It’s best to begin the data quality audit by checking to see if there’s a violation on the data selection rules.
Is it a Dimension Attribute?
In an OLTP relational database, it’s important to keep null values in place to ensure data integrity and the “real truth” of the data. However, unlike a transactional system, a data warehouse has more complex situations. For instance, in scenario one, a null value attribute could reside in a dimension table. This could occur because certain attributes may not be applied to all the dimension records.
For example, one can have an empty value in state if he’s not a U.S.-based customer. On the other hand, the data would not be available if there is a delay of entry in the source system. For either situation, we can create a custom transformation to capture the null values instead of just leaving a blank field there. A simple way is to insert a row with -1 as key and “Not Applicable” as the description. We can also always assign any custom value to represent a missing piece of data.
Is it a Fact Measure?
A null fact measure is very similar to a null dimension attribute. The reason for this could be a non-existing data or a delayed data input. When we consider the interpretation of fact measures, it’s actually fine to treat a blank value just as zero. When it comes to cube aggregation and report development, it won’t be a big challenge to display the null values as 0 or 0.00. That said, a developer can simply leave the null value as is and change the format strings in the following database tools.
Is it a Fact Surrogate Key?
This is probably one of the most important ETL steps a developer should address. In theory, any surrogate key should be assigned a value to avoid the violation of referential integrity. There are plenty of reasons for a null value in a surrogate key field. For example, the join could be incorrect, the fact row may not be applicable to the certain dimension, or the key might not exist in the source system. Thus, we need to maintain a consistent method to deal with empty surrogate keys. For anything that is not applicable or not related to the fact table, we can set the key as -1,-2, or -3 for its particular situation. From there, we can insert the keys and corresponding descriptions into the dimension tables.
For a transaction that happens before its dimension attribute gets created, we can customize and insert the dimension attributes from the fact table into the dimension table. With this example, this would apply to the scenario when a marketing custom item is sold in a special business unit and the item could not be found in the item dimension table yet.
No matter what action you take, we have learned from the field that the key step is to stay consistent in your data warehouse development. Inconsistency can definitely create confusion to business users when they see three rows each with a blank, “N/A,” or “Not Applicable” in their pivot table, even if they have the same meaning. A developer’s job is to treat null values based on the identified cause and implement the solution consistently throughout the system.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access