The previous article addressed the question "How is real time achieved?" Beginning with the architecture, it provided a general design for a real-time application. This third and final article of the series addresses the questions "What causes the biggest headaches?" and "What can you do about it?"
It can be frustrating when an application that delivers real-time data has been implemented and the source system then refuses to cooperate. This article addresses some of the more common lapses in data quality that can shoot a real-time application in the foot.
Incomplete Within Itself
Problem: Data extracted from an operational source system may lack internal elements that are required to consider the data complete. For example, a record may lack:
- Person's name,
- Part description and/or
- Creation date.
The business meaning of that record may be compromised as a result.
Solution: This is a data quality problem, and it requires a data quality solution. Typically, if the meaning of a record is significantly compromised or rendered unusable, the occurrence of poor data quality is reported and the record is rejected (hard reject). However, if some of the record's business meaning can be salvaged by filling in the gaps with default values, then the record is allowed to pass to the data warehouse with default values inserted into the gaps. In this case, the occurrence of poor data quality is still reported, including the data quality exception encountered and the action that was taken (soft reject).
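The hard reject and soft reject paths described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the field names, the choice of which fields are essential, and the default values are all assumptions made for the example, and a real data quality process would take them from its own business rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical rules for illustration only.
# A record without these fields is assumed to be unusable (hard reject).
REQUIRED_FIELDS = {"part_id"}
# Gaps in these fields are assumed to be salvageable with a default (soft reject).
DEFAULTS = {
    "person_name": "UNKNOWN",
    "part_description": "NOT PROVIDED",
    "creation_date": "1900-01-01",
}


@dataclass
class Result:
    action: str                      # "accept", "soft_reject" or "hard_reject"
    record: Optional[dict]           # the record to load, or None if rejected
    issues: List[str] = field(default_factory=list)  # what to report


def is_missing(value) -> bool:
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def check_completeness(record: dict) -> Result:
    # Hard reject: an essential element is missing, so the record is unusable.
    missing = sorted(f for f in REQUIRED_FIELDS if is_missing(record.get(f)))
    if missing:
        return Result("hard_reject", None,
                      [f"missing required field: {f}" for f in missing])

    # Soft reject: fill the gaps with defaults and report each one.
    repaired = dict(record)
    issues = []
    for name, default in DEFAULTS.items():
        if is_missing(repaired.get(name)):
            repaired[name] = default
            issues.append(f"defaulted {name}")

    return Result("soft_reject" if issues else "accept", repaired, issues)
```

In both reject cases the issues list carries the exception encountered, so the occurrence of poor data quality can be reported whether or not the record reaches the data warehouse.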
Incomplete Beyond Itself
Problem: Data extracted from an operational source system may only have a business meaning in the context of other data records. If those other records do not arrive, the business meaning can be compromised or lost completely. For example, a record may lack:
- A header record to provide the context for a detail line-item record,
- An original transaction that is updated by the data in a subsequent record and/or
- A dimensional data value that will join to the fact data in a record.
As a result, the business context of that record may be compromised.
Solution: This is also a data quality problem, and it requires a data quality solution. Again, if the context of a record is compromised or rendered unusable, the occurrence is reported and the record is rejected (hard reject). However, if the context of the record can be salvaged by filling in the gaps with default values, the record is allowed to pass to the data warehouse with those default values. The occurrence of poor data quality and the soft reject are reported in this circumstance as well.
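As a rough sketch of the same idea applied to missing context, the function below checks a detail line-item record against the headers and dimension values that have already arrived. Everything named here is an assumption for illustration: the record layout (order_id, product_code, quantity), the rule that a missing header is unsalvageable, and the use of -1 as the key of a default "Unknown" dimension row.

```python
# Assumed surrogate key of a default "Unknown" row in the dimension table.
UNKNOWN_KEY = -1


def resolve_context(detail: dict, known_headers: set, product_dim: dict):
    """Return (action, fact_row, issues) for one detail line-item record.

    known_headers: order ids whose header records have already arrived.
    product_dim:   mapping of product code to dimension surrogate key.
    """
    # Hard reject: without its header, the detail line has no usable context.
    if detail["order_id"] not in known_headers:
        return "hard_reject", None, [f"no header for order {detail['order_id']}"]

    # Soft reject: a missing dimension value is replaced by the default key.
    issues = []
    product_key = product_dim.get(detail["product_code"])
    if product_key is None:
        product_key = UNKNOWN_KEY
        issues.append(
            f"unknown product code {detail['product_code']!r}; used default key"
        )

    fact_row = {
        "order_id": detail["order_id"],
        "product_key": product_key,
        "quantity": detail["quantity"],
    }
    return ("soft_reject" if issues else "accept"), fact_row, issues
```

Whether a missing header should really be a hard reject, or whether the detail record should instead be held until the header arrives, is a business decision that this sketch does not try to settle.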
Repeat Data
Problem: Sometimes a source system will repeat its data. A receiving agent or application may request that data be played again. An interruption in an asynchronous environment may cause data to be repeated when connectivity is restored.









