SEP 19, 2008 11:53am ET

Real Time Data Warehousing, Part 3

The previous article addressed the question “How is real time achieved?” Beginning with the architecture, it provided a general design for a real-time application. This third and final article of the series addresses the questions “What causes the biggest headaches?” and “What can you do about it?”

 

It can be frustrating to implement an application that delivers real-time data, only to have the source system refuse to cooperate. This article addresses some of the more common lapses in data quality that can shoot a real-time application in the foot.

 

Incomplete Within Itself

 

Problem: Data extracted from an operational source system may lack internal elements that are required to consider the data complete. For example, a record may lack:

 

  • Person’s name,
  • Part description and/or
  • Creation date.

The business meaning of that record may be compromised as a result.

 

Solution: This is a data quality problem, and it requires a data quality solution. Typically, if the meaning of a record is significantly compromised or rendered unusable, the occurrence of poor data quality is reported and the record is rejected (hard reject). However, if some of the record’s business meaning can be salvaged by filling in the gaps with default values, then the record is allowed to pass to the data warehouse with default values inserted into the gaps. In this case, the occurrence of poor data quality is reported, including the data quality exception encountered and the action that was taken (soft reject).
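
The hard reject and soft reject paths described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the choice of which gaps are fatal, and the default values are all hypothetical, and a real application would take them from its own data quality rules.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical rules: which gaps are fatal and which can be filled with a default.
CRITICAL_FIELDS = ("creation_date",)
DEFAULTS = {"person_name": "UNKNOWN", "part_description": "NOT PROVIDED"}


@dataclass
class Outcome:
    action: str                     # "pass", "soft_reject" or "hard_reject"
    record: Optional[dict]          # record to load, or None when rejected
    exceptions: list = field(default_factory=list)  # data quality report entries


def is_missing(value):
    """Treat None and blank strings as gaps."""
    return value is None or (isinstance(value, str) and not value.strip())


def check_record(record):
    """Apply the hard reject / soft reject decision to one extracted record."""
    fatal = [name for name in CRITICAL_FIELDS if is_missing(record.get(name))]
    if fatal:
        # Meaning is unusable: report the problem and reject the record.
        report = [f"missing {name}: record rejected" for name in fatal]
        return Outcome("hard_reject", None, report)

    cleaned = dict(record)
    report = []
    for name, default in DEFAULTS.items():
        if is_missing(cleaned.get(name)):
            # Meaning can be salvaged: fill the gap and report the action taken.
            cleaned[name] = default
            report.append(f"missing {name}: defaulted to {default!r}")
    return Outcome("soft_reject" if report else "pass", cleaned, report)
```

A record with a blank person name would pass to the warehouse with "UNKNOWN" in that field and one entry in the report, while a record without a creation date would be reported and dropped.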

 

Incomplete Beyond Itself

 

Problem: Data extracted from an operational source system may have a business meaning only in the context of other data records. If those other records do not arrive, the business meaning can be compromised or removed completely. For example, a record may lack:

 

  • A header record to provide the context for a detail line-item record,
  • An original transaction that is updated by the data in a subsequent record and/or
  • A dimensional data value that will join to the fact data in a record.

As a result, the business context of that record may be compromised.

 

Solution: This is also a data quality problem, and it requires a data quality solution. Again, if the context of a record is compromised or rendered unusable, the occurrence is reported and the record is rejected (hard reject). However, if the context of the record can be salvaged by filling in the gaps with default values, the record is allowed to pass to the data warehouse with default values. The occurrence of poor data quality and the soft reject are reported in this circumstance as well.
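
The same decision can be sketched for a record whose context lives in other records. The example below is a minimal illustration, not a prescribed design. It assumes a detail line item that needs a header and a product dimension row, treats a missing header as unsalvageable, and substitutes a placeholder dimension key when the product is unknown. The key value of -1, the field names and the lookup structures are all hypothetical.

```python
UNKNOWN_PRODUCT_KEY = -1  # hypothetical surrogate key of an "Unknown" dimension row


def resolve_context(line_item, header_ids, product_keys):
    """Return (action, row, report) for one detail line-item record.

    header_ids   -- set of order ids whose header record has arrived
    product_keys -- dict mapping product codes to dimension keys
    """
    report = []
    if line_item["order_id"] not in header_ids:
        # No header means no context at all (assumed unsalvageable here).
        report.append(f"no header for order {line_item['order_id']}: record rejected")
        return "hard_reject", None, report

    row = dict(line_item)
    key = product_keys.get(line_item["product_code"])
    if key is None:
        # The dimension value is missing: join the fact to a default row instead.
        key = UNKNOWN_PRODUCT_KEY
        report.append(f"unknown product {line_item['product_code']}: default key used")
    row["product_key"] = key
    return ("soft_reject" if report else "pass"), row, report
```

Pointing the fact at a default dimension row keeps the measures queryable in the warehouse while the report records which product codes still need to be resolved.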

 

Repeat Data

 

Problem: Sometimes a source system will repeat its data. A receiving agent or application may request that data be “played” again. An interruption in an asynchronous environment may cause data to be repeated when connectivity is restored.
