
Measure Twice, Cut Once, Part 2

  • May 01 1999, 1:00am EDT

Claudia would like to thank Lowell Fryman for his assistance in writing this month's column.

As we described last month in the first part of this series, there is a real need to measure twice before we cut the data for our decision support environment. In other words, we need to implement strategies that screen or audit the content before it gets loaded into the data warehouse and continue to monitor it as it is used. This overall process is the audits and quality control program. Part one described the overall program, its various components and the methodology used to ensure high quality data in your warehouse.

This column, part two, will describe the process measure component of the audits and quality control program: what should be measured, what is needed for implementation and the methodology for implementation. Quality controls will be the topic of the last part of this series.

Purpose of Process Measures

Process measures are more extensive than traditional data processing program-to-program audits. The objective is to measure and validate the assumption that all of the appropriate data was extracted from the operational systems and then transferred, integrated, cleansed, transformed, loaded and distributed without any losses. Essentially, process measures should be used to proactively account for the data and the processes extracting and integrating that data prior to anyone using it for decision-making functions.

Many organizations perform some process measures in their integration and transformation (I&T) program code today. However, the results of these measurements are produced on hard copy reports viewed by only the data warehouse administrator. Business users rarely see these reports and may not be aware of a problem occurring with the I&T process.

Business users require a level of confidence in the completeness and accuracy of the data warehouse before they will use it. They will often query the data warehouse or data mart and compare the results to numbers from operational systems reports. It is a rare event for all the numbers to be identical, for a variety of reasons: the timing of extractions, filters on the extractions, erroneous records encountered and so on.

If the numbers don't match, the users must have an explanation that allows them to have confidence in their analyses. Unfortunately, most data warehouse administrators today cannot provide the needed supporting intelligence to the business users to reconcile any discrepancies, thus calling into question the validity of the data itself. This is where the implementation of process measures is needed to explain and justify the differences.

What Should be Measured?

This list is a starter set of measurements to use in implementing process measures. These can be used to stimulate the "define" activity (see Figure 2) of the methodology. Each measurement should be embedded in the I&T process and the results recorded in a meta data repository for each load period:

  • Count and cross-foot checks of record numbers read, dropped and output by each process,
  • Sum and cross-foot of dollar amounts read, dropped and output by each process,
  • Comparison of the record count and dollars output from the previous process to those input into the next process,
  • Comparison of the record counts and dollar amounts received from an extract file,
  • Comparison of the count and dollars loaded into the data warehouse and into the data marts,
  • Results from special integration processes such as merging customer accounts, householding and data cleansing routines.
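The count and cross-foot checks in this starter set can be sketched in code. The following is a minimal illustration, not drawn from any particular I&T tool; the `ProcessMeasure` record and `crossfoot_ok` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass

@dataclass
class ProcessMeasure:
    """Counts and dollar amounts captured by one I&T process for one load period."""
    process_id: str
    records_read: int
    records_dropped: int
    records_output: int
    dollars_read: float
    dollars_dropped: float
    dollars_output: float

def crossfoot_ok(m: ProcessMeasure, tolerance: float = 0.01) -> bool:
    """Cross-foot check: what was read must equal what was dropped plus what was output."""
    records_balance = m.records_read == m.records_dropped + m.records_output
    dollars_balance = abs(m.dollars_read - (m.dollars_dropped + m.dollars_output)) <= tolerance
    return records_balance and dollars_balance
```

A process that read 1,000 records, dropped 25 and output 975 cross-foots; one that loses a record somewhere in between does not, and that discrepancy is exactly what the measure is meant to surface.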

Implementing Process Measures

The programs within the I&T layer, as well as those that deliver and load data into the data marts, capture statistics about the process. It is critical that these measures be stored in your meta data repository for access by all. At a minimum, the process measures are uniquely identified by process ID and run date. This will support the automated balancing of the processing for each data acquisition cycle. (Note that we are not using the term audit, since a data warehouse is not an auditable database in the financial accounting sense nor should the data warehouse be expected to perfectly reconcile to the general ledger. However, an explanation for why it does not should be expected and obtainable.)
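With measures uniquely identified by process ID and run date, the automated process-to-process balancing described above becomes a straightforward lookup. This is a hedged sketch only: the dictionary stands in for a real meta data repository, and `balance_chain` is a hypothetical function name.

```python
def balance_chain(measures, chain, run_date):
    """Verify that each process's input count equals the previous process's output count.

    measures: dict keyed by (process_id, run_date), holding per-run counters.
    chain: ordered list of process IDs in the I&T flow.
    Returns a list of discrepancies; an empty list means the chain balances.
    """
    discrepancies = []
    for prev, nxt in zip(chain, chain[1:]):
        out = measures[(prev, run_date)]["records_output"]
        inp = measures[(nxt, run_date)]["records_read"]
        if out != inp:
            discrepancies.append((prev, nxt, out, inp))
    return discrepancies
```

Because every measure carries its process ID and run date, the same check runs unattended for each data acquisition cycle, and any discrepancy names the exact hand-off where records went missing.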

Once you begin to capture the process measures in your meta data, you open up a plethora of opportunities. Business users can query the meta data to determine: when new data has been loaded, how much new data has been loaded, what data has been integrated, what transformations were completed, how many errors of what types were encountered and the quality of the new data.
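A business user's "when and how much was loaded" question then reduces to a simple query over the recorded measures. The list-of-dicts repository and the `latest_load` name are assumptions made for illustration.

```python
def latest_load(measures, process_id="warehouse_load"):
    """Return (run_date, records_output) for the most recent run of a process, or None."""
    runs = [m for m in measures if m["process_id"] == process_id]
    if not runs:
        return None
    # ISO-formatted dates sort correctly as strings.
    last = max(runs, key=lambda m: m["run_date"])
    return last["run_date"], last["records_output"]
```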

Data warehouse administrators can query the meta data to: resolve questions concerning the growth trend for the data warehouse, determine the processing time impacted by the growth and reconcile the differences between the various databases that exist in the corporate information factory.
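The administrator's growth-trend question falls out of the same measures. As a sketch, assuming per-run load counts are recorded as (run date, rows loaded) pairs, cumulative growth per cycle is a running sum; `growth_trend` is a hypothetical name.

```python
def growth_trend(loads):
    """Cumulative row growth per run date from recorded load measures."""
    total = 0
    trend = []
    for run_date, rows in sorted(loads):  # tuples sort by run date first
        total += rows
        trend.append((run_date, total))
    return trend
```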

Figure 1 depicts the overall I&T process and the points in the process where process measures should be implemented.

Figure 1: Process Measure Points

Audit and Control Methodology

Figure 2 describes the audit and control activities and their interactions with each other. These were fully discussed in last month's column. The objective of each activity is as follows:

Figure 2: Process Measurement Activities

Define: determine the parameters of the measurements for process measures and the basic processes required to be measured.

Assess: determine whether the current data warehouse architecture and processes are capable of supporting the requirements.

Measure: create and deploy the programs that take the measurements needed for each data acquisition cycle. These programs must support the balancing of process-to-process measurements.

Correct: rectify processes that fail, or change the architecture if it is found to be suboptimal for process measures.

To develop process measures, first determine what should be measured and to what granularity (see next month's column for more on this). Next, evaluate the current architecture to assess whether process measures can be inserted at the appropriate points. You may be required to make changes to your architecture at this point. Then, build the processes to capture the measurements and store them in the meta data. The final step is to analyze the results so that trends, patterns and exceptions in the I&T processing can be monitored and corrections to the I&T process can be made.
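The final "analyze" step can be as simple as flagging a run whose record count strays too far from the recent trend. This sketch assumes a non-empty history of prior counts with a non-zero average; `is_exception` and the 10 percent threshold are illustrative choices, not prescriptions.

```python
def is_exception(history, latest, threshold=0.10):
    """Flag a run whose count deviates from the average of prior runs by more
    than the threshold fraction. Assumes history is non-empty with nonzero mean."""
    avg = sum(history) / len(history)
    return abs(latest - avg) / avg > threshold
```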

The first step toward implementing a solid, credible decision support environment is to implement an audit and quality control program. The program consists of two parts: process measures and quality measures. This article describes a starter set of process measures and the methodology for implementing them into your I&T processing layer. Next month, we will cover quality measures, how to develop them and how to implement them into your audit and control program.
