Despite millions of dollars worth of investments, information within the data warehouse continues to be inaccurate, incomplete and often inconsistent with its sources. As a result, data warehouses experience low confidence and acceptance by users and consumers of downstream reports. Additionally, in many cases data warehouse projects have failed.

Organizations today face many of the following challenges related to the management and control of data warehouse information: accelerating changes in the business environment, increasing complexity of source systems and technology, and an expanding array of regulations and compliance requirements such as Sarbanes-Oxley (SOX) and numerous industry-specific regulations. The integrity of the information in most data warehouse projects today is often unclear, ill-defined and suspect. It is necessary to have definitions and standards of information integrity and deploy an effective system of automated information controls to monitor, measure and maintain information integrity throughout the data warehouse to meet business intelligence (BI), performance management, compliance and the increasingly high demand to execute at unprecedented levels of operational excellence.

Organizations are operating in complex environments. They constantly generate, use, store and exchange information and materials with customers, partners and suppliers. Enterprises are also required to exchange key performance information with regulatory agencies and shareholders. Today’s connected enterprise can only be as its information quality. Trustworthy information reduces risk and uncertainty in the decision-making process, enhances confidence and improves operational effectiveness.

Today, many companies use data warehouses to centralize and organize information from which reporting then lead to major business decisions. In fact, Information Week reports, “data warehouses are the critical underpinning of business intelligence projects. And companies have big plans for business intelligence.”1 In addition, many companies also rely on data warehouses to support their key compliance programs and numerous other information management initiatives.

Companies have spent millions of dollars to build data warehouses. A simple warehouse may merely take data from several upstream sources and consolidate it. A more complex data warehouse may aggregate and combine data from disparate systems, lines of business, business partners and more - into a single warehouse. No matter the size or complexity, the quality of the information that a data warehouse contains is critical to determine and manage.

Bad information decreases operational efficiency and leads to dire drills, rework, bad business decisions, and, in some cases, embarrassing headlines. Even worse, it has lead to material misstatements in financial filings. In fact, the Data Warehousing Institute estimates that companies lose more than $600 billion every year due to bad information.2 This includes employee time, lost customers and lost revenue. A data warehouse may fail if users do not trust the quality of the information it contains.

The effect of bad information is worse for companies that need to use information from a data warehouses to meet the requirements of SOX, SAS 70 audits and Basel II. In these cases, if a data warehouse does not provide accurate, consistent and reliable information, a company and its leaders can face fines and penalties. For these reasons and more, the integrity of information that a data warehouse contains is a critical issue for every company.

The Data Warehouse Challenge

The accuracy, completeness and timeliness of the information present in a data warehouse are often questionable. In the past, data warehouse managers have been able to provide reasonable assurance as to the quality of the data because they either had the time to inspect and correct issues, processes, and controls or those problems were sufficiently isolated to prevent wide-scale contamination (due to the loosely coupled or discontinuous and physical nature of the data in the system at that time). However, several recent trends in the business environment are dramatically limiting internal auditor’s abilities to provide assurance. The urgency of finding new approaches to this issue is increasing.

Data warehouse managers face many challenges: accelerating changes in the business environment, changing needs of the business users, increasing complexity of systems and technology and an expanding (and ever-changing) array of regulations and compliance requirements.

Accelerating Changes

Every organization faces internal and external changes. There may be various reasons for these changes - new products and markets, restructuring, personnel turnover, new information systems or changes in the regulatory environment.

Experience shows that due to changes in the organization or its external environment, controls implemented may no longer adequately address current risks.

This means having information systems in place to ensure that all the relevant information is gathered reliably and at the right time, and distributed to the right people.

Increasing Complexity

In past decades, organizations grew as a result of innovation in products, processes and systems and expansion of geographical boundaries of operations. One of the consequences of all this activity has been an enormous increase in the complexity of their businesses, which tends to increase the fixed costs of conducting their business. This complexity manifests itself in many forms, affecting everything from the day-to-day BI operations data warehouse to strategic business plans. For example, a bank uses several dozen different applications to support its deposit account process. It is impossible to manually reconcile the deposit account information stored in a data warehouse back to its source systems due to the vast volumes of transactions. The complexity within business enterprises increases with time. Adding to this complexity are several factors including: higher volumes of data, faster rates of processing, more types of data, more hardware and software platforms, and more interfaces. Every new interface, every new component and every new feature adds to the complexity and increases the potential for errors.

Regulations and Compliance Mandates

As companies comply with the reporting requirements of Sections 302 and 404 of the SOX, various SAS 70 requirements, the Patriot Act, Basel II and varied service level agreements (SLAs), data warehouse managers are challenged with documenting, testing and certifying controls that ensure the integrity of the financial and operational information present in the data warehouse. In addition, internal auditors and external auditors are faced with proving the accuracy and the completeness of the financial information present in the data warehouse. According to Internal Auditing magazine, “Rapid advances in hardware and software innovations and companies’ integration of data warehouses create a dramatic change in the technological control environment, directly affecting the financial reporting system. The advancements, in turn, require the internal auditor to adapt and develop specific auditing procedures.”3

Data Warehouse Information Integrity Issues/Requirements

Change and complexity introduce information integrity risk. Accelerating change accelerates information integrity risk. Compliance makes information integrity an imperative rather than an option.

Figure 1 depicts information movement and information integrity risk points in and around a typical data warehouse. In this diagram, information passes through multiple systems. The path of the information flow also includes other applications, user interfaces and databases such as data stores and data marts.

Most companies that create a new data warehouse exert great amounts of time and effort to identify and extract the information needed to be in their Data warehouse. Data may come from internal and external systems, in multiple formats, and/ or from multiple platforms. Despite this complexity, most companies fail to consider whether the information itself is accurate. As a result, if the source data a company loads into its data warehouse contains errors, so will the data warehouse that stores it. Based on consulting work performed by Infogix consultants with Global 2000 organizations over the past decade, we have identified three fundamental information integrity issues that continue to challenge the integrity of the data warehouses.

Integrity of input information: Several source systems feed a data warehouse. Data may come from internal and external systems, in multiple formats, from multiple platforms. Despite this complexity, most companies fail to consider (to a great extent) whether the information itself is accurate. If the feed is not examined properly before loading them into the data warehouse, the integrity of the data warehouse information system becomes questionable.

For example, a multibillion dollar manufacturing and distribution organization recently consolidated all of its worldwide systems - from A/R and payroll, to inventory and invoicing, into its new data warehouse. More than 65 feeder systems interfaced with the data warehouse. However, the company had no way to ensure that its data and files were loading into the data warehouse correctly. This often results in significant rework and research and investigation work, costing the organization several hundreds of thousands of dollars each year. Source system owners often blamed the data warehouse managers for improperly translating and loading the information. Data warehouse managers blamed the source system owners for incomplete and inaccurate information. Data warehouse projects often lose credibility because of an inability to examine the accuracy and completeness of the incoming data. Recent trends indicate that data warehouse managers focus more on scrubbing and cleaning the data rather than ensuring the accuracy and completeness of information.

Source system reconciliation and balancing: As the information traverses through the source system to a data warehouse, various intermediate processes such as transformations may degrade the integrity of the information. This problem becomes more acute when the data warehouse feeds other downstream applications. For example, a large insurance company recently deployed a data warehouse project for its financial information. General ledger (GL) information is stored into the data warehouse, and the data warehouse is used to feed business intelligence applications. Soon after the launch, business users started questioning the trustworthiness of the data warehouse information as the GL, data warehouse and BI reports were inconsistent. All respective application owners blamed one another for the issues. Reconciliation and corresponding rework efforts were estimated to cost several resource hours each time reports were run and would delay the timeliness of the BI reports.

Inability to reconcile and balance data warehouse information with upstream and downstream applications is a major challenge and often leads to failure of data warehouse projects. In order to ensure processing accuracy, data warehouse managers must reconcile and balance the information at each step in the process to source systems.

Auditability: In addition, emerging data warehouses focus on compliance and financial reporting needs. These warehouses require higher levels of accuracy because they store data that companies use to create financial reports and to ensure compliance with SOX and other regulatory requirements. Internal and external auditors often require evidence that appropriate controls are implemented and properly executing. For example, a large commercial bank stores all of its wholesale loan information in a data warehouse for Basel II reporting purposes. Its risk management and control teams required that data warehouse managers provide evidence of the controls to ensure the accuracy of the credit risk information.

Current approaches to dealing with these issues are fragmented, ad hoc, and unscientific. Large amounts of resources, hardware, software, people and time are invested for manually verifying all kinds of information. When errors are detected, the finge-pointing and blame games commence. Overcoming all of these challenges requires controls. However, manual controls are no longer a viable option for financial reporting systems with numerous risk points and massive volumes of information. To provide a proper control structure for the information in such systems, one must apply automation.

Independent Automated Information Controls for Data Warehouse

Automation of controls can apply at several levels. But the greatest value comes from automating the internal controls themselves - and the process of auditing those controls. Automated controls can reconcile and audit the data within a warehouse to increase its integrity. This provides several key benefits:

  • Assurance that controls will detect information errors early to meet audit and regulatory requirements, and prevent rework and losses.
  • Insight into micro and macro trends, and the ability to monitor and audit controls that capture and analyze data warehouse information.
  • Improvement via automated controls to decrease costs, reduce cycle times, and provide standard reports, processes and documentation.

When determining the desired level of data warehouse integrity, it is important to understand how the data warehouse is populated and the usage of the data warehouse information. This requires the creation of an information integrity framework. Figure 2 depicts information movement in and around a data warehouse.

In this diagram, information passes through multiple systems. The path of the information flow also includes other applications, user interfaces and databases such as data stores and data marts. An effective information integrity framework must include the following automated controls that can:

  • Identify errors in incoming data, and block them from entering the warehouse.
  • Balance and Reconcile information as it traverses through the multiple systems.
  • Provide evidence of the effectiveness of the deployed controls.
  • Allow auditability of data warehouse output information.

We recommend the following types of controls to assure the integrity of data warehouse information:

  • Verification controls to validate information content and format, detect duplicates, verify external files and check cross-references.
  • Summary balancing controls to verify that the summary totals from two or more sources match. These controls can also balance sources of information such as: reports, files, databases, applications, and daily, weekly or monthly processing runs. It is also recommended that point-to-point balancing between source systems, record tallies and balance totals must be done to identify issues prior to affecting the integrity of a data warehouse.
  • Source system reconciliation controls to match and validate detail-level information from multiple sources. For example: items, account numbers and balances must be reconciled at each step with their source system information.
  • Time and sequence tracking controls to track information and transactions as they flow through systems and processes. For example, as the data moves from the source to extract, transform and load (ETL) technologies, and ETL to data warehouse, it is imperative to monitor the movement, receipt and reporting of information to meet certain internal and external SLAs. For example, a large financial transaction processing company is required to send a summary report each day by 10 a.m. to one of its clients. It tracks the creation and sending of the report from the data warehouse using automated controls. Alerts are sent to the appropriate team members in the event of error or delays so that appropriate remediation can take place before customers are impacted.

The effectiveness of information controls depends on how well they follow certain basic principles:
Independence: The controls must be independent of the domain they are intended to control. This contradicts the conventional wisdom that applications and systems are more efficient and effective when they have built-in controls. It is true that integrated applications such as enterprise resource planning (ERP) systems have very effective internal controls. However, they are subject to the intrinsic risk factors like all other systems. More importantly, the interfaces between the applications are susceptible to information errors.

Verifiability: Information controls should produce audit trails that are necessary and sufficient to verify the results. This audit trail forms the basis of an evidence database that will streamline audits.

Concurrency: The control regime should operate concurrently with the controlled domain. Depending on the information process, this may mean that some controls are exercised periodically while others are operated continuously, or in real time.

Standardization: Controls that are consistent in their design, deployment and operation are more effective than a myriad of ad hoc controls.

Automation: Automated information controls are more efficient because they eliminate manual intervention and process delays. They are more effective because they produce consistent and predictable results.

Recommended Approach

Because there are a large number of information feeds in the conventional data warehouse, it is important that the data warehouse manager take a systematic and repeatable approach toward establishing an appropriate independent information control framework. Based on our expertise in providing information controls around data warehouses, we recommend the following steps:

  • Prioritize and categorize the information flows that feed the data warehouse. In addition, prioritize output information flows.
  • Assess exposures within each business process and related information flow.
  • Plan for automated information controls deployment.
  • Design appropriate controls to minimize exposure and control gap.
  • Deploy automated controls for each area of information exposure that you identify.
  • Operate controls using standard procedures to ensure that the controls are in place and they are doing what they were supposed to be.
  • Optimize controls after analyzing operating histories and micro and macro trends.

A more holistic approach needs to be taken to ensure the integrity of data warehouses and this requires ownership of the entire process not just at certain events or aspects. The information contained in a data warehouse is a strategic business asset capable of either delivering competitive advantage or high-visibility disasters. Companies that embrace the controls approaches outlined in this paper will be poised to derive real business value and competitive advantage from their data warehouse investments.
Companies must take action to ensure that their data warehouses do not contain bad information. This is especially true for data warehouses storing financial information. Adding automated controls around the processing steps upstream and downstream from a data warehouse will reduce information errors and enhance Information Integrity.


  1. Rick Whiting. "BI Spending to Increase." InformationWeek, March 2006.
  2. TDWI. "Data Quality and the Bottom Line." The Data Warehousing Institute, 2002.
  3. Brian Partrick Green, Michael Dean Harkness, Thomas G Calderon, Michael Foran. "Control Assessment in a Data Warehouse Environment." Internal Auditing, July/August 2006.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access