Achieving improved financial reporting and meeting regulatory compliance standards are among the most vital concerns for organizations today. Neither of these objectives can be met without reliable information. Organizations must ensure the quality and auditability of the information underlying their financial and compliance reporting as well as other critical business intelligence applications. In this article, we discuss how the data integration discipline can support organizations in achieving information quality and auditability. First, let's define what we mean by information quality and auditability.

Information quality is the measure of the information's usefulness to the business process. Because information that is suitable to one business process may be unsuitable for another, information quality spans a wide continuum, and a quality measurement is based on a fit-for-use standard. Furthermore, information is derived from data; thus, information quality is very much a function of data quality.

Information auditability is conceptually simple. Information is auditable when its accuracy can be determined by tracing it to its sources as they existed at a particular point in time.

Later in this article we'll delve into how data integration tools and methods can help organizations achieve information quality and auditability; but first, we need to understand the role information components and information representations play in quality and auditability.

Information Components

Information is a function not just of data but also of business rules and application infrastructure. (Note that some practitioners consider data and semantic context to be the two information components. We further subdivide semantic context into business rules and application infrastructure.)

When a businessperson receives a piece of information from a system such as a reporting tool, that businessperson is actually looking at some data that has been transformed by some business rules and has been delivered to the user through some technology infrastructure. A change in any one of these might change the information's quality or auditability.

Therefore, data, business rules and infrastructure are critical information components. Ignoring, losing or obfuscating any of these components directly affects information quality and auditability.

In service of information quality and auditability, data integration technologies and methods must tightly couple these information components. It should be possible to navigate from an information end point (such as a customer's monthly account balance) to all data, business rules and infrastructure components involved in producing that piece of information.

Contemporary data integration technologies and methods should also allow a business user to navigate from any one component to all related components. Given a piece of data (e.g., a customer's purchase order total), it should be possible to view all business rules which involve that data element. Given a business rule, it should be possible to view all incoming and outgoing data for the rule. One should be able to see rules that relate to each other through intermediate data. Given a business rule or data element, one should be able to see what infrastructure is involved in implementing the rule or handling the data.

Thus, a business user should be able to navigate a chain of information components backward (derived information back to originating component) or forward (originating component forward to derived information) using data integration tools and techniques.
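
To make this kind of component linkage concrete, the following minimal sketch (written in Python purely for illustration) treats data elements, business rules and infrastructure items as nodes in a linkage graph and navigates backward to originating components or forward to derived information. The component names and the dictionary-based structure are assumptions for illustration, not features of any particular data integration product.

  # Minimal sketch of an information-component linkage graph.
  # Nodes are data elements, business rules and infrastructure items;
  # edges point from a component to the components derived from it.
  # All names here are illustrative assumptions.

  from collections import defaultdict

  class LineageGraph:
      def __init__(self):
          self.downstream = defaultdict(set)   # component -> derived components
          self.upstream = defaultdict(set)     # component -> originating components

      def link(self, source, target):
          """Record that 'target' is derived from 'source'."""
          self.downstream[source].add(target)
          self.upstream[target].add(source)

      def trace(self, component, direction="backward"):
          """Walk the component chain backward (to origins) or forward (to derivations)."""
          edges = self.upstream if direction == "backward" else self.downstream
          seen, stack = set(), [component]
          while stack:
              node = stack.pop()
              for neighbor in edges[node]:
                  if neighbor not in seen:
                      seen.add(neighbor)
                      stack.append(neighbor)
          return seen

  graph = LineageGraph()
  graph.link("data:order_total", "rule:sum_monthly_orders")
  graph.link("rule:sum_monthly_orders", "data:monthly_account_balance")
  graph.link("infra:billing_etl_job", "rule:sum_monthly_orders")

  # Backward: everything involved in producing the monthly account balance.
  print(graph.trace("data:monthly_account_balance", "backward"))
  # Forward: everything affected by a change to the order total.
  print(graph.trace("data:order_total", "forward"))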

Information Representations

Both business users and technologists have interpretations of the three information components (data, rules and infrastructure). These various interpretations are often categorized as conceptual, logical or physical. Furthermore, technologists work with representations of the applications they eventually deploy into production environments. This type of technical representation (often written in a programming language) and the working system are closely related but distinct. Therefore, we end up with four representations of the information components in total. The first three are descriptive in nature (meta views), and the fourth is programmatic:

  • Conceptual meta view
  • Logical meta view
  • Technical meta view
  • Physical

The conceptual meta view is an abstraction that drives long-term architecture and planning decisions and is beyond the scope of this article. We will focus on the other three views henceforth.

For this discussion, we consider initial application development requests to be synonymous with change requests, and we think of them both as forms of change. Ideally, change flows from the business user to the technologist via the logical meta view (which includes business requirements, high-level design and so forth). The technologist responds to these changes by developing a technical solution via various representative languages and tools such as SQL, COBOL, Java, BI (business intelligence) tools, ETL (extract, transform and load) tools, services and data modeling tools. This representation is the technical meta view. Finally, the solution developed in response to the original change request is deployed into a production environment.

Again, this is ideally the process that changes will follow. But what really happens in enterprise IT organizations? Is every production change attributable to the technical meta view that produced the change? Is every change in the technical meta view (for example, in the program code, database model or report layout) traceable to a documented change in the logical meta view (such as business requirements change request, or logical design change)? Sadly, the answer to these questions is often no.

Contemporary data integration technologies and methods recognize that the IT practitioner is primarily held responsible for maintaining linkages between information representations, and that the typical IT practitioner's workload leaves little time for managing "redundant" representations of production systems. Clearly, most IT practitioners make reasonable efforts to correlate the technical meta view with the production application. Maintaining the linkage between the technical meta view and the logical meta view, however, is often seen as a luxury. An occasionally maintained link between the logical and technical views is practically equivalent to no link at all, because practitioners lack trust in these linkages.

Therefore, contemporary data integration technologies and methods strive to facilitate the maintenance of these linkages. The holy grail for synchronized representations is that there is only one representation. If there were only one representation, and that representation could be appropriately presented to each constituency (end users, technologists, data center operators) needing to interact with the final solution, synchronization would be unnecessary.

Most attempts to develop universally suitable and comprehensible programming languages have failed miserably. Contemporary data integration technologies and methods, therefore, avoid the universality trap. Instead, the representational linkage results from unidirectional, successive refinement within an integrated framework. The business user develops the logical meta view within a data integration framework. The technologist starts with the logical view and successively refines it into a technical view, also within the same data integration framework, thus retaining the linkage between these two representations. Finally, the technologist deploys the production application also from within the same data integration framework, again retaining linkage.
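
As an illustration of successive refinement that retains linkage, the sketch below (again hypothetical Python, not any vendor's framework) lets each representation carry a reference to the representation it was refined from, so one can always walk from the deployed object back to the originating business requirement. The class, its fields and the example artifacts are illustrative assumptions.

  # Sketch of unidirectional, successive refinement within one framework.
  # Each refinement step keeps a reference to the representation it came
  # from, so logical -> technical -> physical linkage is never lost.
  # The fields and example artifacts are illustrative assumptions.

  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class Representation:
      view: str                      # "logical", "technical" or "physical"
      artifact: str                  # requirement text, source code, deployed object name
      refined_from: Optional["Representation"] = None

      def refine(self, view, artifact):
          """Produce the next representation while retaining the link back."""
          return Representation(view, artifact, refined_from=self)

  logical = Representation("logical", "Customer balance = sum of posted transactions this month")
  technical = logical.refine("technical", "SELECT SUM(amount) FROM txn WHERE ...")
  physical = technical.refine("physical", "prod.rpt_customer_balance_v3")

  # Navigate from the deployed object back to the business requirement.
  step = physical
  while step:
      print(step.view, "->", step.artifact)
      step = step.refined_from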

A less active but still somewhat effective alternative is linkage auditing. Linkage auditing is performed periodically and tests for the likelihood that a representational link has been broken. It works backward from production deployment change requests to the technical and logical representations, testing for reasonable numbers and types of related changes in each representational environment. Audit reports list potentially broken links. Presumably, an excessive number of broken links would result in a failing grade on a broader information audit.
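
A linkage audit of this kind might be sketched as follows; the change-request identifiers and the simple matching heuristic are illustrative assumptions.

  # Sketch of a periodic linkage audit. It works backward from production
  # changes and flags any deployment that has no matching change in the
  # technical or logical meta view for the same change request.
  # The change-log contents and matching rule are illustrative assumptions.

  production_changes = {"CR-101", "CR-102", "CR-103"}
  technical_changes  = {"CR-101", "CR-103"}
  logical_changes    = {"CR-101"}

  potentially_broken = []
  for change_request in sorted(production_changes):
      if change_request not in technical_changes:
          potentially_broken.append((change_request, "no technical meta view change"))
      elif change_request not in logical_changes:
          potentially_broken.append((change_request, "no logical meta view change"))

  # Audit report: links that may have been broken.
  for change_request, reason in potentially_broken:
      print(change_request, "-", reason)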

Information Auditability Techniques

So far, we have shown that information quality and auditability are functions of how well information components are linked with each other and how well the information representations are linked with each other. However, current IT data integration practices rarely maintain anything like a full component and representational linkage. To the extent that these links are maintained, the linkage quality degrades with age.

Given an informational element (e.g., consumer credit score last month) recently delivered to a business user, most IT practitioners could perform the archaeology needed to comment on that information element's accuracy. When recent information is in question, the informational components (data, business rules and infrastructure) across all informational representations (logical view, technical view, production) are hopefully readily available to answer any questions of information accuracy.

However, for an informational element delivered six months ago or two years ago, the amount of archaeology needed increases dramatically, along with the probability that some missing critical informational component will thwart the effort completely.

In this climate of increased regulatory compliance, an organization will certainly need to vouch for the accuracy of some informational element delivered to the business months or even years ago. Therefore, information components and information representation linkages must be maintained across time, and the ability to traverse these linkages across time is essential to information auditability.

Of the three information components (data, business rules and infrastructure) across the three representations (logical meta view, technical meta view and physical), historical access is practical for all components in all representations except for data's physical representation. In other words, we should be able to recover business requirements (logical meta view) for a system change implemented months or years ago; we should be able to find the corresponding software source code and physical data models (technical meta view); and we should be able to find the final deployed application code (physical). Doing the same with the actual physical data as it existed months or years ago is considered impractical and too costly unless explicitly called for by business requirements.

Contemporary data integration tools and techniques can facilitate historical navigation of all component types across all representations except the physical representation of data. There is an exception to this exception. Reference data, because of its relatively small volume, is now being stored and managed historically by master data management products. Historical reference data availability greatly enhances information auditability because reference data tends to drive business-rule-to-business-rule linkages that would otherwise be difficult to follow absent the reference data as it existed.
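
The value of historical reference data can be illustrated with an effective-dated lookup such as the sketch below; the codes, descriptions and dates are illustrative assumptions rather than any product's actual data model.

  # Sketch of effective-dated reference data and an "as of" lookup,
  # the kind of history a master data management product can maintain.
  # The codes, descriptions and dates are illustrative assumptions.

  from datetime import date

  # Each row: (code, description, effective_from, effective_to)
  product_category_history = [
      ("PC1", "Consumer Lending", date(2005, 1, 1), date(2007, 6, 30)),
      ("PC1", "Retail Credit",    date(2007, 7, 1), date(9999, 12, 31)),
  ]

  def lookup_as_of(history, code, as_of):
      """Return the description that was in effect for 'code' on 'as_of'."""
      for row_code, description, start, end in history:
          if row_code == code and start <= as_of <= end:
              return description
      return None

  # The same code resolves differently depending on the audit date.
  print(lookup_as_of(product_category_history, "PC1", date(2006, 3, 15)))  # Consumer Lending
  print(lookup_as_of(product_category_history, "PC1", date(2008, 3, 15)))  # Retail Credit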

Information Quality Management Techniques

Current data integration techniques consider information quality a key service-level agreement attribute. In other words, the business user not only wants information delivered on a timely basis, but also wants quantified commitments from IT that the information delivered will be of a particular quality or higher. Thus, quality metrics are needed. Quality metrics can be thought of as a type of data (an information component).

Information quality metric threshold levels determine "goodness." If a particular information element (such as customer postal code) achieves a threshold level of quality (for example, 90 percent of all postal codes are consistent with the city and state values on the record), it is service level compliant. Testing for service level compliance requires that quality metrics be defined.
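
The postal code example might be tested for service level compliance along the lines of the following sketch; the reference combinations, sample records and variable names are illustrative assumptions.

  # Sketch of a quality metric with a threshold used as a service-level test.
  # The 90 percent threshold follows the postal code example in the text;
  # the reference table and sample records are illustrative assumptions.

  valid_combinations = {("10001", "New York", "NY"), ("60601", "Chicago", "IL")}

  records = [
      {"postal_code": "10001", "city": "New York", "state": "NY"},
      {"postal_code": "60601", "city": "Chicago",  "state": "IL"},
      {"postal_code": "60601", "city": "Chicago",  "state": "IN"},  # inconsistent
  ]

  consistent = sum(
      1 for r in records
      if (r["postal_code"], r["city"], r["state"]) in valid_combinations
  )
  pass_rate = consistent / len(records)

  THRESHOLD = 0.90
  print(f"pass rate {pass_rate:.0%}; service level compliant: {pass_rate >= THRESHOLD}")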

Metrics and thresholds should be defined to establish fit for use. Note that information usefulness metrics and thresholds can change over time.

Some quality metrics are business independent. Contemporary data integration technologies can automatically evaluate these metrics for a given data set. These general metrics fall into one of the following categories (a profiling sketch follows the list):

  • Data type
  • Data domain compliance
  • Statistical features of the data set (maximum value, minimum value, population distributions)
  • Referential relationships
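
The sketch below illustrates how these business-independent metrics might be evaluated automatically for a small data set; the column names, domain values and sample rows are illustrative assumptions.

  # Sketch of business-independent profiling for the general metric
  # categories listed above: data type, domain compliance, simple
  # statistics and a referential check. Column names, domain values
  # and sample rows are illustrative assumptions.

  order_rows = [
      {"order_id": 1, "status": "OPEN",    "amount": 120.0, "customer_id": 7},
      {"order_id": 2, "status": "CLOSED",  "amount": 35.5,  "customer_id": 8},
      {"order_id": 3, "status": "UNKNOWN", "amount": -10.0, "customer_id": 99},
  ]
  known_customers = {7, 8}
  status_domain = {"OPEN", "CLOSED", "CANCELLED"}

  profile = {
      # Data type: every amount should be numeric.
      "amount_is_numeric": all(isinstance(r["amount"], (int, float)) for r in order_rows),
      # Domain compliance: share of rows whose status is in the allowed domain.
      "status_in_domain": sum(r["status"] in status_domain for r in order_rows) / len(order_rows),
      # Statistical features of the data set.
      "amount_min": min(r["amount"] for r in order_rows),
      "amount_max": max(r["amount"] for r in order_rows),
      # Referential relationship: orders must point at known customers.
      "orphan_orders": sum(r["customer_id"] not in known_customers for r in order_rows),
  }
  print(profile)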

Other measures of quality involve business rule compliance and cannot be automatically evaluated within a data integration technology without some customization. For example, a credit card company may establish as a metric the number of consecutive months an account has a credit balance. All accounts with more than two consecutive months of negative balance are considered invalid. Data integration technologies allow metrics such as this to be programmed.
Automatically reverse engineering application source code into business rules and deriving validation rules from them would be ideal, but data integration technology is far from that ideal.
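
Returning to the credit card example, such a custom metric might be programmed along the lines of this sketch; the account identifiers and monthly balance series are illustrative assumptions.

  # Sketch of the programmed business-rule metric from the credit card
  # example: the longest run of consecutive months with a negative
  # (credit) balance, with accounts exceeding two months flagged invalid.
  # The account names and balance series are illustrative assumptions.

  def longest_negative_run(monthly_balances):
      """Return the longest run of consecutive negative monthly balances."""
      longest = current = 0
      for balance in monthly_balances:
          current = current + 1 if balance < 0 else 0
          longest = max(longest, current)
      return longest

  accounts = {
      "A-100": [250.0, -40.0, 130.0, 90.0],
      "A-200": [-15.0, -22.0, -8.0, 60.0],
  }

  for account, balances in accounts.items():
      run = longest_negative_run(balances)
      print(account, "invalid" if run > 2 else "valid", f"(longest credit-balance run: {run})")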

Contemporary data integration technologies and techniques not only allow for information quality measurement but also for quality remediation. These remediation - or cleansing - rules are a sort of business rule themselves. They should be managed as yet another type of information component and should exist in each information representation mentioned earlier.
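
A remediation rule managed in this way might look like the following sketch; the rule identifier, version and standardization mapping are illustrative assumptions.

  # Sketch of a remediation (cleansing) rule treated as another managed
  # business rule: it is named, versioned and applied to records that fail
  # a quality check. The rule identifier, version and mapping are
  # illustrative assumptions.

  US_STATE_NAMES_TO_CODES = {"new york": "NY", "illinois": "IL"}

  def standardize_state(record, rule_id="CLN-STATE-001", rule_version="1.2"):
      """Cleansing rule: normalize free-text state names to two-letter codes."""
      raw = record.get("state", "").strip()
      record["state"] = US_STATE_NAMES_TO_CODES.get(raw.lower(), raw.upper())
      # Record which remediation rule touched the data, for auditability.
      record.setdefault("applied_rules", []).append((rule_id, rule_version))
      return record

  print(standardize_state({"city": "Chicago", "state": "illinois"}))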

As information flows within an enterprise, it does so from some source to some target. When initially moving from an unmanaged to a managed quality world, one must deal with the reality that no information source can be trusted to comply with information quality service levels. Thus, all source information must be measured and possibly remedied. The only way to eventually reduce information remediation, much of which is probably redundant, is to certify information published to targets and to accompany published data with a quality certification.

Contemporary data integration tools and techniques support this certification notion. Note that the information quality certificate is itself a form of data, an information component.
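
Such a certificate might be represented along the lines of this sketch; the certificate fields, metric names and threshold values are illustrative assumptions.

  # Sketch of a quality certificate published alongside a data set, so
  # downstream consumers can rely on the certification instead of
  # re-measuring the source. The fields and values are illustrative
  # assumptions.

  import json
  from datetime import datetime, timezone

  def certify(dataset_name, metrics, thresholds):
      """Build a certificate stating which quality thresholds the data set met."""
      results = {name: metrics[name] >= thresholds[name] for name in thresholds}
      return {
          "dataset": dataset_name,
          "certified_at": datetime.now(timezone.utc).isoformat(),
          "metrics": metrics,
          "thresholds": thresholds,
          "compliant": all(results.values()),
          "detail": results,
      }

  certificate = certify(
      "customer_master_2024_06",
      metrics={"postal_code_consistency": 0.94, "status_in_domain": 0.99},
      thresholds={"postal_code_consistency": 0.90, "status_in_domain": 0.95},
  )
  print(json.dumps(certificate, indent=2))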

Certification addresses not only information quality, but also information auditability. Presuming that information published in the past was accompanied by a quality certificate, one could satisfy some of the auditability criteria by studying the certification criteria and results. If the certification criteria are insufficient to support an audit or if the certification results require auditing, one can fall back on the component and representational linkages to perform the information audit.

Information quality and auditability are keys to enabling improved financial reporting, meeting regulatory compliance objectives and ensuring the success of any critical business intelligence application. Contemporary data integration tools and methods can help organizations achieve information quality and auditability by facilitating linkages among information components and representations over time, as well as evaluating, remediating and certifying information quality.

