The chief promise of business intelligence is the delivery to decision-makers of the information necessary to make informed choices. The unspoken assumption in this statement is that the data from which this information is derived is correct. Unfortunately, as is frequently the case, unspoken assumptions are translated into unaddressed requirements. It is, therefore, essential that the BI project team address data quality with the same rigor and drive as they would such requirements as system up-time, system response time or network performance. Data quality must be integrated into the very DNA of the BI system.
Although a BI system can only be as successful as the confidence decision-makers have in the data, the issue of data quality extends well beyond the front-end reporting tool or back-end data warehouse to the entire information infrastructure of the organization. The irony is that because the BI system brings the issue of data quality to light, it is quite often blamed for poor data quality. Prior to the data warehouse, data was viewed at the transaction level or within disparate islands of information. The errors were hidden, therefore, by the lack of a larger view. In other words, the forest of errors could not be seen because of the trees of data. The aggregation of the data in the BI system amplifies the issues with the data, giving the organization a view of the entire forest. The BI project manager should see this not as a problem, but rather as an opportunity to deliver greater return on the BI system by improving the data quality for all information systems.
Many studies have been performed and much has been written about the financial impact of data quality. We will not dwell on these issues here. For the purposes of this discussion, let us agree that there is a cost to poor data quality and a benefit to good data quality. Let us also agree that these costs and benefits provide sufficient ROI to merit funding of data quality efforts. The question here is how BI project managers can integrate data quality into projects, not only to help ensure the overall success of the data warehouse but also to raise the quality of information throughout the organization.
Recognizing this opportunity to increase value, the BI project team must address data quality within their quality management plan during the planning phase. Although the quality management plan traditionally addresses the quality of the project deliverables, the data quality plan itself should be seen as a BI project deliverable. This plan is employed first by the project team, establishing the required organizational structures and processes. It is then delivered to the support organization as part of the go-live transition. There, in turn, it becomes the basis for the organization's data governance system.
The data quality management plan begins with requirements. Everything begins with requirements. At a minimum, data quality requirements definitions should include the following:
1. Comprehensiveness. Missing data can be just as problematic as incorrect data, especially if the user community assumes that the data within the warehouse is complete. The user community must define what data is needed. We should also bear in mind that the BI project team cannot boil the ocean. It may not be practical in the first phase of the project to integrate all source systems; in this situation, the user community collaborates with IT to prioritize which data sources are provided in which phases. This not only ensures that user requirements are met in the correct order, but helps set customer expectations that not all data sources will be included in the first phase of the project.
In addition to addressing which source systems are included, the user community also defines which data elements are included. Is it acceptable for a customer account to be missing the Social Security number? What about a fax number? It is up to the users to tell us what they need.
2. Accuracy. The users and IT collaborate on the cleanliness of the data. Is it significant when a customer's Social Security number is incorrect? What about an incorrect birthday? If multiple systems disagree, which of those systems is the trusted system? Also, what level of data accuracy is required? Of course, we are all tempted to say that we want the data to be 100 percent accurate, but the business must be cautioned that there is a cost to accuracy. In some scenarios, the incremental benefit of greater data accuracy may be outweighed by the incremental cost. If going from a 90 to a 92 percent accuracy rate doubles the cost, the business may decide that it is not worth the investment. Marketing may use the data to drive a mailing campaign where the additional two percent would not be worth the investment. Others may be using the data to drive security, where the slightest breach may mean the loss of millions of dollars or, worse yet, the loss of life. In that case, the additional cost is money well spent. Again, the user community provides this input.
3. Consistency. Here we are referring not to the consistency of the data itself, but to that of the metadata: the definitions of the various data elements and the rules around the data. What is a customer? One would assume that such a question is easy to answer, but in many organizations, different departments have different, often conflicting, responses. The chart of accounts is an area where many organizations struggle. Even when there is agreement on the structure and values in the chart of accounts, there is conflict in how it is used. IT works side by side with the users to create a data glossary in which the data elements are well defined, establishing business rules to provide for consistent usage across the organization. The data glossary not only defines the metadata of the BI system; it also documents the business rules, that is, how the data is created and used by the business.
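The three requirement types above can be captured in a small data glossary that is also usable for automated checks. The following Python sketch is illustrative only: the GlossaryEntry structure, the element names, the source system names ("crm", "billing") and the sample records are all assumptions made for the example, not part of any particular BI product or of the plan described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryEntry:
    """One data element as agreed between users and IT (hypothetical shape)."""
    name: str
    definition: str       # consistency: the shared meaning of the element
    required: bool        # comprehensiveness: may the element be missing?
    trusted_source: str   # accuracy: which system wins when sources disagree


# Illustrative entries only; real definitions come from the user community.
GLOSSARY = {
    "ssn": GlossaryEntry("ssn", "Customer Social Security number", True, "crm"),
    "fax": GlossaryEntry("fax", "Customer fax number", False, "crm"),
    "birthday": GlossaryEntry("birthday", "Customer date of birth", True, "billing"),
}


def missing_required(record: dict) -> list:
    """Return the names of required elements that are absent or empty."""
    return sorted(
        name for name, entry in GLOSSARY.items()
        if entry.required and not record.get(name)
    )


def completeness_rate(records: list) -> float:
    """Share of records in which every required element is present."""
    if not records:
        return 0.0
    complete = sum(1 for record in records if not missing_required(record))
    return complete / len(records)


def resolve(name: str, values_by_source: dict) -> Optional[str]:
    """Pick the value reported by the trusted system named in the glossary."""
    return values_by_source.get(GLOSSARY[name].trusted_source)


# Made-up sample records: the second one lacks a required SSN.
customers = [
    {"ssn": "000-00-0000", "birthday": "1980-01-01"},
    {"ssn": "", "fax": "555-0100", "birthday": "1975-06-30"},
]

print(missing_required(customers[1]))  # ['ssn']
print(completeness_rate(customers))    # 0.5
print(resolve("birthday", {"crm": "1980-01-02", "billing": "1980-01-01"}))
```

Keeping the required flag and the trusted source next to the definition means the same glossary the users sign off on can drive the checks, so a change in requirements is made in one place.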