Editor’s Note: DMReview.com is proud to introduce a new columnist to its online lineup. Ron Forino has been in the industry for more than 20 years as a programmer, analyst, architect, project manager and director. Currently, Forino is a director of business intelligence/CRM for DMR Consulting, where he has developed a series of data quality service offerings. Each month he will offer his insight into the latest data quality issues in his column, "Data e.Quality."

Prelude

Len, a new programmer recently trained in the use of a leading extract-transform-load (ETL) tool, sat in front of his PC gazing off into the distance. His thoughts were of the training he took the previous month and of the unexplained error messages now on his screen: "When the instructor took us through this part of the course, we never had a problem. I don’t know what to make of it. Let me check my code one more time. No, that looks OK. How about the input file layout? No, that’s OK. How about the target database schema? No, that’s OK too. What can be causing this error?"

The morning was slipping away. After checking and rechecking every part of his code, he thought, "Should I call the vendor’s help desk? No, first I’ll ask Jane, our team leader." He took a printout of his code and walked two cubicles down the aisle. "It looks like several hundred records of sample data were processed correctly," Jane said. "But, it’s getting hung here – ah, here’s the problem. That record has no data in the customer ID field. But, we have it defined as a required field. Didn’t our users say that it is always populated?"

Who Needs to be Interested in Data Quality?

This has been an especially busy month for me as I have been involved in several information system (IS) projects, including initiatives to implement customer relationship management (CRM), business intelligence (BI) and enterprise resource planning (ERP). I find it interesting that, in separate meetings with these teams, much of the discussion revolved around data quality tasks and activities. But, in each case, their initiatives in no way resembled the others. On the CRM project, our client’s customer data must be integrated from several disparate data sources. They will rely on a data cleansing tool to clean the name and address data before consolidating it into a common file. As part of the ERP project, a team of data analysts is deeply entrenched in a data survey to discover where the source data for their Oracle Financials implementation will come from (and how clean it will be). Finally, the technical staff of the BI project is using an ETL tool to migrate data each night from an AS/400 to an HP Oracle data warehouse. Their current challenge is knowing what to do when the data does not follow the rules they expected.
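As a concrete illustration of that last challenge, consider a load step that routes non-conforming records to a reject file instead of letting the whole job hang on one bad row. This is a minimal sketch of my own (the field names and file layout are hypothetical, not taken from the BI project):

```python
import csv

REQUIRED_FIELDS = ["customer_id", "order_date"]  # hypothetical business rules

def violations(record):
    """Return the required fields that are missing or blank in one record."""
    return [f for f in REQUIRED_FIELDS if not (record.get(f) or "").strip()]

def nightly_load(source_path, target_path, reject_path):
    """Copy clean records to the target; divert rule violations to a reject file."""
    with open(source_path, newline="") as src, \
         open(target_path, "w", newline="") as tgt, \
         open(reject_path, "w", newline="") as rej:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(tgt, fieldnames=reader.fieldnames)
        rejects = csv.DictWriter(rej, fieldnames=list(reader.fieldnames) + ["violations"])
        writer.writeheader()
        rejects.writeheader()
        for record in reader:
            bad = violations(record)
            if bad:
                record["violations"] = ";".join(bad)
                rejects.writerow(record)  # the load keeps running; rejects are reviewed later
            else:
                writer.writerow(record)
```

The design choice matters more than the code: the job records each violation and moves on, rather than failing in a way that leaves a programmer staring at an unexplained error message.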

The discipline of data quality is far-reaching. I recently attended a GartnerGroup Symposium where I made a point to attend discussions covering all of the hot topics they project for IS during the next five years: BI, CRM, ERP, SCM (supply chain management), e-business and more. I wasn’t particularly looking for it, but there it was – each and every area included data quality (or data integrity) as an integral factor in a successful implementation.

In fact, every research institution survey I’ve read in the last six to seven years agrees that data quality issues are among the leading causes of project delays, budget overruns and outright failure for companies that have implemented, or are implementing, data migration projects. Who needs to be interested in data quality? The answer is all of us. Whether we know it or not, data quality tasks and activities are being performed by our development and implementation teams right now – whether we put them into our project plans or not. Perhaps we should be asking: Are we performing these tasks at the right time, in the right way and with the right people to assure the success that we need? Are we providing our teams with enough time and information to perform the job effectively?

Who’s Doing Data Quality?

Data quality means many things to many people. Speak with a vendor of data cleansing tools, and data quality quickly relates to name and address data and the tool’s ability to standardize, verify and match customer records. To a programmer, data quality is about putting the proper edits into the forms and windows they write to accept an application’s data. Speak with a practitioner of quality assurance or system testing, and data quality is the practice of testing an application’s programs and processes. Speak with a data modeler and you will likely have a lengthy discussion about data stewardship, models and meta data. To a DBA, data quality is about what they do when data fails referential integrity checks or breaks their bulk-load process. To an application’s architect, designer or project manager, data quality is knowing how to eliminate data that will cause their application systems to fail (and, subsequently, how to work with the users to clean and reenter erroneous data).

A data quality initiative can include any or all of the activities I’ve just listed. I believe, however, that data quality efforts fall into two classes – project data quality and enterprise data quality.

Project Data Quality

Project data quality efforts are project related and associated with the delivery of a single application system. They consist of the tasks and activities needed to assure acceptable data for the users of a single group’s or organization’s application system. These initiatives can be grouped into five broad categories: rule discovery, compliance measurement, analysis and certification, quality improvement and meta data creation.
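To make one of these categories concrete, compliance measurement asks how often the data actually satisfies a stated business rule. Here is a minimal sketch of my own (the records and rule are hypothetical), measuring the very claim Len’s users made:

```python
def compliance_rate(records, rule):
    """Return the fraction of records that satisfy a business rule (a boolean predicate)."""
    records = list(records)
    if not records:
        return 1.0  # an empty sample trivially complies
    return sum(1 for r in records if rule(r)) / len(records)

# Hypothetical sample: the users claim customer_id is always populated.
sample = [
    {"customer_id": "C001", "balance": "120.00"},
    {"customer_id": "",     "balance": "75.50"},  # the kind of record that hung Len's job
    {"customer_id": "C003", "balance": "0.00"},
]
rate = compliance_rate(sample, lambda r: bool(r["customer_id"].strip()))
print(f"customer_id populated: {rate:.1%}")  # prints 66.7% -- the rule does not hold
```

A number like 66.7 percent is far more actionable than "the data is dirty"; it tells you exactly how far the data is from the rule the users believe to be true.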

Enterprise Data Quality

Enterprise data quality efforts, on the other hand, have an interdepartmental or enterprise perspective. They include activities requiring different groups within the organization to collaborate in order to improve data quality. Such initiatives can include corporate, top-down initiatives focused on the general improvement of critical data and business processes. Enterprise data quality initiatives can be grouped into seven categories: collaborative improvement, data stewardship, discovery of downstream data requirements, changing corporate culture, consistent process auditing, information chains and, finally, defect prevention.

In future columns, we will discuss these services further as well as how the organizational maturity level for data quality can be determined by the type of initiatives your organization implements.

Our Goal: Customer Satisfaction

The management consultants in my organization use a methodology known as benefits realization. To borrow from Stephen Covey’s book, The Seven Habits of Highly Effective People, benefits realization is a process of "beginning with the end in mind." Thus, the process of benefits realization begins with a step called benefits identification. Likewise, we must continually ask ourselves: What is our vision? What kind of data quality (and meta data quality) do we seek in the applications we implement? What level of data quality do my users need?

It is naive to answer: I want all of my data completely clean. What does that mean? How will you know when it is clean? I had the opportunity last year to implement a data quality program for a financial institution, and we found ourselves asking the same questions at the start of the engagement. To answer them, we used a tool that is somewhat out of vogue these days: we developed a customer-supplier model. In the process, we were forced to identify exactly who our users were, which data mattered most to them and where they had little tolerance for error. A very interesting thing happened during the life of the project – the users became active participants in the data quality assessment process. More important to us, they were in a position to have data anomalies acted upon – and they did. The point is that our goal should be to meet the quality levels needed by our data users.

In the months to come, it is my intention to describe some of the methods, tools and techniques project managers, architects, designers and practitioners can use to implement data quality projects and programs. It is also my intent to sensitize enterprise decision-makers to the value of their data and the advantages of keeping this asset in good condition. I believe data quality goes hand-in-hand with good meta data, so I will explain how meta data should be included in the data quality process. Finally, I will use examples of real data quality anomalies (and solutions) whenever it’s appropriate. So, I invite you – the reader – to share any interesting or funny data quality anomalies you have come across during your IS experience.

Epilogue

Len, our hypothetical junior programmer, may have just learned his first lesson in data quality. Yes, his users told him that all of the needed data would be in the file. He knew that the operational system the data came from reported no problems with the data. The source-to-target specification even said simply to perform a "move" from source to target because there would be data in the customer ID field. But life in the world of IS and data management is not quite that simple.
