I am newly involved in the BI/BPM space.

By
  • Sid Adelman, Larissa Moss, Chuck Kelley, Les Barbusinski, Scott Howard, Adrienne Tannenbaum
Published
  • August 22, 2003, 1:00am EDT

Q:

I am newly involved in the BI/BPM space. One of the things we are experiencing with our solution is that our client's data has little integrity. This is not an assignment of creating or building a DW, but rather one of analyzing and incorporating client data into our solution. When there is a lack of referential integrity, the scoping exercise may not necessarily identify the issue up front. When it does, it may cause the scope/budget of the project to go way over what the client assumed (since they are convinced their data is clean), or the issues don't become obvious until significant time has gone by and we are down a path that ultimately feeds into the client-impacting problems raised above. Is there a tool or methodology that can assist in identifying lack of referential integrity or just bad data?

A:

Sid Adelman’s Answer: The RDBMSs (DB2, Oracle, Teradata, etc.) will tell you about the lack of referential integrity. The "bad data" is more problematic. There are many dimensions to data quality, including conformance to valid values, missing data, non-unique data (when it’s supposed to be unique) and inaccurate data. Take a look at the data cleansing tools for their capabilities to give you a report card on the quality of your data. You can also write SQL code to analyze the data. It’s important to inform your client of their filthy data.
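
As a minimal sketch of the kind of SQL Sid mentions (the ORDERS and CUSTOMER tables, the CUST_ID and STATUS columns, and the list of valid status values are all hypothetical stand-ins for a client's actual schema), a left join can surface child rows whose foreign keys have no matching parent, and simple CASE expressions can count missing or out-of-domain values:

  -- Orphaned child rows: foreign keys in ORDERS with no matching CUSTOMER parent
  SELECT o.CUST_ID, COUNT(*) AS orphan_rows
  FROM ORDERS o
  LEFT JOIN CUSTOMER c ON c.CUST_ID = o.CUST_ID
  WHERE c.CUST_ID IS NULL
  GROUP BY o.CUST_ID;

  -- Missing and out-of-domain values in a status column (valid values assumed here)
  SELECT SUM(CASE WHEN STATUS IS NULL THEN 1 ELSE 0 END) AS missing_status,
         SUM(CASE WHEN STATUS NOT IN ('OPEN', 'CLOSED', 'PENDING') THEN 1 ELSE 0 END) AS invalid_status
  FROM ORDERS;

Queries like these are cheap to run early in scoping and give the client concrete counts rather than an abstract claim that their data is dirty.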

Les Barbusinski’s Answer: There are several tools on the market today that analyze, identify and even correct data quality problems in data warehouse and CRM applications. However, for complex, multilevel referential integrity problems, you may want to look at Trillium. It has a pretty comprehensive library of functions and can easily be extended. Check their Web site at http://www.trilliumsoft.com/.

Larissa Moss’ Answer: This is a common problem. Most organizations have data with little integrity, and often users in those organizations do not recognize the impact of their dirty data. Larry English, who is a recognized authority on information quality, provides a detailed methodology in his book, Improving Data Warehouse and Business Information Quality. Although data warehouse is in the title, the book is not data warehouse specific. There are also many data profiling and data cleansing tools on the market, such as Evoke Axio, Firstlogic and Trillium Software, to mention just a few. I recommend attending the Information Quality conferences (http://www.infoimpact.com/index_flash.html) and visiting the vendor booths to become familiar with the capabilities of the different tools.

Adrienne Tannenbaum’s Answer: This is a common problem. We have yet to find a client that is not surprised when we analyze their data. Examples we have uncovered include the assumption that sales are tracked in U.S. dollars worldwide when there is no field or column tracking "currency," etc. There are tools that will uncover these issues early on, and they can be "leased," but they do add money to a project up front. In general, we always spend time validating client assumptions and the associated data at the start of every project, before a project plan is put together; there is no other way!
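
One way to test an assumption like the currency example before the project plan is written is a simple profiling query (the SALES table and CURRENCY_CODE column below are hypothetical). If the column does not exist, the query itself fails, which answers the question; if it does exist, the value distribution shows whether the "all sales are in U.S. dollars" assumption holds:

  -- Value distribution of a suspected currency column
  SELECT CURRENCY_CODE, COUNT(*) AS row_count
  FROM SALES
  GROUP BY CURRENCY_CODE
  ORDER BY row_count DESC;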

Scott Howard’s Answer: There are several tools that you should consider, but they are not cheap. They are cheaper, however, than the cost overruns with which you are now very experienced. Take a look at MetaRecon by Ascential Software and Axio by Evoke Software. These are both full-featured, comprehensive solutions. There are other less capable and thus cheaper solutions, but based on your experiences, I don’t recommend they be considered.

Chuck Kelley’s Answer: The issue that you are dealing with is that the business has learned to run fine regardless of how bad the data is. Therefore, is there really a data quality problem? The answer is a resounding "most likely." There are a number of tools that do this type of work. I would start my search at http://www.dmreview.com/ and click on vendor listings on the left side. Then choose the category "data quality."

Joe Oates’ Answer: Referential integrity, or the lack thereof, is just one of the issues in data quality. While it is beyond the scope of this forum to explain the different aspects of data quality, the Web site http://demo.ctg.albany.edu/publications/reports/data_quality_tools, though a bit dated as far as tools are concerned, has a good overview of the issues.

Good data quality tools are not cheap. However, I have been on projects that used Evoke and Vality, and both tools more than paid for themselves.

Clay Rehm’s Answer: I believe your problem is not a tool or methodology. It is the presentation of this bad news to the stakeholders that is the challenge. You will have to find a way to effectively present "bad news" to a group of people who are convinced there is nothing wrong with their data! Find someone you work with who has ties to these folks and leverage those relationships.
