I'm trying to benchmark data quality software. What are the most important characteristics to compare?
Sid Adelman's Answer: Data quality encompasses many characteristics of the data, including compliance with business rules, conformance to valid values, completeness - especially for mandatory fields - timeliness and referential integrity. Data should be understandable, non-conflicting and non-redundant.
As a starting point, these are the data problems a data quality evaluation needs to cover:
- Data elements that do not correspond to the valid values
- Missing values in mandatory fields
- Other missing values
- Non-unique values in fields where the values should be unique
- Violations of business rules (for example, a negative number of dependents, or a year of birth later than the current year)
- Invalid data types (for example, a "character" type that should be "packed decimal")
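To make the list above concrete, here is a minimal sketch of how a few of these checks might look in Python. The field names, the valid-value list and the rules are hypothetical, chosen only for illustration; a real tool would let you configure them.

```python
from datetime import date

# Hypothetical rules for an illustrative employee record (assumptions, not from any product).
VALID_STATES = {"CA", "NY", "TX"}
MANDATORY_FIELDS = ("employee_id", "state", "birth_year")


def check_record(record, current_year=None):
    """Return a list of problems found in one record (a dict)."""
    current_year = current_year or date.today().year
    problems = []

    # Missing values in mandatory fields
    for field in MANDATORY_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing mandatory field: {field}")

    # Values that do not correspond to the valid values
    state = record.get("state")
    if state not in (None, "") and state not in VALID_STATES:
        problems.append(f"invalid state: {state}")

    # Invalid data type, then the business rule on the same field
    dependents = record.get("dependents")
    if dependents is not None:
        if not isinstance(dependents, int):
            problems.append("dependents is not an integer")
        elif dependents < 0:
            problems.append("negative number of dependents")

    # Business rule: year of birth cannot be later than the current year
    birth_year = record.get("birth_year")
    if isinstance(birth_year, int) and birth_year > current_year:
        problems.append("birth year is in the future")

    return problems


def find_duplicates(records, key):
    """Return the values of `key` that appear in more than one record."""
    seen, dupes = set(), set()
    for rec in records:
        value = rec.get(key)
        if value is None:
            continue
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)
```

Running `check_record` over every row and `find_duplicates` over the key column gives a rough count of each problem type, which is one way to compare how much each candidate tool catches on the same sample data.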
Joe Oates' Answer: The basic capabilities that the tool should include:
- Domain value checking: whether the values in a particular column conform to formal and/or logical value rules. Many products allow the user to specify these rules. Examples of what such rules flag include:
  - A column that contains anything other than the valid values predefined for it
  - A social security number that contains all zeroes
  - A retired person whose age is 17
- Data type
  - Are there alphabetic characters in a numeric field, or vice versa?
- Frequency counts
  - If most of a company's customers are in the United States, then most rows should contain data about customers in the U.S.
- Statistical counts
- Pattern checking
  - Telephone numbers in North America should have three digits for the area code, three digits for the exchange and four digits for the subscriber number.
- Interdependency between certain fields
  - Postal code is dependent on country, state/province and city.
There are other things that certain tools check, but these are the basics.
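As a rough illustration of the pattern, frequency and interdependency checks described above, the following Python sketch uses only the standard library. The phone format, the column names and the postal-prefix lookup are simplified assumptions made for this example, not the rules of any particular product.

```python
import re
from collections import Counter

# Assumed format: NNN-NNN-NNNN. Real tools usually accept several layouts.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

# Hypothetical, heavily simplified lookup of postal-code prefixes by (country, state).
POSTAL_PREFIXES = {("US", "CA"): ("9",), ("US", "FL"): ("3",)}


def phone_matches_pattern(phone):
    """Pattern check: area code, exchange and subscriber number are all digits."""
    return PHONE_PATTERN.fullmatch(phone) is not None


def is_all_zero_ssn(ssn):
    """Domain check: flag a social security number made up only of zeroes."""
    digits = re.sub(r"\D", "", ssn)
    return digits != "" and set(digits) == {"0"}


def value_frequencies(rows, column):
    """Frequency count: share of rows holding each value of `column`."""
    counts = Counter(row.get(column) for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def postal_code_consistent(row):
    """Interdependency check: postal code must fit the country and state."""
    prefixes = POSTAL_PREFIXES.get((row.get("country"), row.get("state")))
    if prefixes is None:
        return True  # no rule known for this combination, so nothing to flag
    return str(row.get("postal_code", "")).startswith(prefixes)
```

For a company whose customers are mostly in the United States, `value_frequencies(rows, "country")` should show a dominant share for the U.S.; a surprisingly low share is a hint that the column is mis-populated.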