Could you please give me an idea of the typical range on a per record basis for a deduplication (data cleansing) project?


Sid Adelman’s Answer: There is no meaningful range. It will depend on:

  1. How many source files you are consolidating for the deduping.
  2. Whether the data entry staff were careful and motivated to minimize duplicate names. It is often easier for them to enter a duplicate name than to find and use a customer number that already exists.
  3. How well the edit checks in the entry system spot and flag duplicates.

Chuck Kelley’s Answer: A typical range of what? Cost? Records combined? Each project will be different. I have seen as much as 50:1 and as little as 1.1:1 in terms of combining records. Some analysis will be needed on how bad the data is and how much it is worth to have a "perfectly" cleansed environment. You can do a 75 percent cleansing of name and address rather inexpensively. Doing a "perfect" cleanse can cost another 50 times as much. How much is it worth to your business?
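
One rough way to begin the analysis mentioned above is to measure the combining ratio (records in to records out) on a sample of your own data. The sketch below is only an illustration and not a method from either answer: the `name` and `address` fields, the `normalize` match key, and the sample records are all hypothetical, and exact matching on a cleaned key will miss many duplicates that a real cleansing tool would catch.

```python
from collections import defaultdict


def normalize(record):
    """Build a hypothetical match key from name and address.

    Lowercases the text, drops punctuation, and collapses repeated spaces.
    """
    def clean(text):
        kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
        return " ".join(kept.split())

    return (clean(record["name"]), clean(record["address"]))


def combine_ratio(records):
    """Return records-in divided by records-out after grouping on the match key."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record)].append(record)
    # An empty sample has nothing to combine, so report a neutral 1.0.
    return len(records) / len(groups) if groups else 1.0


# Made-up sample: the first two records differ only in case, spacing, and punctuation.
sample = [
    {"name": "John Smith", "address": "12 Oak St."},
    {"name": "john  smith", "address": "12 Oak St"},
    {"name": "Mary Jones", "address": "4 Elm Ave"},
]
ratio = combine_ratio(sample)
print(f"{ratio:.1f}:1")
```

A ratio close to 1:1 on a representative sample suggests few exact duplicates under this key; a much higher ratio suggests the data is in worse shape and the project will be larger.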
