Continue in 2 seconds

CCL Eliminates Redundant Data with DataFlux

  • July 01 2002, 1:00am EDT
More in

REVIEWER: Stephanie N. Williams, database analyst for the Center for Creative Leadership.

BACKGROUND: The Center for Creative Leadership (CCL) is an international, non-profit, educational institution devoted to leadership research and training. For more than 30 years, its mission has been to advance the understanding, practice and development of leadership for the benefit of society worldwide. CCL staff members conduct research, produce publications and provide programs and products to leaders and organizations in all sectors of society. Each year, 25,000 managers and professionals from more than 1,000 organizations attend CCL programs worldwide.

PLATFORMS: Windows 2000 Professional SP2 and Windows 2000 Advanced Server SP2.

PROBLEM SOLVED: In early 2001, CCL was constructing a data warehouse to pull data from three different transactional systems - a third-party contact database system used for managing client information and tracking; a third-party financial database system used for invoicing and financial data; and the event system, an in-house system developed internally and used to register and track registrations for our program participants. Because some clients overlapped between two or three of the systems while others existed only in one, we needed a "customer master" database that combined duplicates between systems for use in our data warehouse.

PRODUCT FUNCTIONALITY: We currently use the DataFlux dfPower Studio match module nightly as part of our customer master database rebuild. Currently, the process inserts organizational data from each transaction system into one large table. Then we use dfPower Studio to generate matchcodes based on a variety of customizable criteria. We then review the matchcodes to identify one master record to tie to the other records using a key field. The result is only one instance of each organization, providing much cleaner data in the warehouse and much easier navigation within a data mart cube. Because dfPower Studio keeps statistics of the data it changes, we continually review the data that is being entered into our transaction systems in order to prevent future data quality problems.

STRENGTHS: The flexible fuzzy logic capability used in the matchcode generation process is excellent. When using multiple fields as criteria, dfPower Studio does a fantastic job of combining records, exactly as our criteria were defined. Another main strength is its ability to generate guess builds, which is part of the dfPower Studio Analysis module. It allows us to designate fields in a table (i.e., organization name, etc.), runs through an internal database and gives us a "best guess" for a standard for the records. We can then easily modify the standard for multiple records at the same time.

WEAKNESSES: The only minor weakness that I have seen in this product is the ability to generate matchcodes from only one field and from fields that are relatively short in length. For example, trying to match on organization name alone without a street address or city will yield more "correct" matches because there is less information to differentiate between the data. This problem also occurs when the organization names are abbreviations such as "AAA" or "AAP" because they will combine if there is no street address. However, this problem can be resolved by requiring an exact match.

SELECTION CRITERIA: In the middle of building our data warehouse, we determined that we needed a tool for quick data cleanup. We searched the Web for solutions and evaluated/tested three. Based on this review, we concluded that DataFlux's dfPower Studio was the most intuitive and easy to use. We downloaded the full version online as a trial. Within the first five minutes, we were able to run matchcodes against our production databases and see the results. The other products we tested were not nearly as intuitive.

DELIVERABLES: We find that the matchcodes generated as a field in the table - the critical link in combining our duplicate data - are the most helpful of the dfPower Studio outputs. However, the reports provided are also very useful. We were able to use the matchcode reports to examine our match criteria and modify match percentages. We also use the statistics logs in the standardization module to determine whether our standards are consistent and if they are changing the data that needs to be changed.

VENDOR SUPPORT: DataFlux technical support is first class and very responsive. Their sales representatives are helpful, friendly and have a sound knowledge of their product. They also follow up with us every few months just to make sure things are going well.

DOCUMENTATION: The documentation is complete and easy to understand. It provides all the information needed to answer questions regarding product use.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access