SSA-NAME3 Matches, Dedupes and Merges 40 to 50 Million Transactions per Month for Experian

  • April 01 2001, 1:00am EST

REVIEWER: Ken Kauppila, vice president of AIS technology and product engineering for Experian Automotive Information Systems.

BACKGROUND: Experian is an information solutions company helping organizations use information to reach new customers and develop successful and long-lasting customer relationships. Experian developed its Vehicle Ownership Tracking 2000 (VOT 2000) product as a complete solution to provide accurate nationwide vehicle ownership information to manufacturers and their agencies. Powered by Experian's industry-leading National Vehicle Database, the world's largest relational automotive database, Experian offers a timely and cost-effective method for developing and maintaining vehicle ownership records. The National Vehicle Database contains records on 315 million automobiles.

PLATFORMS: IBM RS/6000 SP (MPP) with 34 nodes/140 CPUs; AIX 4.3.2; DB2/Universal Database EEE storing 8+ terabytes of data.

PROBLEM SOLVED: Experian Automotive Information Systems took on the task of building the world's largest relational automotive data warehouse in order to be able to help vehicle manufacturers, automotive dealers, government, tollway authorities and other auto industry businesses maintain and use current and historical records. The initial conversion process consisted of matching, deduping and merging over 18,000 files of historical data containing 2.6 billion names and addresses. The goal of this process was to identify unique individuals, organizations, addresses and vehicles. The data collected included all 51 U.S. jurisdictions, more than 170 different file formats and varying degrees of data quality. A requirement for the file processing was an industrial-strength, scalable, high-quality matching tool that could compensate for error and variation in the data while maintaining a high level of performance. Subsequently, the process developed was required to function in a steady-state environment where inbound data would be incrementally added to and updated in the National Vehicle Database. When vehicles are sold and resold, owners change addresses or names, and vehicle license information is renewed, accurate matching is essential to maintain complete and current histories of vehicle ownership. The name and address searching and matching module was the key for all the other processing modules to succeed and was one of the more complicated design opportunities of the project due to the volumes involved, variability of the data, nature of the business and technical requirements.

PRODUCT FUNCTIONALITY: Experian integrated SSA-NAME3 from Search Software America (SSA) and the Data Clustering Engine for its searching and matching capabilities, along with other products that provide address standardization, geocoding, and name and address parsing. SSA-NAME3 provides the fuzzy keys, search strategies and matching algorithms to our applications. On a dedicated system, we achieved 12 million transactions per day through our name and address processing. Currently we have over 11.5 billion rows of data in our database in a steady-state environment. We process between 40 and 50 million transactions per month.
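
To put the throughput figures above in proportion, here is a rough back-of-the-envelope calculation in Python. It assumes the 12 million transactions per day were spread evenly over 24 hours and that the dedicated-system rate is comparable to the steady-state workload; the review states neither, so treat the results as orders of magnitude only. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(transactions_per_day: int) -> float:
    """Average rate per second, assuming an even 24-hour spread (an assumption)."""
    return transactions_per_day / SECONDS_PER_DAY


def days_at_rate(monthly_volume: int, transactions_per_day: int) -> float:
    """Days of processing needed to clear a monthly volume at a given daily rate."""
    return monthly_volume / transactions_per_day


PEAK_PER_DAY = 12_000_000  # dedicated-system figure quoted in the review

print(f"average rate: {per_second(PEAK_PER_DAY):.0f} transactions/second")
for monthly in (40_000_000, 50_000_000):
    days = days_at_rate(monthly, PEAK_PER_DAY)
    print(f"{monthly:,} per month is about {days:.1f} days at the dedicated-system rate")
```

Under those assumptions the dedicated-system rate works out to roughly 139 transactions per second, and a month of steady-state volume corresponds to about 3.3 to 4.2 days of processing at that rate.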

STRENGTHS: The quality of the match results is impressive. Because of the scale of Experian's database and our business objectives, we also require top performance. We have been able to achieve our performance goals using the SSA-NAME3 keys and search strategies.

WEAKNESSES: The fine-tuning of the algorithms to obtain optimal results requires time, and the supporting documentation is quite technical.

SELECTION CRITERIA: We needed a high-quality matching tool that could work with data in any format. We chose SSA because of its reputation in this area and because of recommendations from employees within our company. We use other products for parsing and standardization, but SSA is an industry leader for match quality.

DELIVERABLES: SSA-NAME3 embedded in Experian applications provides the keys and search strategies to find the possible candidate records for a match. It then compares search and file records to return a score. SSA-NAME3 is currently being used in the searching and matching process on 11.5 billion rows of data.
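
The two-step pattern described above (use keys to find candidate records, then compare search and file records to return a score) can be illustrated with a minimal Python sketch. This is not SSA-NAME3's algorithm or API, which the review does not describe: `fuzzy_key` is a hypothetical, deliberately crude consonant key, the score comes from the standard library's `difflib.SequenceMatcher`, and the sample records and the 0.8 threshold are invented for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def fuzzy_key(name: str) -> str:
    """Hypothetical fuzzy key: first letter of the last token plus its
    following consonants, adjacent repeats collapsed, cut to 4 characters."""
    tokens = name.upper().split()
    if not tokens:
        return ""
    surname = "".join(ch for ch in tokens[-1] if ch.isalpha())
    if not surname:
        return ""
    key = surname[0]
    for ch in surname[1:]:
        # Skip vowels and weak letters so spelling variants share a key.
        if ch not in "AEIOUYHW" and ch != key[-1]:
            key += ch
    return key[:4]


def build_index(records: dict[int, str]) -> dict[str, list[int]]:
    """Group record ids by fuzzy key so a search only touches one bucket."""
    index: dict[str, list[int]] = defaultdict(list)
    for rec_id, name in records.items():
        index[fuzzy_key(name)].append(rec_id)
    return index


def score(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 (difflib ratio, a stand-in scorer)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def search(query: str, records: dict[int, str], index: dict[str, list[int]],
           threshold: float = 0.8) -> list[tuple[int, float]]:
    """Fetch candidates sharing the query's key, score them, keep the good ones."""
    candidates = index.get(fuzzy_key(query), [])
    scored = [(rec_id, score(query, records[rec_id])) for rec_id in candidates]
    matches = [(rec_id, s) for rec_id, s in scored if s >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Invented sample data, not taken from the review.
records = {1: "John Smith", 2: "Jon Smyth", 3: "Mary Jones"}
index = build_index(records)
results = search("John Smyth", records, index)
```

In this sketch the key limits the expensive comparison to records in the same bucket, which is the general reason key-based candidate selection matters at large volumes; a production tool would use far more robust keys and field-aware scoring.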

VENDOR SUPPORT: The support has been very good. We built the entire process with internal staff and SSA resources. This included SSA assistance with the fine-tuning of the algorithms. SSA also assisted in high-level design of the massively parallel environment and highly scalable applications that we built. Since the algorithms were fine-tuned, we have had to make only minimal changes to the products. Time spent on maintenance is minimal (less than four hours per month). SSA has always been very responsive to our requests for help.

DOCUMENTATION: The documentation is comprehensive. However, it is beneficial to have SSA advice during the tuning and implementation.
