REVIEWER: Joe Lafeir, vice president of Product Development for RLPTechnologies.
BACKGROUND: R.L. Polk has long been the gold standard for automotive vehicle and consumer data and has continuously improved its data management methods. The time was right to move beyond continuous improvement and develop an innovative approach that would revolutionize Polk's core foundational data warehouse. RLPTechnologies, an R.L. Polk & Co. subsidiary, led a re-engineering effort to implement breakthrough technologies that standardize and enhance data from disparate sources. The resulting Enterprise Information Factory (EIF), based on the principles of Lean Manufacturing, has revolutionized the collection, standardization and enrichment of raw data. The project also served as an incubator for RLPTechnologies to offer Polk's innovative software solution commercially, allowing others to benefit from higher quality data and to integrate with less cost, time and risk.
PLATFORMS: Grid computing environment running Red Hat Linux on Dell PowerEdge servers with Intel Xeon processors.
PROBLEM SOLVED: RLPTechnologies selected DataFlux dfPower Studio to analyze, improve and control data quality within the EIF. The DataFlux Integration Server allows RLPTechnologies to expose business rules for data quality via service-oriented architecture (SOA), creating a real-time processing architecture to ensure that data entering the EIF meets corporate standards for data integrity. The project was tasked with implementing decades of unique and complex business rules used to both parse and standardize name and address data from vehicle title and vehicle registration records. Accurate and consistent handling of this data is critical for Polk to effectively correlate records across more than 240 disparate sources of data coming from data suppliers such as state governments and automobile manufacturers. The source data profiling project focused on the monitoring of a source data file from arrival to delivery to a downstream system. Data profiling was designed to catch data quality issues as soon as possible in the lifecycle of the data.
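As a rough illustration of catching data quality issues when a source file arrives, the Python sketch below computes per-column missing-value rates and flags columns that exceed a threshold. It is not DataFlux or EIF code: the `Rule` class, the function names, the column names and the threshold values are all hypothetical, and a real profiling run would measure far more than missing values.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical profiling rule: a column and its maximum allowed missing rate."""
    column: str
    max_null_rate: float


def profile_null_rates(rows, columns):
    """Return the fraction of rows in which each column is empty or missing."""
    total = len(rows)
    rates = {}
    for col in columns:
        missing = sum(1 for row in rows if not (row.get(col) or "").strip())
        rates[col] = missing / total if total else 0.0
    return rates


def find_violations(rates, rules):
    """Return one message per rule whose threshold is exceeded."""
    messages = []
    for rule in rules:
        rate = rates.get(rule.column, 0.0)
        if rate > rule.max_null_rate:
            messages.append(
                f"{rule.column}: null rate {rate:.0%} exceeds {rule.max_null_rate:.0%}"
            )
    return messages


if __name__ == "__main__":
    # Made-up sample records standing in for a newly arrived source file.
    sample = [
        {"name": "A", "zip": "48034"},
        {"name": "", "zip": "48034"},
        {"name": "B", "zip": None},
        {"name": "C", "zip": "1"},
    ]
    rates = profile_null_rates(sample, ["name", "zip"])
    for message in find_violations(rates, [Rule("name", 0.10), Rule("zip", 0.50)]):
        print(message)
```

Running the check at arrival, before any downstream load, is what lets a bad file be rejected or quarantined early rather than discovered after delivery.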
PRODUCT FUNCTIONALITY: The DataFlux tools used to perform name and address standardization included Architect and the Quality Knowledge Base (QKB) from DataFlux's dfPower Studio. The Architect tool provided a series of predefined processing nodes that enabled Polk to separate data elements and then parse them into tokens such as first name, last name, street name and town name. Polk also used DataFlux to standardize addresses, analyze gender and generate the match codes used for householding. The QKB gave Polk the flexibility to create custom rules that would integrate easily with all future releases of the DataFlux software. DataFlux's source data profiling engine provides all the raw data the EIF Custom Profiling engine needs to evaluate rules, measure thresholds and alert on violations.
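To make the idea of parsing, standardization and match codes concrete, here is a deliberately simplified Python sketch. It does not reproduce DataFlux's parsing nodes, QKB rules or match-code algorithm; the function names, the small suffix table and the key format are assumptions chosen only to show how standardized tokens let differently formatted records fall into the same household.

```python
import re

# Hypothetical, tiny abbreviation table; a real knowledge base is far larger.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}


def standardize_address(address):
    """Uppercase, strip punctuation and abbreviate common street suffixes."""
    tokens = re.sub(r"[^A-Za-z0-9 ]", " ", address).upper().split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


def parse_name(full_name):
    """Split a name into first and last tokens.

    Handles 'LAST, FIRST' and 'FIRST LAST' only; real title and
    registration data needs many more rules than this.
    """
    cleaned = re.sub(r"[^A-Za-z ,]", " ", full_name).upper()
    if "," in cleaned:
        last, _, first = cleaned.partition(",")
    else:
        parts = cleaned.split()
        first = " ".join(parts[:-1])
        last = parts[-1] if parts else ""
    return {"first": " ".join(first.split()), "last": " ".join(last.split())}


def household_key(full_name, address):
    """Build a simple grouping key from the last name and standardized address."""
    return parse_name(full_name)["last"] + "|" + standardize_address(address)


if __name__ == "__main__":
    # Two differently formatted records that should land in one household.
    print(household_key("Smith, John", "12 Main Street."))
    print(household_key("Jane Smith", "12 main st"))
```

Records that share a key are candidates for the same household. An exact-match key like this one is the simplest possible choice; it will miss misspellings and variant spellings that a production match-code scheme is designed to tolerate.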
STRENGTHS: DataFlux's architecture and flexibility provided a means to tailor the name and address standardization solution with the complex set of business rules that needed to move from the legacy mainframe environment.
WEAKNESSES: The DataFlux solutions struggled with large-scale XML profiling and could profile only XML files smaller than 2GB.
SELECTION CRITERIA: A proof of concept was performed with DataFlux and several other vendors to validate the interoperability of the solution with our target architecture and performance based on the processing of data records representing some of the more complex business requirements.
DELIVERABLES: With the help of DataFlux tools, RLPTechnologies created an engine that enhances the timeliness, accuracy and quality of data that is the foundation for Polk's analytical and operational applications. Polk's data and applications enable the automotive industry to analyze the market to make critical decisions about their businesses, communicate more effectively with prospects and customers, and evaluate the effectiveness of their programs.
VENDOR SUPPORT: DataFlux provided excellent support for our projects. The technical support has been responsive, and the online customer portal provides good information on demand. We engaged DataFlux's professional services early in the project, and they provided deep content knowledge to help get the project off the ground.
DOCUMENTATION: Documentation has been good.
DataFlux dfPower Studio and DataFlux Integration Server
940 Cary Parkway, Suite 201
Cary, NC 27513