2007 ISA for Data Integration

  • October 23 2007, 11:28am EDT

Data integration involves acquiring, integrating and reconciling disparate data for analytic purposes.

Solution Implementer: RLPTechnologies

Solution Provider: DataFlux Corporation

Business Pain

R.L. Polk & Co. has set the standard for automotive vehicle and consumer data for years, continually improving its data management methods along the way. Polk was looking to maintain and extend its competitive advantage amid significant industry, regulatory and technology change. Polk recognized it needed to move beyond incremental improvements and develop a new, innovative approach that would greatly enhance its existing core foundational data warehouse.

  1. Industry challenges - In the automotive industry, original equipment manufacturers (OEMs) and dealers alike are constantly pushing for the timeliest, most complete data and analytics to compete more effectively in a flat U.S. market.
  2. Regulatory compliance - Data privacy issues facing businesses today made it clear that a flexible and agile IT environment was required to stay ahead of likely stricter regulations in the future.
  3. Technology change - As an organization collecting automotive data since 1922, Polk had grown a very complex data management environment that was difficult to maintain. Emerging technologies showed promise to streamline the environment, enabling the ability to introduce new data or application offerings faster, while lowering IT total cost of ownership.

As a result, RLPTechnologies, a subsidiary of R.L. Polk & Co., was asked to lead a re-engineering effort to implement breakthrough technologies that would collect, standardize and enhance data from disparate sources and compile them into a “single source of the truth” for distribution to analytical and operational applications. The project led to the development of OneView360° as a comprehensive, integrated application for data integration.

Successful Solution

OneView360° is a fully automated data integration solution designed to capture feeds from many disparate data sources, centrally maintain and manage master reference data, incorporate multipoint data quality inspections and load data into one or many databases.

The sophisticated service orchestration engine in OneView360° optimizes data production throughput, enabling real-time data integration. With an open, service-oriented architecture (SOA), OneView360° can seamlessly integrate investments in legacy, commercial off-the-shelf applications and Web services.

Within the implementation at Polk North America, the solution must handle large-scale, complex data management needs as Polk compiles data from more than 240 different sources, representing 500 million unique vehicle transactions per year, with data on over 246 million unique households. The system was built and deployed in phases over a 17-month period. The data operations group at Polk uses the solution to manage this wealth of data.

Two significant projects within the re-engineering effort had a very specific data quality focus - name and address standardization and data profiling. RLPTechnologies selected DataFlux dfPower Studio to analyze, improve and control data quality with the external interface file (EIF). The DataFlux Integration Server allowed RLPTechnologies to expose business rules for data quality via SOA, creating a real-time processing architecture to ensure that data entering the EIF meets corporate standards for data integrity. The name and address project was tasked with implementing decades of unique and complex business rules used to both parse and standardize name and address data.

The second data quality-focused initiative was data profiling, which was designed to catch data quality issues as soon as possible in the lifecycle of the data. Issues monitored by this process include incomplete data files with missing or incomplete records and content and layout changes made by a data supplier without notice.
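The kinds of checks described above can be illustrated with a minimal sketch. It is not Polk's or DataFlux's implementation; the function name `profile_feed`, the column names and the record threshold are all hypothetical, and it assumes a comma-delimited feed with a header row.

```python
import csv
import io


def profile_feed(text, expected_columns, min_records):
    """Return a list of issues found in a delimited supplier feed.

    All names and thresholds here are illustrative assumptions.
    """
    issues = []
    reader = csv.DictReader(io.StringIO(text))

    # Layout check: a supplier may rename, drop or reorder columns without notice.
    header = reader.fieldnames or []
    if header != expected_columns:
        issues.append(f"layout change: expected {expected_columns}, got {header}")

    # Record-level check: flag rows with empty or missing expected fields.
    record_count = 0
    for row in reader:
        record_count += 1
        empty = [c for c in expected_columns if not (row.get(c) or "").strip()]
        if empty:
            issues.append(f"line {reader.line_num}: missing values for {empty}")

    # File-level check: a short file suggests an incomplete delivery.
    if record_count < min_records:
        issues.append(
            f"incomplete file: {record_count} records, expected at least {min_records}"
        )
    return issues
```

Running a check like this as each file arrives, before any downstream processing, is what lets problems surface early in the data's lifecycle rather than after the warehouse has been loaded.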

The DataFlux tools used to perform name and address standardization included Architect and the Quality Knowledge Base (QKB) from DataFlux’s dfPower Studio. The Architect tool provided a series of predefined processing nodes that enabled Polk to properly separate and subsequently parse data elements into tokens such as first, last, street and town names. Polk also tapped the power of DataFlux to perform address standardization, gender analysis and generate match codes to perform householding. The QKB provided Polk with the flexibility to create custom rules that would easily integrate with all future releases of DataFlux software. DataFlux’s powerful source data profiling engine provides all the raw data necessary for the EIF Custom Profiling engine to evaluate rules, measure thresholds and alert for violations.
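To make the ideas of parsing, standardization and match codes concrete, here is a deliberately simplified sketch. It does not reproduce DataFlux's algorithms or the QKB rules; the abbreviation table, the function names and the match-code format are hypothetical, and names are assumed to arrive as "First [Middle] Last".

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real knowledge base would be far larger.
STREET_ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}


def parse_name(full_name):
    """Split a name into first and last tokens (assumes 'First [Middle] Last')."""
    parts = full_name.split()
    return {"first": parts[0].title(), "last": parts[-1].title()}


def standardize_street(street):
    """Uppercase, strip punctuation and abbreviate common street types."""
    tokens = re.sub(r"[^\w\s]", "", street).upper().split()
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens)


def match_code(last_name, street, postal_code):
    """Build a key that is equal for records likely to share a household."""
    return f"{last_name.upper()}|{standardize_street(street)}|{postal_code[:5]}"


def household(records):
    """Group first names by match code."""
    groups = defaultdict(list)
    for rec in records:
        name = parse_name(rec["name"])
        code = match_code(name["last"], rec["street"], rec["zip"])
        groups[code].append(name["first"])
    return dict(groups)
```

With this sketch, "John Smith" at "12 Main Street" and "mary smith" at "12 main st." with a ZIP+4 code fall into the same household, because both reduce to the same match code. Production rules handle far more variation (suffixes, compound surnames, unit numbers) than this example attempts.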


We believe our originality and uniqueness stem from several areas:

  • Applying lean manufacturing principles to the discipline of data management, our solution is built as a “data factory” to automate data management and database production.
  • We have built a proprietary service orchestration engine, for which we hold a provisional patent, recognizing the uniqueness and sophistication it employs to improve data processing throughput in complex, large-scale data management needs.
  • Our solution provides inherent data governance and workflow capabilities - simplifying how data and business analysts can work in one application vs. many.

We have been recognized in several ways that speak to the innovativeness and originality of the solution.

  • On September 25, 2006, Polk was selected by Computerworld as a recipient of its “Best Practices in Business Intelligence” awards program in the category of “Planning, Designing and Building the BI Infrastructure.”
  • RLPTechnologies was recognized by JBoss in June 2006 as the Innovator of the Year.
  • In October 2006, RLPTechnologies was honored by DataFlux with its Innovation Award for exemplifying superlative vision and creativity.
  • On June 4, 2007, Polk was recognized as a Laureate by the Computerworld Honors Program for its use of OneView360° within its information technology to benefit society.
  • In the Gartner research note “Cross-Functional Analytics Are Key to Auto Industry Agility” (August 25, 2006), Research Vice President Thilo Koslowski notes that RLPTechnologies' OneView360° is the first offered solution that specifically addresses the needs of the automotive industry.

Quantitative Results

Critical measures of the initiative were lower costs, efficiency and absolute quality of data. The business vision was to be 50 percent more efficient and 50 percent faster, while delivering 100 percent quality data.

The first element of the business vision in the plan was to be 50 percent more efficient. The project met this goal, with significant cost benefits realized by Polk in two core areas:

  • Leaner, better-aligned team - The renamed Data Factory team is now significantly smaller, and team members have altogether different roles and responsibilities.
  • Lower IT operating costs - Implementing the RLPT grid computing model allowed Polk to shift away from a mainframe-based system. The grid will operate with hardware costs more than 50 percent lower, saving Polk millions of dollars per year, with additional savings of 30 percent per year based on improvements made to the open systems environment.

The system has demonstrated improvements of up to 70 percent in average data-file processing speed. For example, an average state registration file that previously would have required manual processing by as many as three full-time employees and four hours of processing time is now processed automatically, as it is received, in approximately 23 minutes.

Polk’s revenue picture was positively impacted in terms of protecting current revenue streams and supporting additional revenue growth, allowing Polk to maximize new revenue-generating opportunities and drive double-digit growth. OneView360° can also solve challenges for other organizations faced with managing large-scale complex data - generating new revenue streams.

Qualitative Results

Polk’s data operations group has gained the most with the implementation of the new system and processes because it fundamentally changed the way their jobs are done. A significant amount of manual effort previously needed to be expended to process the data. In the past, the team’s day-to-day job was ensuring that files were manually received and tracked, entering the same data in multiple systems, running mainframe jobs and pushing buttons to move data from one process to the next.

With the new system, manual effort is significantly reduced so the team can refocus their efforts on building strategic relationships with data providers, ensuring higher quality data before it hits Polk’s door, analyzing trends in data and proactively analyzing data coverage and quality.

With the help of DataFlux tools, Polk has been able to shift focus from data management to product strategy and application development. OneView360°’s flexible environment allows Polk’s Product Strategy group to look for new data sources and enhance offerings while developing new analytical and operational applications that leverage more timely and complete data.

The investment has enabled a more flexible and agile IT environment to reduce product development cycles. According to Kevin Vasconi, CIO, the new infrastructure has enabled Polk to reduce the development cycle by over 50 percent in bringing its newest solution to market for lead management. This new solution will revolutionize how automobiles are bought and sold, with significant improvements in the customer experience. Consumers will receive more timely and relevant attention, as manufacturers and dealers better understand and predict buying behaviors and intentions.


The project vision was described in the charter approved by Polk’s Board of Directors as follows:

“The vision is nothing short of revolutionizing the way data is collected, standardized, enhanced and compiled into data warehouses…The solution will be designed to incorporate a high level of quality automation and statistical trending to detect, and potentially predict, data quality issues…This effort should produce a system that utilizes superior technologies and methods to produce superior results and profitability. It is not an exercise in continuous improvement, but a journey of discovery and innovation.”

In essence, RLPTechnologies was given the rare opportunity to architect the solution from a clean sheet of paper, without concern for the technology constraints of existing platforms. This allowed the team to develop the solution founded on a SOA. As a new IT architectural paradigm, SOA provides significant benefits relative to protecting legacy investments, reducing costs and providing accelerated time to development. The SOA design was an essential component that allowed seamless integration with various commercial off-the-shelf (COTS) products used for data enhancement and the company’s legacy systems by wrapping all as Web services.

The use of sophisticated service orchestration techniques, master reference data management and a standardized XML master tag library allows for the easy integration of new data and new data enhancement services without disruption to current processing.
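One way to picture how a shared tag library and an orchestrated set of services allow new enhancements to be added without disturbing existing ones is the sketch below. It is an illustration only, not the OneView360° engine: the tag names, the `service` decorator and the two example services are hypothetical, and real orchestration would involve remote Web services rather than local functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical master tag library: the only element names services may emit.
MASTER_TAGS = {"record", "vin", "state", "state_name"}

# Ordered registry of enhancement services; new ones are appended, not edited in.
SERVICES = []


def service(func):
    """Register a function as a step in the pipeline."""
    SERVICES.append(func)
    return func


@service
def normalize_state(record):
    """Trim and uppercase the state code, if present."""
    state = record.find("state")
    if state is not None and state.text:
        state.text = state.text.strip().upper()


@service
def add_state_name(record):
    """Enhance the record with a full state name (tiny illustrative lookup)."""
    names = {"MI": "Michigan", "OH": "Ohio"}
    state = record.find("state")
    if state is not None and state.text in names:
        ET.SubElement(record, "state_name").text = names[state.text]


def run_pipeline(xml_text):
    """Run every registered service, then verify tags against the library."""
    record = ET.fromstring(xml_text)
    for step in SERVICES:
        step(record)
    unknown = {el.tag for el in record.iter()} - MASTER_TAGS
    if unknown:
        raise ValueError(f"tags not in master library: {sorted(unknown)}")
    return ET.tostring(record, encoding="unicode")
```

Because every service reads and writes the same agreed set of tags, adding a new enhancement amounts to registering one more function and, if needed, extending the tag library; the existing steps are left untouched.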

Armed with this flexible and agile IT environment, Polk is in a position of strength to leverage future technology advancements in its “data factory” and manage ever-changing business requirements.
