CATEGORY: Data Integration
REVIEWER: Bill Mathe, director of data operations for InsightAmerica.
BACKGROUND: InsightAmerica is a national leader in providing sophisticated access to public and private record information over computer networks. InsightAmerica offers a suite of industry-customized, analytically enhanced solutions built on open-standards hardware and software. With reliable access to hundreds of national and state-specific databases, including identity data, property records, addresses, phone numbers, motor vehicle records and criminal records, InsightAmerica provides the broadest access to data in both real-time and batch mode.
PLATFORMS: InsightAmerica runs DataLever's client and server software on Windows 2000.
PROBLEM SOLVED: At InsightAmerica, data integration and delivery is our core business. The creation of sophisticated data models and information delivery systems, combined with a huge footprint of raw data sourcing, gives InsightAmerica the edge over our competition. InsightAmerica integrates hundreds of data feeds from government agencies, nongovernmental organizations and private data providers. The format, schema and quality of these feeds vary so widely that we must automatically verify the quality of incoming data and then parse, cleanse, standardize and reformat it for intake into our central data management system and customer-facing Web applications. In addition to unattended conversion and quality control, we also need an interactive data manipulation system to quickly assess, visualize, report on and convert new or "broken" data feeds and then fold the results back into unattended processing. Because data arrives quickly from so many sources, any manual adjustment required when a data quality alarm is triggered must cause minimal disruption to the continuous data integration process.
PRODUCT FUNCTIONALITY: DataLever supports all of these tasks by providing a platform for analyzing, manipulating and integrating multiple data feeds. Its visual development environment makes it simple to incorporate complex processing steps into the overall data flow. For example, when InsightAmerica encounters misfielded information, such as names, addresses, phone numbers or notes that have landed in the wrong columns, we use DataLever's integrated textual pattern analysis to discover and report the kinds of information present. We then use the results of that analysis to construct textual parsers that assign each data fragment to the correct column. Another critical aspect of DataLever is its performance. We process hundreds of millions of records, and we simply can't use a slow technology. While we could scale out a slower solution, it is more cost-effective to start with a product that has strong single-CPU performance, scale it on modest server hardware and avoid mainframe-class expenses.
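This review does not describe how DataLever's pattern analysis or parsers are configured. The following is only a generic Python sketch of the same two-step idea, under stated assumptions. First, each value is profiled by reducing it to a character-class "shape" (letters to `A`, digits to `9`), which reports what kinds of data a column holds. Second, the profile is used to write matching rules that route misfielded fragments to the right columns. The function names (`char_pattern`, `profile`, `assign_fragments`) and the regex rules are hypothetical illustrations, not DataLever features.

```python
import re
from collections import Counter


def char_pattern(value: str) -> str:
    """Reduce a value to its shape: letters -> 'A', digits -> '9', other characters kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile(values):
    """Count how often each shape occurs, reporting the kinds of data a column holds."""
    return Counter(char_pattern(v.strip()) for v in values)


# Hypothetical rules, as an analyst might write them after reviewing a profile.
# Order matters: the first matching rule wins.
RULES = [
    ("phone", re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}$")),
    ("zip", re.compile(r"^\d{5}(-\d{4})?$")),
    ("address", re.compile(r"^\d+\s+[A-Za-z]")),
    ("name", re.compile(r"^[A-Za-z][A-Za-z.'-]*(\s+[A-Za-z][A-Za-z.'-]*)+$")),
]


def assign_fragments(fragments):
    """Send each fragment to the first column whose rule matches; unmatched text goes to 'notes'."""
    record = {}
    for frag in fragments:
        frag = frag.strip()
        target = next((col for col, rx in RULES if rx.match(frag)), "notes")
        # Never overwrite a value already assigned; join repeats with '; '.
        record[target] = f"{record[target]}; {frag}" if target in record else frag
    return record
```

For example, `assign_fragments(["123 Main St", "John Q. Smith", "(303) 555-0123", "80202", "Call after 5pm"])` places each fragment under `address`, `name`, `phone`, `zip` and `notes` respectively. Keeping the rules in an ordered list makes precedence explicit, so a new rule can be added without rewriting the parser.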
STRENGTHS: DataLever provides an intuitive and highly productive interface for analyzing data feeds and then constructing, testing and automating data integration processes. As a result, we are able to reduce the time needed to add a data feed or roll out a new information product, lower our implementation risk and beat our competitors to market. DataLever's model of distributed execution with central metadata is well suited to data intake, because it lets us distribute processing to low-cost workstations. Its central repository also means that even work performed in this distributed fashion can still draw on our library of best practices.
WEAKNESSES: We want to be able to leverage DataLever for a real-time data append service, but this will need to wait for the next release.
SELECTION CRITERIA: The solution needed to support a distributed team dealing with a large set of changing data feeds. It needed to be very intuitive to reduce our training time and maximize productivity for our knowledge workers, while still delivering a big performance punch when we process huge databases such as national consumer demographic files. Finally, we needed to build data quality alarms into our integration process to ensure that data feed problems are automatically detected and flagged for intervention.
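The review does not say how these data quality alarms are defined in DataLever. The following minimal Python sketch shows one common approach, under stated assumptions. Each incoming batch is checked against per-feed thresholds: a minimum row count and a maximum empty-value rate per required column. A batch that trips any threshold is set aside for intervention, while clean batches from other feeds keep flowing. All names (`FeedRules`, `check_batch`, `route`) and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeedRules:
    """Hypothetical per-feed thresholds; real values would be tuned feed by feed."""
    min_rows: int
    max_null_rate: dict  # column name -> highest acceptable fraction of empty values


def _is_empty(value) -> bool:
    """Treat None and whitespace-only strings as missing; 0 is a real value."""
    return value is None or str(value).strip() == ""


def check_batch(rows, rules):
    """Return alarm messages for a batch; an empty list means it can proceed unattended."""
    alarms = []
    if len(rows) < rules.min_rows:
        alarms.append(f"row count {len(rows)} below minimum {rules.min_rows}")
    if rows:
        for col, limit in rules.max_null_rate.items():
            rate = sum(1 for r in rows if _is_empty(r.get(col))) / len(rows)
            if rate > limit:
                alarms.append(f"{col}: {rate:.0%} empty exceeds {limit:.0%}")
    return alarms


def route(feed_name, rows, rules, accepted, quarantined):
    """Pass clean batches on; quarantine flagged ones without halting other feeds."""
    alarms = check_batch(rows, rules)
    if alarms:
        quarantined[feed_name] = alarms
    else:
        accepted.extend(rows)
    return alarms
```

Quarantining a flagged batch under its feed name, rather than raising an exception, reflects the requirement in the review that manual intervention must not disrupt the continuous flow of other feeds.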
DELIVERABLES: The main deliverables were the DataLever client and server software, documentation, training in the use of the product, assistance setting up parsing and pattern-matching transforms and ongoing "best practices" consultation.
VENDOR SUPPORT: DataLever provided excellent support and training during the project, using a combination of on-site training/consulting, off-site development of transforms and Web-based interactive support.
DOCUMENTATION: The documentation is extensive, very useful and quite necessary due to the breadth of the tool suite. Both end-user documentation and lower-level "programming interface" documentation were provided.