One of the biggest problems facing enterprises today is the integration of application systems and databases within and across enterprises. These may be legacy databases and systems that were developed years ago for a specific purpose and are still used very effectively in the enterprise. Or they may be recently developed but difficult to integrate with other databases and systems that contain much the same data. The result is often redundant versions of the same data, each of which must be kept up to date with any changes so that all versions are current.

In this and following columns, we will discuss three approaches – and the technologies behind them, including XML – that assist with enterprise application integration (EAI).

The first approach uses data content analysis to normalize live databases or files, reverse engineering third normal form (3NF) data structures and database designs directly from the live data content. This can permit EAI to be achieved more effectively than by working with the unnormalized databases and files. The second approach is based on XML, which is used to expose all aspects of databases, including business rules; this is called inter-enterprise data integration. The third approach analyzes the implicit relationships between tables that reside in databases developed by the enterprise, as well as in databases developed by enterprise resource planning (ERP) vendors such as SAP, Baan and others; this is called hyperrelational analysis. We will start with data content analysis.

Data Content Analysis

We have all encountered legacy databases and application systems that were developed many years ago but whose database and application designs were never documented. Other databases were originally documented, but changes have since been made to the applications or the databases – and those changes were never reflected in the documentation. As a result, little is known today about these database structures.

Of course, it is possible to reverse engineer these undocumented legacy databases to determine their structure by using CASE modeling tools. These extract from the database catalog various details about the tables and columns that comprise a database. With this knowledge of the database structure, legacy database designs can be integrated with other databases. They can then be reengineered for new database environments. The problem becomes more complex, however, when it is necessary to reengineer databases that were unnormalized for performance.
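To make the catalog-extraction step concrete, here is a minimal sketch in Python using SQLite's built-in catalog as a stand-in for the production DBMS catalog a CASE tool would query. The customer table is a hypothetical example.

```python
import sqlite3

# Create a hypothetical legacy table, then read its structure back out of
# the catalog - the same kind of extraction a CASE modeling tool performs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    {"name": row[1], "type": row[2], "pk": bool(row[5])}
    for row in conn.execute("PRAGMA table_info(customer)")
]

for col in columns:
    print(col["name"], col["type"], "PK" if col["pk"] else "")
```

Catalog extraction of this kind recovers table and column structure, but not the data dependencies hidden inside unnormalized columns – which is where data content analysis comes in.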

You know the problem. Many legacy databases did not store details about customers or orders or products only in the relevant customer, order or product tables as normalized data. Instead, these details were combined in common tables as unnormalized data, hoping to avoid perceived performance problems. This may indeed have enabled improved database performance, but it was often achieved at the expense of creating redundant data versions throughout the enterprise. The problem emerges when redundant data changes. For example, if a customer's address is changed or a product price is changed, each redundant data version has to be updated so that all versions reflect the same status of the data.
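The update anomaly can be sketched in a few lines of Python (the data here is hypothetical): in the denormalized layout the address must be changed in every order row, while in a normalized layout it is changed exactly once.

```python
# Denormalized: the customer address repeats in every order row, so a
# change of address must be applied to every redundant copy.
orders = [
    {"order_id": 1, "customer": "Acme", "address": "100 Fillmore Street"},
    {"order_id": 2, "customer": "Acme", "address": "100 Fillmore Street"},
]
for row in orders:
    if row["customer"] == "Acme":
        row["address"] = "200 Main Street"

# Normalized: the address lives in one customer row, so the change is
# made once and every order sees the current value via the customer key.
customers = {"Acme": {"address": "100 Fillmore Street"}}
customers["Acme"]["address"] = "200 Main Street"
```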

EAI brings all of these redundant data versions together, so that relevant customer, product or other details exist in only one place – yet can be shared throughout the enterprise. When a change occurs, the change only needs to be made once. The single, updated data version is immediately available at its latest status for everyone who is authorized to use it.

Consider what happens when these problems occur together: unnormalized data versions that exist redundantly throughout the enterprise, plus an absence of documentation of those unnormalized database designs. Resolving this requires an enormous expenditure of effort. Examining the database catalogs and the live data content – to infer data dependencies and to derive normalized database designs for EAI – is largely a manual task.

Fortunately, new technologies based on the application of data content analysis are emerging to assist this analysis. Products such as Axio from Evoke Software analyze live databases to infer data dependencies. All of the data values in a column are first analyzed for data value consistency and data quality. For example, the same address column may have some rows that seem to be different – appearing as 100 Fillmore and also as 100 Fillmore Street. When quality problems are detected, these different values can be changed so that only consistent data values exist (using only 100 Fillmore Street, for example).
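A toy sketch of this kind of value standardization in Python – the suffix table and the default rule here are illustrative assumptions, not Axio's actual cleansing logic:

```python
# Hypothetical standardization rules: expand abbreviated street suffixes
# and supply a default suffix when none is present.
SUFFIXES = {"st": "Street", "ave": "Avenue", "rd": "Road"}
CANONICAL = set(SUFFIXES.values())

def standardize(address: str) -> str:
    """Return the address with a consistent, fully spelled-out suffix."""
    words = address.split()
    last = words[-1].rstrip(".").lower()
    if last in SUFFIXES:
        words[-1] = SUFFIXES[last]   # expand the abbreviation
    elif words[-1] not in CANONICAL:
        words.append("Street")       # naive default; real tools consult
                                     # reference data before guessing
    return " ".join(words)

print(standardize("100 Fillmore"))      # -> 100 Fillmore Street
print(standardize("100 Fillmore St."))  # -> 100 Fillmore Street
```

Once all rows of a column carry consistent values, the dependency analysis described next has clean input to work with.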

Many products are available to assist this data quality analysis. However, Evoke Axio takes this analysis further. It also examines the data values in each row of a table to identify columns that are dependent on the values of other columns in the same row. This dependency analysis of data values identifies possible primary and foreign keys. It enables those columns to be normalized to 3NF. It eliminates data redundancy by deriving 3NF database designs and 3NF data models, working from the live data content of the database. The end result is the automatic generation of 3NF data definition language (DDL) schema scripts to install the 3NF databases using appropriate database management system (DBMS) products. It resurrects undocumented legacy database designs that might otherwise have been lost forever and, in turn, enables more accurate EAI.
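The heart of this dependency analysis can be sketched in Python: column B is functionally dependent on column A if every value of A co-occurs with exactly one value of B across all rows. The sample rows and the brute-force pairwise search are illustrative assumptions; a product such as Axio works at far larger scale and also tests composite keys.

```python
from itertools import permutations

def functional_dependencies(rows):
    """Return (a, b) pairs where column a determines column b in this data."""
    columns = list(rows[0].keys())
    deps = []
    for a, b in permutations(columns, 2):
        mapping = {}  # each value of a must map to exactly one value of b
        if all(mapping.setdefault(r[a], r[b]) == r[b] for r in rows):
            deps.append((a, b))
    return deps

# Hypothetical unnormalized order rows carrying redundant customer data.
rows = [
    {"order_id": 1, "customer_id": "C1", "address": "100 Fillmore Street"},
    {"order_id": 2, "customer_id": "C1", "address": "100 Fillmore Street"},
    {"order_id": 3, "customer_id": "C2", "address": "9 Oak Road"},
]

# order_id determines every other column (a candidate primary key), and
# customer_id determines address - evidence that address belongs in a
# separate customer table in a 3NF design.
print(functional_dependencies(rows))
```

Inferred dependencies of this kind are only hypotheses drawn from the live data, which is why such tools present them for analyst confirmation before generating 3NF DDL.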
