Business intelligence systems such as data warehouses and data marts are evolving into business-critical decision support systems. Data warehouses that were once updated monthly or weekly now require nightly updates, and some are updated multiple times a day. In most cases, the data warehouse update is accomplished through a batch process referred to as ETML:

Extract volumes of data from operational databases such as IMS, Fast Path, CICS VSAM, batch VSAM, DB2, IDMS, ADABAS and DATACOM/DB.

Transform the operational data types and data structures into useful information constructs for query and analysis. This transformation effort requires significant processing to convert legacy data types and data structures into relational formats and to merge, purge, join, summarize and aggregate data records.

Move the data, either before or after the transformation processing, from the operational environment to the data warehouse environment. This often involves moving the data across a network.

Load the target database (data warehouse).

The ETML process is performed within a brief time period (the batch window), with particular attention to minimizing the impact on the performance and availability of operational applications and on the availability of the data warehouse. A growing concern for companies is that the volume of data moved through the ETML process grows each day.

Complicating matters further, many companies are now implementing operational data stores (ODSs), a type of database that must remain current within seconds or minutes of the actual operational databases. When a transaction updates an operational database, that update must be reflected in the operational data store within seconds or minutes. Of course, any legacy data types or data structures must be transformed into relational rows and tables before the operational data store is updated. In some cases, the legacy data is combined with relational data from ERP systems to provide a single, current and comprehensive picture in the operational data store.

It is interesting to reflect on the evolution of data warehousing. Companies began by implementing classic enterprise-wide data warehouses. They also developed both dependent and independent departmental data marts to take advantage of shorter implementation cycles and to obtain results and benefits more quickly. More recently, operational data stores have been added to meet demands for current, accurate data. Had this sequence been reversed, companies would have developed a different ETML strategy from the batch processes described earlier. Maintaining data currency within seconds or minutes in an operational data store requires high-performance change propagation facilities that capture updates as they occur to the operational databases and propagate them immediately to the operational data store.

Change data propagation involves capturing changes to a file or database as part of the application change process. After an operational transaction completes, captured changes are transformed, moved and applied as updates to the target ODS database in a matter of seconds. This type of data propagation is referred to as near real time or continuous data propagation. It is the only practical approach for keeping a target DBMS synchronized within minutes or seconds of a source database. Let us quickly review some of the requirements that change propagation facilities must address to successfully support operational data stores.
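As a rough illustration of the capture-transform-apply cycle just described, the sketch below propagates captured changes to a target store in transaction sequence. The record layout, queue and target structures are hypothetical stand-ins for a real change-capture facility and an ODS, not any particular product's API.

```python
# Minimal sketch of a near-real-time change propagation cycle.
# Change records, the capture queue and the target "ODS" dict are
# invented for illustration.
from dataclasses import dataclass

@dataclass
class Change:
    seq: int            # transaction sequence number
    op: str             # "I"nsert, "U"pdate or "D"elete
    key: str            # source record key
    legacy_bytes: bytes # captured legacy record image

def transform(change):
    """Convert a captured legacy record into a relational row image."""
    return {"id": change.key, "payload": change.legacy_bytes.decode("ascii")}

def propagate(queue, ods):
    """Apply captured changes to the target ODS in transaction sequence."""
    for change in sorted(queue, key=lambda c: c.seq):
        row = transform(change)
        if change.op == "D":
            ods.pop(row["id"], None)
        else:                       # insert or update
            ods[row["id"]] = row

queue = [Change(1, "I", "C100", b"ACME CORP"),
         Change(2, "U", "C100", b"ACME CORP LTD"),
         Change(3, "D", "C200", b"")]
ods = {"C200": {"id": "C200", "payload": "OLD CO"}}
propagate(queue, ods)
```

In a real facility the queue would be fed continuously from the database log or capture exits; the point of the sketch is only the ordering and apply semantics.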

Change Data Propagation Support for ODSs

Legacy Database Support: Change propagation must access the key databases used by operational applications. In addition, it is important that the change propagator be transparent and non-intrusive to existing operational applications so that no additional programming is required to implement the change propagation solution.

Performance and Efficiency: The change propagator must be able to handle the transaction volumes of operational applications. For many companies, this means capturing hundreds of updates per second. It is equally critical that capturing changes does not impact transaction response times or elapsed times of batch jobs.

Transformation and Mapping Capabilities: The change propagator must be able to process the same legacy data structures that batch-oriented ETML tools process and also data from relational ERP systems. However, recognizing that the change propagator processes a single legacy record at a time and has an objective to maintain a high throughput rate, it is reasonable to limit transformation capability to converting legacy data types and data structures into relational formats. The following transformations can reasonably be accomplished as part of a high-performance change propagation process.

  • Date and time conversions (a comprehensive set of legacy date formats must be supported).
  • Data type conversions.
  • COBOL data structure conversions including REDEFINES, OCCURS, OCCURS DEPENDING ON, etc. Options should be provided to convert repeating groups and repeating elements to multiple rows in the same or different relational tables. This support should include generating unique keys for target databases as required.
  • Mapping single legacy records to multiple target rows, possibly located on different systems.
  • WHERE clauses (row filtering).
  • Table lookups.
  • Concatenations and substrings.
  • User exits.
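To make the row-level transformations above concrete, here is a hedged sketch that converts one hypothetical legacy record: a YYMMDD date field is normalized, and a repeating group (the COBOL OCCURS case) is exploded into multiple child rows with generated keys. The field layout and year-windowing rule are invented for illustration.

```python
# Sketch: converting one legacy record into relational rows.
# The record layout (YYMMDD date, fixed repeating group) is a
# hypothetical example of a COBOL OCCURS structure.
import itertools

_key_counter = itertools.count(1)   # generated surrogate keys

def convert_date(yymmdd):
    """Convert a legacy YYMMDD date to ISO format, windowing 2-digit years."""
    century = "19" if int(yymmdd[:2]) >= 50 else "20"
    return f"{century}{yymmdd[:2]}-{yymmdd[2:4]}-{yymmdd[4:6]}"

def explode(record):
    """Map one legacy record to a parent row plus one child row
    per non-empty occurrence of the repeating group."""
    parent = {"order_id": record["order_id"],
              "order_date": convert_date(record["order_date"])}
    children = []
    for item in record["items"]:       # the OCCURS group
        if not item:                   # skip empty occurrences
            continue
        children.append({"line_id": next(_key_counter),  # generated key
                         "order_id": record["order_id"],
                         "item": item})
    return parent, children

parent, children = explode(
    {"order_id": "A17", "order_date": "991203",
     "items": ["WIDGET", "BOLT", ""]})
```

The parent/child split is the "repeating groups to multiple rows in the same or different relational tables" case from the list above.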

Transformation activities such as merge, join, summarize and aggregate can usually be performed more efficiently using the relational database capability available on the target system. However, with proper caution they can be performed as part of the change propagation; the concern is the amount of I/O required to process each change.

Target Database Apply Options: The change propagator must provide different options for applying changes to a target database.

Immediate Apply: Captured changes are applied immediately to the target database. This option is used to synchronize operational data stores with production databases. It can also maintain staging tables that might not be used as an operational data store but that are used for loading data marts and data warehouses.

Change History Table: This option, not necessarily exclusive of the immediate apply option, maintains a history table of database changes (inserts, updates and deletes) in transaction sequence. This change history table contains information that is lost in both the operational databases and the operational data store. In fact, unless full record logging is employed in the production application, this table contains information that is not available anywhere else. This change history table provides a level of detail that can be valuable for building summaries and aggregations for data warehouses and data marts.
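One way to picture the two apply options working together is the sketch below: each captured change updates the target table immediately and is also appended, in transaction sequence, to a history table. The structures and names are illustrative, not a particular product's interface.

```python
# Sketch: immediate apply combined with a change history table.
# Every change (insert/update/delete) is logged in transaction
# sequence, so intermediate states survive even after deletes.
def apply_with_history(changes, target, history):
    for seq, (op, key, row) in enumerate(changes, start=1):
        history.append({"seq": seq, "op": op, "key": key, "row": row})
        if op == "D":
            target.pop(key, None)
        else:
            target[key] = row

target, history = {}, []
apply_with_history(
    [("I", "K1", {"qty": 5}),
     ("U", "K1", {"qty": 7}),
     ("D", "K1", None)],
    target, history)
# The target no longer holds K1, but the intermediate states
# (qty 5, then qty 7) survive in the history table.
```

This is the sense in which the history table "contains information that is lost in both the operational databases and the operational data store."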

Integrated Bulk Data Propagation: Bulk data propagation capabilities that share the same data mappings and data transformations as the change propagation capability are needed initially to create the target tables. And, they may be needed at some time in the future to recover the tables.

Recovery Restart Capability: If a target table is recovered back to a point in time that removes applied changes, the change propagator must be able to reapply any changes lost in the recovery. The same is true for point-in-time recoveries of source files: the system must be able to detect the discrepancy and resynchronize the target tables.
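A hedged sketch of the restart idea: if the target is recovered to an earlier point, changes with sequence numbers beyond the recovery point are replayed from the retained change stream. The sequence numbers and record shapes are assumptions for illustration, assuming the propagator keeps captured changes keyed by a monotonically increasing sequence number.

```python
# Sketch: reapplying changes lost by a point-in-time recovery.
# Assumes retained changes carry a monotonically increasing
# sequence number (an assumption for this illustration).
def reapply_after_recovery(retained, target, recovered_to_seq):
    """Replay every retained change with seq > recovered_to_seq."""
    for change in sorted(retained, key=lambda c: c["seq"]):
        if change["seq"] <= recovered_to_seq:
            continue            # already reflected in the recovered copy
        if change["op"] == "D":
            target.pop(change["key"], None)
        else:
            target[change["key"]] = change["row"]

# Target was recovered to its state as of seq 2; changes 3 and 4
# were removed by the recovery and must be reapplied.
target = {"K1": {"bal": 10}}
retained = [
    {"seq": 1, "op": "I", "key": "K1", "row": {"bal": 5}},
    {"seq": 2, "op": "U", "key": "K1", "row": {"bal": 10}},
    {"seq": 3, "op": "U", "key": "K1", "row": {"bal": 25}},
    {"seq": 4, "op": "I", "key": "K2", "row": {"bal": 1}},
]
reapply_after_recovery(retained, target, recovered_to_seq=2)
```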

Benefits and Efficiencies for Data Warehousing

Once a company decides to implement an operational data store, it is worthwhile to reevaluate the ETML strategies for all of its business intelligence systems. Significant benefits and efficiencies are obtained if change propagation is used to maintain a staging area containing operational data stores and change history tables to be used as a basis for refreshing data warehouses.

  1. Operational applications typically update a very small percentage (less than two percent) of their records each day. These updated records are the only ones that need to be transformed and moved to an operational data store. The result is a great reduction in mainframe processing for data transformation and data movement. In addition, there is a similar reduction in the volume of data that moves across the network each day.
  2. At any time of the day, the operational data store, a relational database with transformed and enhanced data, is much more suitable for handling ad hoc inquiries and status requests than the actual operational databases. It provides an additional advantage in that inquiries processed against the operational data store will not impact critical operational applications.
  3. At the end of day, the operational data store reflects the end-of-day status of the operational databases. It can now replace the operational databases as a data source for the nightly data warehouse update. This eliminates the mainframe processing and the batch window previously described. Since the legacy data is already in relational format, relational processing can be used to simplify aggregations, joins, merges, etc. In addition, in most cases the ODS still has to be backed up and data needs to be archived. Consequently, it will have a batch window similar to the ones for the legacy systems.
  4. At the end of day, the change history tables, which are also in relational format, contain details on every update that occurred to the operational databases. They represent an invaluable source for the nightly data warehouse update.
  5. Since both the operational data store and the change history tables are relational, users can write relational programs or employ modern data warehouse tools such as transformation engines to merge, purge, join, summarize and aggregate their data into the desired information constructs for data warehousing. This is certainly more attractive than performing the same work against COBOL structures in IMS or VSAM databases on MVS.
  6. If the operational data store and/or the change history tables are on the same system as the data warehouse or data mart, the data warehouse update can be extremely fast and efficient with no network processing involved.
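Point 1 above can be illustrated with rough arithmetic; the record counts and sizes below are invented for illustration, but they show why propagating only changed records moves a small fraction of the data.

```python
# Rough arithmetic for point 1: with 50 million records of 200 bytes
# each and a 2% daily change rate (figures invented for illustration),
# change propagation moves a small fraction of a full extract.
records, record_bytes, change_rate = 50_000_000, 200, 0.02

full_extract_gb = records * record_bytes / 1e9   # nightly full extract
delta_gb = full_extract_gb * change_rate         # changed records only
```

Under these assumed figures, a 10 GB full extract shrinks to roughly 0.2 GB of changed records, with a corresponding reduction in transformation processing and network traffic.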

Benefits and Efficiencies for E-Business

Companies that have operational data stores in place will enjoy a tremendous advantage as they develop their e-business strategies and implement e-business solutions. E-business applications place an even greater demand on companies to make current, accurate, up-to-date information available. Actually, not just available, but continuously available. Without an operational data store, this type of information exists only in the operational databases, which are difficult to develop new applications for, difficult to access and poorly suited to ad hoc inquiry.

What strategy is recommended for implementing e-business applications?

  1. Route e-business update transactions to existing operational applications to update the operational databases.
  2. Use high-performance change data propagation to maintain a current, accurate and highly available operational data store.
  3. Use the operational data store as the primary interface for e-business inquiry and access.

With increasing volumes of data in nightly data warehouse refreshes, the requirement to support operational data stores and the demand by e-business applications for current, accurate information, there is a need to reevaluate ETML strategies for business intelligence systems. If near real-time data currency is required, then change propagation is the only practical way to achieve it. Change propagation should also be more efficient for files where only a small percentage of the records are updated. Additionally, transforming and moving large data volumes may impact application availability and thus again favor a change propagation solution. For all other files, bulk propagation is usually easier to implement and should be the choice. Enterprises will find that a combination of bulk and change propagation provides an optimum ETML solution.
