Is EII Virtual Data Warehousing Revisited?

  • February 01 2004, 1:00am EST

Data warehousing has frequently been the subject of heated debate. Some controversies appear and quickly fade, while others continue to dominate press articles and industry conferences. Examples in this latter category include the topics of data marts and multidimensional design.

Recently, a new controversy has erupted over the subject of enterprise information integration (EII). Some people (particularly vendor marketing and salespeople) are arguing that EII can eliminate the need to build a data warehouse. Data warehousing purists, on the other hand, are saying that EII is just another form of virtual data warehousing, which has been proved in the past to be a failure.

To understand the pros and cons of this debate, we first need to categorize the different types of integration being used by organizations, and then identify how EII and data warehousing fit into and support this integration taxonomy.

There are four broad categories of integration used in information technology (IT) systems: user interface, business process, application and data. Many products fit neatly into one of these categories. However, as we will see, there is a trend in the industry toward products supporting multiple integration technologies. As a result, the dividing line between these four types of integration is sometimes a little fuzzy.

User interface integration provides a single view of operational and decision support data and applications at the presentation logic layer of an IT system. An enterprise portal is an example of a product that supports user interface integration. A key issue with integration at the user interface level is that although the user is given a single view of multiple disparate systems and applications, this view highlights the lack of data and application integration between those systems. This is why some portal vendors are now adding the ability to construct composite applications that add a business semantic layer between the user interface and back-end corporate systems. This semantic layer adds a basic form of business process integration.

Business process integration enables developers to separate application design from application deployment. Business process design tools allow developers to analyze and model business processes. Business process automation and integration tools then implement these process models using underlying application integration technologies. The key benefit here is that the design process is isolated from the physical implementation by the business semantic layer built into the process models.

It is also important to point out that business process automation tools not only manage the implementation of distinct applications, but also monitor the flow of information between those applications. Many business process tools are adding monitoring capabilities into this process flow for analyzing business performance. This is a form of business activity monitoring, or BAM.

Application integration technology supports the flow of business transactions between application systems that may reside within or outside of an organization. The industry trend here is toward a service-oriented architecture that employs Web services based on XML (extensible markup language) for defining and moving business transactions across systems. If the systems involved share a common definition (i.e., a common business meta model) for the transactions that flow between them, then little or no information transformation is required. If a common definition does not exist, then the application integration software must transform the information to match the different business meta models of the applications involved.
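To make the transformation step concrete, here is a minimal sketch of mapping a transaction from one application's record layout to another's when no common definition exists. The field names, the mapping table and the two systems (order entry and billing) are hypothetical and chosen only for illustration; real application integration software would typically drive this from meta data rather than hard-coded tables.

```python
from typing import Any, Callable

# Hypothetical mapping between two business meta models:
# target field -> (source field, converter applied to the source value).
ORDER_TO_BILLING: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_id": ("custNo", str),
    "amount_cents": ("total", lambda v: round(float(v) * 100)),
    "currency": ("curr", str.upper),
}


def transform(record: dict[str, Any],
              mapping: dict[str, tuple[str, Callable[[Any], Any]]]) -> dict[str, Any]:
    """Rename and convert each field so the record matches the target layout."""
    return {
        target: convert(record[source])
        for target, (source, convert) in mapping.items()
    }


# A transaction as the (hypothetical) order-entry system emits it.
order = {"custNo": 1042, "total": "19.99", "curr": "usd"}

# The same transaction in the layout the (hypothetical) billing system expects.
billing = transform(order, ORDER_TO_BILLING)
print(billing)
```

If both systems shared a common business meta model, the mapping table would be empty and the record could be passed through unchanged.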

Although application integration technology was initially designed for moving business transactions between systems, it is now also being used to transfer data between applications. In the data warehousing world, for example, many extract, transform and load (ETL) tools work with application integration software to extract data from an application workflow, and transform and load it into a data warehouse. To highlight this trend, many ETL tool vendors now market their products as a data integration platform.

Before we move on to discuss data integration, it is important to highlight some key aspects of the discussion thus far. The first thing to note is that user interface, business process and application integration technologies are being used not only for operational processing, but also for decision support processing. Another is that the three types of integration can interact with each other and can be used together. This is why the marketplace is moving toward portal, business process and application integration software being bundled into an application server suite platform, such as IBM's WebSphere. A final thing to note about the three integration technologies is that business-level meta data and meta models play a key role in the integration process.

Several different technologies can be used for data integration, including data replication and transformation servers, ETL tools and, now, EII middleware. The technology used depends on several factors such as the type of application processing, the data volumes involved, data currency requirements and the amount of data transformation needed.

Traditionally, from a processing perspective, data integration technologies have been separated into those techniques used for operational processing and those used for decision support processing. For operational processing, data replication and data transformation servers are often used (instead of application integration) where large amounts of data need to be copied (and possibly transformed) in batch mode between different applications. In some cases, data replication is used to trickle feed data changes between systems. The focus in these cases is on performance and transformation power, and little attention is given to the business semantics and processes involved (i.e., business meta data and process models rarely play a role with this type of processing).

For decision support processing, ETL tools dominate the marketplace for extracting and transforming operational data for loading into a data warehouse. Data warehouse data is typically used for strategic and tactical reporting and analysis. More recently, performance management tools have extended this processing to scorecarding, which compares the business intelligence produced by decision support processing to actual business plans and goals, and informs the appropriate business user when out-of-line situations occur.
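The scorecarding idea can be sketched in a few lines: compare each measured value to its goal and flag the responsible user when the gap is too large. The metric names, owners, figures and the 10 percent tolerance below are all assumptions made for illustration, not values from any particular performance management product.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str      # what is being measured
    goal: float    # planned value from the business plan
    actual: float  # value produced by decision support processing
    owner: str     # business user to inform (hypothetical role name)


def out_of_line(metrics: list[Metric], tolerance: float = 0.10) -> list[tuple[str, str, float]]:
    """Return (owner, metric, relative deviation) for metrics outside the tolerance."""
    alerts = []
    for m in metrics:
        deviation = (m.actual - m.goal) / m.goal
        if abs(deviation) > tolerance:
            alerts.append((m.owner, m.name, round(deviation, 3)))
    return alerts


# Illustrative scorecard with made-up numbers.
scorecard = [
    Metric("monthly_revenue", goal=100_000, actual=82_000, owner="sales_director"),
    Metric("on_time_delivery", goal=0.95, actual=0.94, owner="logistics_manager"),
]

alerts = out_of_line(scorecard)
for owner, name, deviation in alerts:
    print(f"notify {owner}: {name} is {deviation:+.1%} against plan")
```

Only the revenue metric falls outside the assumed tolerance here, so only its owner would be informed.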

When building a data warehouse with ETL tools, little attention is given to the business semantics and processes involved. Outbound from the warehouse, some analysis and performance management tools employ a business semantic or meta data layer to isolate analyses and reports from the data warehouse structures being used, but it is still the duty of the business user to map the results back to the business processes involved.

There is an increasing need in organizations for solutions that can exploit decision support processing for day-to-day business decision making (i.e., for operational reporting and analysis). Operational decision support processing is about making organizations more responsive. This may involve the querying and analysis of current or low-latency operational data, business intelligence-driven alerts and automated decision making, or rules-driven recommendations and predictive analysis.

From a data perspective, operational decision support processing may be performed directly against operational data or using a low-latency data store such as an operational data store (ODS). Operational data can be used directly when limited data transformation is required and when data query and analysis volumes and complexity are low. When significant data transformation and analysis is required and where some level of data latency can be tolerated, the use of a low-latency store is the better approach.

The use of a low-latency store causes considerable confusion because there are multiple uses of such a store. In all cases, however, the motivation for creating a low-latency store is to integrate and clean operational data. If the operational source data were integrated and consistent, then such a store would not be required. The same is true of data warehousing: if source systems maintained integrated, consistent current and historical data, then a data warehouse would not be required.

A low-latency data store has several uses. Some of these are associated with operational processing and some are related to decision support. In reality, however, the distinction between these two types of processing is often fuzzy and overlapping, and is frequently relevant only from a political and organizational perspective.

In operational processing, a low-latency data store can be used for integrating operational and master data. This data can be used as a base for new operational applications, for the staged migration of older legacy applications and for propagating data to downstream applications. For decision support processing, a low-latency store can be used for staging data into a data warehouse and for reporting and analysis.

The controversy surrounding a low-latency store concerns whether such a store can be justified purely for decision support processing (i.e., operational reporting and analysis). In many applications, the business benefits obtained can justify the building of the store; in other cases, they cannot. This is why some organizations are looking for ways of doing operational reporting and analysis without needing to build a low-latency store. It is also why BAM solutions from application integration and independent BAM vendors are attractive. These solutions can report on and analyze business processes in memory during operational processing without the need to create a separate data store. This type of BAM software is also attractive because, unlike many data integration techniques, it is event-driven and tied to a business process. Note, however, that BAM is not suitable for tactical or strategic decision support.

Another approach that supports operational reporting and analysis without the need to create a separate data store is enterprise information integration. This technology provides a federated query server that can retrieve and integrate data from several non-integrated data sources. Such a server is used primarily for querying and accessing structured operational data. Some products, however, also support unstructured data stores. The results of a federated query can be processed by operational applications or by decision support query and analysis tools. They could also be used to feed an ETL tool for building a data warehouse. Given that most EII products are not event-driven, they are not suitable for building a low-latency data store where the latency must be as close to real-time as possible.
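As a rough illustration of what a federated query does, the sketch below queries two independent data sources at request time and merges the results, without copying either source into a separate store. Two in-memory SQLite databases stand in for separate operational systems; the table names, columns and sample rows are hypothetical, and a real EII server would add query optimization, source adapters and meta data management on top of this basic pattern.

```python
import sqlite3


def make_source(ddl: str, insert_sql: str, rows: list[tuple]) -> sqlite3.Connection:
    """Create an independent in-memory database standing in for one operational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two non-integrated sources (hypothetical CRM and order systems).
crm = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
orders = make_source(
    "CREATE TABLE orders (customer_id INTEGER, total REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 250.0), (1, 100.0), (2, 75.5)],
)


def federated_customer_totals(crm_conn: sqlite3.Connection,
                              orders_conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Query each source separately, then integrate the results in the middle tier."""
    names = dict(crm_conn.execute("SELECT id, name FROM customers"))
    totals = orders_conn.execute(
        "SELECT customer_id, SUM(total) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return [(names.get(cid, "<unknown>"), total) for cid, total in totals]


result = federated_customer_totals(crm, orders)
print(result)
```

Because the sources are read only when the query runs, the result reflects live data, but nothing here reacts to changes as they happen, which is the event-driven capability the paragraph above notes most EII products lack.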

When reviewing EII products, several questions arise. A key question is: How is EII different from the traditional distributed query processing of relational database management system (DBMS) products? The answer is that EII is an evolution of the facilities provided by these products. The emphasis of EII is on optimizing access to heterogeneous data and on providing a single view of this data. Products, however, vary in their capabilities. IBM's DB2 Information Integrator, for example, places strong emphasis on the ability to access unstructured data (it also provides a data replication capability). Non-DBMS products such as MetaMatrix and Certive emphasize meta data and analysis power. Application integration vendors such as BEA with Liquid Data are also targeting the EII space.

At the heart of the EII controversy is whether EII negates the need for a data warehouse. In other words, can EII be used for virtual data warehousing? Although EII provides capabilities above and beyond those supplied by traditional relational DBMS products (which have been used for virtual data warehousing in the past), it still cannot solve the data quality problems that exist in many operational systems. Adding EII on the front of operational data is like adding a portal on the front of multiple corporate systems: the technology may provide a single access point to disparate information, but it cannot hide the integration problems that may exist. EII is therefore suitable for accessing live operational data when zero data latency is required and where significant data integration problems do not exist (i.e., where complex data transformation and analysis are not required).
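A small sketch shows why a single access point does not hide integration problems. Two hypothetical sources hold data about the same customers but format the customer key differently; a straightforward federated join returns only the records whose keys happen to match. The source names, keys and the normalization rule are assumptions made for illustration.

```python
# Two hypothetical operational sources describing the same two customers.
billing = {"C-001": 120.0, "C-002": 80.0}   # customer key -> balance
support = {"c001": 3, "C-002": 1}           # customer key -> open tickets


def federated_join(left: dict, right: dict) -> dict:
    """Join two sources on exactly matching keys, as a plain federated query would."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


# Queried as-is, customer C-001 silently drops out of the result.
joined = federated_join(billing, support)
print(joined)


def normalize(key: str) -> str:
    """Hypothetical cleansing rule: upper-case the key and strip hyphens."""
    return key.upper().replace("-", "")


# With a cleansing step applied first, both customers match.
cleaned = federated_join(
    {normalize(k): v for k, v in billing.items()},
    {normalize(k): v for k, v in support.items()},
)
print(cleaned)
```

The cleansing rule is the kind of transformation logic that ETL processing into a warehouse or low-latency store usually carries; when many such rules are needed, running them on every live query becomes the complex transformation that the paragraph above says EII is not suited for.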

One final point that should be made is that EII is still a data-driven technology. Like most data integration technologies, EII suffers from a complete disregard for business semantics and business processes. To succeed, EII vendors need to place as much emphasis on solving meta data issues as they do on promoting how many data sources they can support. If they don't solve this issue, then it will be difficult to integrate this technology into the overall enterprise integration stack.
