Since the early days of decision support system development, the latency between when transactions occur and when the data is available for reporting has been a headache for managers. Lately, new technologies such as messaging and enterprise application integration (EAI) software have made it increasingly practical to build real-time data warehouses and better-integrated analytics. This article looks at some of the options organizations have when attempting to extend the traditional decision support system to also support timely operational reporting.
Data latency is an old reporting problem. In the 1960s and 1970s, managers had to load data from punch cards and tapes before reports could be generated. In the 1980s, specialized transaction systems and the emergence of PCs created a proliferation of spreadsheets and desktop tools that had to be periodically updated with fresh data. Some organizations met the challenge of real-time operational reporting by simply allowing users to query the transaction systems directly. For most organizations, however, this was not an option: there were simply not enough system resources to support a large number of users accessing gigabytes of data, and extensive querying could also delay critical transaction processing.
In the 1990s, the enterprise resource planning (ERP) revolution reduced the number of transaction systems operated by organizations. At the same time, the evolution of better databases allowed companies to create specialized decision support systems, such as data warehouses and data marts that extracted data from a variety of sources and made it available for analytics. However, the core issue of data latency still existed, and the ultimate goal continued to be an analytical system with as near real-time access to operational data as possible. This would provide better analytics capabilities and better operational reporting.
XML and Messaging
In the late 1990s, companies explored XML as a silver bullet for getting timely transactional data into data warehouses. The idea was to provide instant, synchronous updates to the decision support systems as the transactions occurred. While the concept sounds simple, it has major implications. First, the transaction system has to create a fixed-format document for each transaction, something that can be quite time-consuming. Second, the documents often become large due to the tags and metadata embedded in each record. For example, transactions based on the proposed XML protocol, the Extensible Messaging and Presence Protocol (XMPP), carry both opening and closing tags for each data point. If you want to send a simple record with a first and last name, it may look like:
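A minimal sketch in Python (the element names are hypothetical, chosen only to illustrate the point) that builds such a record and compares the size of the actual data against the size of the transmitted document:

```python
# Hypothetical reconstruction of a tagged record: every data point
# carries an opening and a closing tag, so the document is far larger
# than the data it transports.
payload = "Jim" + "Smith"          # the actual data: 8 characters

record = (
    "<contact>"
    "<first>Jim</first>"
    "<last>Smith</last>"
    "</contact>"
)

print(len(payload))   # 8
print(len(record))    # 55
```

The ratio only worsens once headers, data-type declarations and other metadata are added to each message.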
While the record only contains eight characters (Jim Smith), the transmitted document contains 55 characters. The overhead is even higher due to additional tags that describe data types, definitions, headers and details. Naturally, XML-based messaging protocols such as XMPP have not been used extensively by very large data warehouses that may receive millions of transactions each day. However, XML has often been the backbone of marketplaces, short message services (SMS) and custom Web applications to transfer transactions to back-end processing systems such as SAP and Oracle.
Other vendors have taken a more proprietary approach to messaging, successfully creating interface standards such as electronic data interchange (EDI) or IDocs that simplify the formatting and transport of transactional records. The reduced overhead of these formats has allowed companies to automatically send records into their so-called real-time data warehouses.
Instant Messaging for Operational Reporting in the Data Warehouse
It is amazing what two years will do in terms of standardization. In 2003, the major contenders in the messaging standardization race were XMPP and a standard known as Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE).
As mentioned, XMPP was great at handling simple records such as SMS traffic, but it had a huge overhead when transmitting large volumes of transactions (something it was never intended to do). Its competitor, SIMPLE, had a different drawback: it provided core support for plain text messaging but little support for other formats. Each vendor therefore had to build its own extensions, which were often incompatible. Another problem with SIMPLE was that it supported the old User Datagram Protocol (UDP) as well as the Transmission Control Protocol (TCP) at the transport layer. Because UDP offers few quality controls, packets can be dropped and data lost, with limited ability to restart or track the process. This was not good for large reporting systems that relied on timely, accurate and complete data. As a result, the first versions of SIMPLE were not used extensively for real-time data warehouses and reporting systems.
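The practical difference between the two transports can be seen in a few lines of Python. This is an illustration, not production code: a UDP send reports success even when nothing is listening, while TCP at least detects a missing receiver. (Port 9 is assumed to have no listener.)

```python
import socket

# UDP: "fire and forget" -- sendto succeeds although no receiver exists,
# so a loader has no way to know whether the record ever arrived.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = udp.sendto(b"INSERT row 42", ("127.0.0.1", 9))
print(sent)  # bytes handed to the OS, NOT bytes delivered
udp.close()

# TCP: connection-oriented -- a missing receiver is reported immediately,
# giving the sender a chance to retry or alert an operator.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    tcp.connect(("127.0.0.1", 9))   # assumes nothing listens on port 9
except OSError as exc:
    print("TCP reported the failure:", exc)
finally:
    tcp.close()
```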
Figure 1: Using Basic XML to Push Data to a Decision Support System
Microsoft Becomes an Enterprise Application Integration Provider
While SIMPLE had some issues, it was a great platform for vendors to launch their initiatives. In 2003, Microsoft worked on a project called "Real-Time Communication Server" to enhance the SIMPLE protocols. In 2004, Microsoft launched a new version of its messaging product, BizTalk Server 2004, with two ambitious goals. First, it aimed to provide business-to-business (B2B) integration. Second, it aimed to become the platform of choice for EAI, that is, the integration of both transaction systems and critical reporting systems within an organization.
With this release of BizTalk, which is based on the .NET platform, Microsoft provided a clearer alternative to a very confusing standardization race with dozens of overlapping standards and approaches to EAI. The core architecture of BizTalk 2004 is a simplified server system. For a decision support system in an EAI framework, BizTalk provides Business Activity Services (BAS), installed on the source-system side to provide the messages. The administrator of the data warehouse can also monitor the load process from many source systems using BizTalk's Business Activity Monitoring (BAM) tool.
BizTalk 2004 was a solid next step in the EAI space. It was implemented extensively in the healthcare industry and has started to become more common in marketplaces and Web applications. Currently, BizTalk is being enhanced to provide better network load balancing (NLB) and an enhanced Microsoft Management Console (MMC) for remote management and configuration of multiple source systems with BAS installed. The next version of BizTalk with these features is currently in beta release and will be available in 2006.
SAP Joins the Fun
With the 2004 launch of its NetWeaver technology platform, SAP placed a high emphasis on EAI. One of the core components of NetWeaver is SAP Exchange Infrastructure (XI). SAP XI provides open integration between non-SAP components and SAP components such as Business Information Warehouse for data warehousing; Strategic Enterprise Management for analytics; and a variety of components for knowledge management, customer relationship management (CRM) analytics, advanced planning and optimization (APO) and SAP's portal product.
Today, SAP bases its core messaging architecture on XML standards, and its direct access architecture on the connectivity provided by Java 2 Enterprise Edition (J2EE). SAP XI also supports interface standards such as RosettaNet, CIDX and UCCnet. The fact that SAP made XI part of its NetWeaver product signaled to many that ERP vendors were serious about integration not only in B2B scenarios, but were also moving toward messaging as a tool for integrating analytic applications with their systems. The next release of NetWeaver, which will be available in March 2006, is expected to have even tighter integration with the new version of Business Information Warehouse, also known as BI 7.0. With this tool, SAP takes yet another step toward providing a true real-time data warehouse.
Can You Really Trust Messaging as a Data Provider?
Reliability has been a major issue with messaging as a critical data provider to the data warehouse. Today, there are three basic levels of reliability. First, you can get confirmation that the records were sent (at-least-once delivery). Second, you can get confirmation that they were sent no more than once (at-most-once delivery). Third, you can get confirmation that they were sent exactly once. However, it is very difficult to restart the messages once they have been sent, even when they were never received. (How do you know that you didn't receive a transaction?)
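A common workaround for these limits is to accept at-least-once delivery and make the receiving side idempotent. A minimal sketch of the idea (this is a generic pattern, not any vendor's API): every message carries an ID, and the warehouse loader discards duplicates.

```python
# The loader remembers which message IDs it has already applied.
# In practice this set would be a table in the warehouse database,
# updated in the same transaction as the row itself.
processed_ids = set()
warehouse_rows = []

def load_message(message_id, row):
    """Apply a row once, no matter how many times it is redelivered."""
    if message_id in processed_ids:
        return False            # duplicate: already applied, ignore
    warehouse_rows.append(row)
    processed_ids.add(message_id)
    return True

# The sender retries message 1, so it arrives twice; the second copy
# is silently dropped and the warehouse stays correct.
load_message(1, {"first": "Jim", "last": "Smith"})
load_message(1, {"first": "Jim", "last": "Smith"})   # ignored
load_message(2, {"first": "Ann", "last": "Lee"})

print(len(warehouse_rows))  # 2
```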
The proponents of the XML-based tools have proposed a standard known as WS-Reliability. This is a simple way to package each record prior to sending it to the data warehouse, and it also allows the systems to track the transfer and restart the process if something goes wrong. Under WS-Reliability, the records are packaged and "enveloped" based on a standard called the Simple Object Access Protocol (SOAP). These envelopes can be sent at any time, even in an asynchronous mode. Other enveloping standards such as AS2, RNIF or electronic business XML (ebXML) can also be used through extensions.
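To make the enveloping idea concrete, here is a schematic SOAP 1.1 envelope built in Python. The SOAP envelope namespace is real; the reliability header elements (message ID and sequence number) are illustrative stand-ins, not the exact WS-Reliability schema.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (real); header element names below are
# schematic examples of the tracking data a reliability layer adds.
SOAP = "http://schemas.xmlsoap.org/soap/envelope/"

env = ET.Element(f"{{{SOAP}}}Envelope")
header = ET.SubElement(env, f"{{{SOAP}}}Header")
ET.SubElement(header, "MessageId").text = "msg-0001"
ET.SubElement(header, "SequenceNumber").text = "1"

# The transactional record rides inside the body.
body = ET.SubElement(env, f"{{{SOAP}}}Body")
record = ET.SubElement(body, "record")
ET.SubElement(record, "first").text = "Jim"
ET.SubElement(record, "last").text = "Smith"

print(ET.tostring(env, encoding="unicode"))
```

The message ID and sequence number in the header are what allow the receiver to detect gaps and request a restart, the capability the plain XML push of the late 1990s lacked.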
Microsoft has teamed up with IBM, BEA Systems and TIBCO on an improved reliability standard. After a workshop held in April 2005, the group announced the submission of its standard, Web Services Reliable Messaging (WS-RM), to the Organization for the Advancement of Structured Information Standards (OASIS). The standard has strong support from many software companies, as well as public entities such as the University of North Carolina and the government of the United Kingdom.
WS-RM should go a long way toward settling the long-running standards battle in the EAI area. The current need for interfaces between the many existing standards, such as those from RosettaNet, ANSI X12 and EDIFACT, may diminish over time, and messaging may finally realize its potential to provide seamless transfer of records between transactional and analytical systems, both within organizations and between business partners.
Figure 2: Employing an EAI Solution to a Real-Time Data Warehouse
While waiting for the latest tools to come to market, most companies are still loading their reporting systems through asynchronous nightly loads. Even this has begun to change as some companies have increased the frequency of their data loads to multiple times each day. The driving force behind this greater frequency has been the increased simplicity of developing extraction programs that pick up only the changed records in the transaction systems. Other factors have been the database's ability to load data in parallel, the increased reliability of networks and the availability of better packaged data warehouse solutions from ERP vendors. Most ERP vendors now provide the ability to load delta records (only changed or new records) into their standard data warehouses, which gives companies timelier operational reporting without placing undue stress on the transactional system.
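The core of such a delta extraction is a "watermark": the loader remembers when it last ran and picks up only rows changed since then, instead of rescanning the whole transaction table. A minimal sketch (table and column names are illustrative):

```python
from datetime import datetime

# A toy transaction table: each row records when it was last changed.
orders = [
    {"id": 1, "changed_at": datetime(2005, 6, 1, 9, 0)},
    {"id": 2, "changed_at": datetime(2005, 6, 1, 23, 30)},
    {"id": 3, "changed_at": datetime(2005, 6, 2, 8, 15)},
]

def extract_delta(rows, watermark):
    """Return only records created or changed after the watermark."""
    return [r for r in rows if r["changed_at"] > watermark]

# The previous load ran at noon on June 1; only rows 2 and 3 are new.
last_load = datetime(2005, 6, 1, 12, 0)
delta = extract_delta(orders, last_load)
print([r["id"] for r in delta])  # [2, 3]
```

After each run the watermark is advanced to the current load time, so every subsequent extraction touches only a small slice of the source table.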
Why Have a Physical Data Warehouse?
In the middle of the discussion around EAI and instant messaging, some have begun to question the need for separating the data warehouse into a physical structure outside the company's transaction systems. "Why not query directly off the transaction systems and just apply more hardware, memory and processors?" has frequently been the argument. While traditionalists hate to admit it, this is a valid argument for smaller companies that have invested in ERP solutions and that have relatively low data volumes and few analytical users.
For these smaller companies, creating reporting tables inside the transactional system with "reporting records" can create a repository of operational data that has virtually zero lag time. Such systems can also be optimized for reporting and analytics by reducing table complexity (denormalization). In these cases, the data warehouse is a virtual entity that accesses true real-time data.
However, once a company reaches a certain transaction volume or a certain number of users, these virtual data warehouses become much more difficult to manage, and the stress of querying against the transactional system is simply too high to be sustained. In the worst scenarios, the analytics interfere with the transactional system's ability to perform basic functions such as taking orders, managing inventory, shipments and production. Therefore, most midsized and large companies still have to rely on external data warehouses.
It is important to remember that as the hardware and database capabilities increase, the boundaries of the virtual data warehouses are going to increase as well. If the hardware capabilities outpace the growth in the data volumes and analytical processing, virtual data warehouses might be much more common in the future.
What is Next?
In the midst of this flurry of technology enablers and newly packaged analytical tools, organizations will continue to feel the pressure from customers, vendors and management to provide better access to operational and analytical insights. Over the next couple of years, as the reliability protocols are standardized and new tools are brought to market, IT managers should examine their traditional reluctance to blend analytical and operational reporting. The benefits of a comprehensive decision support system may be simply too numerous to ignore.