Data Integration: Still a Barrier for Most Organizations
Data integration was hot in 2005, and the intense interest in this topic continues in 2006 as companies struggle to integrate their ever-growing mountain of data. A TDWI study on data integration published in November 2005 found that 69 percent of companies considered data integration issues to be a high or very high barrier to new application development [1]. To solve this problem, companies are increasing their spending on data integration products.
In this article, I review key data integration trends and look at how companies are exploiting new and evolving data integration approaches to break down barriers to data integration in their organizations. The four main trends I address are an enterprise approach to data integration, integration competency centers, right-time data integration and master data management (MDM).
Enterprise Data Integration
Although data integration problems are widespread in companies, much of the focus by IT organizations to date has been on integrating structured data in support of data warehousing and business intelligence (BI) applications. The BI market has seen tremendous growth over recent years and, for many companies, BI has become mission-critical because of the important role it plays in the decision-making process. This reliance will increase as companies move toward using BI, not only for strategic and tactical decision-making, but also for driving daily and intraday business operations.
The use of data warehousing and BI has led to a much better understanding of how data flows through the business and how it is used to make decisions. This is especially true for legacy system data, which is often poorly documented. This understanding is helping organizations deploy other data integration projects that may not be directly related to BI. The result is that more organizations are now viewing data integration as an enterprise-level problem, not just an issue to be solved when building data warehousing and BI applications. The TDWI study showed that 18 percent of companies have developed an enterprise-level architecture for supporting data integration in their organizations.
Figure 1 shows the data integration architecture I use when discussing an enterprise-wide approach for managing and deploying data integration projects. As you can see from the diagram, the architecture addresses four aspects of data integration: applications, techniques, technologies and management.
Figure 1: Data Integration Architecture (Source: BI Research)
A data integration architecture defines the underpinnings of an enterprise-wide data integration solution. It provides a starting point for each data integration project. The actual applications and technologies (and thus products) selected will be determined by the business and technical requirements of each project, but the data integration architecture provides the basis from which these selection decisions can be made.
Integration Competency Centers
As companies move toward an enterprise approach to data integration, they need to reuse and share data integration expertise and resources. This can be achieved by creating a data integration competency center. These competency centers come in many shapes and sizes. Some are funded directly from the IT budget, whereas others act like an outsourcer and charge projects for services. Some centers focus on defining a technical architecture and a preferred product set, while others also provide data integration expertise and development resources to projects.
Some competency centers evolve from an organization's data warehouse development group. In other cases, a company may merge several areas of integration expertise into a single integration competency center (ICC). One example is merging the application integration and data integration groups.
The use of BI to drive intraday business operations requires that BI systems become tightly integrated with operational systems. These operational BI projects often blur the dividing line between what is operational processing and what is decision processing. If the two types of processing are handled by different groups, then this can lead to ownership and technology disagreements. The solution is to pool the expertise and resources in a single ICC.
Setting up an ICC requires strong management backing and enforcement; otherwise, it will fail. Gartner Research notes, "The top-performing one-third of ICCs will save an average of 30 percent in data interface development time and costs and 20 percent in maintenance costs, and achieve 25 percent reuse of integration component during 2004 through 2007. The remaining two-thirds of ICCs will fall short of those benefits because of insufficient sponsorship and other organizational execution problems." [2, 3]
Right-Time Data Integration
As mentioned previously, one of the biggest growth areas in BI lies in its use for driving daily and intraday business decisions and operations. Operational BI often requires access to more current data than that provided by a traditional data warehousing and BI system. This requirement eventually leads to a debate about how to supply real-time data to operational BI applications.
Real time is a politically sensitive term from an IT perspective, and it is a misleading way to approach operational BI because it suggests that BI applications can access, process and deliver information in real time. For all intents and purposes, real-time delivery is not possible because there will inevitably be latency in the information delivered to the business user compared with the live data being managed by operational business applications. This position depends somewhat on what you consider to be real time and what you consider to be BI.
To many people, real time is the ability to receive information about an event within a fraction of a second of its occurrence. Operational business applications, for example, are said to operate in real time because they appear to supply data to business users instantly. In actuality, these applications may take several seconds to respond, but typically this response is timely enough that users don't get impatient and start hitting the enter key on their keyboards!
A reporting application that provides data (order information, for example) about customers to a support representative in real time could be considered to be a real-time BI application. To be realistic, however, the purpose of operational BI is to analyze data and convert it into useful information that can be used to make rapid decisions. This processing involves gaining access to the required data, analyzing it and delivering the results to the business user. The user needs to analyze the information further, make a decision as to whether any action is required and then take any required action. This complete process cannot happen in real time. The latency in the process will depend on business requirements and the ability of the BI technology to support those requirements.
Leaving IT costs aside, the acceptable latency of operational BI is determined by what constitutes the right time from a business perspective. Right-time BI is therefore a better term to use than real-time BI. In practice, however, the latency of operational BI activity is intraday in nature.
Operational BI has a different set of business benefits and technology requirements compared with strategic and tactical BI processing. From a business perspective, these types of applications are usually targeted at addressing specific business issues. From a technology viewpoint, it is important to build a BI and data integration architecture that can support right-time processing. Operational BI applications often start off with a relatively high latency - several hours, for example. It is important, however, to design these applications so that if latency requirements become stricter, then these requirements can be satisfied with little or no impact on the applications themselves or the underlying right-time processing infrastructure.
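One way to picture this design principle is to treat the latency target as a setting rather than as logic buried in the application. The following Python sketch is illustrative only: the `RightTimeFeed` class, the `feeds_to_refresh` function, the feed name and the latency values are all hypothetical and are not drawn from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RightTimeFeed:
    """Hypothetical data feed whose allowed staleness is a setting, not code."""
    name: str
    max_latency: timedelta   # business-defined "right time" for this feed
    last_refresh: datetime   # when the feed was last loaded

    def is_stale(self, now: datetime) -> bool:
        # The feed is stale once its data is older than the latency target.
        return now - self.last_refresh > self.max_latency


def feeds_to_refresh(feeds, now):
    """Return the names of feeds that no longer meet their latency target."""
    return [feed.name for feed in feeds if feed.is_stale(now)]


now = datetime(2006, 1, 16, 12, 0)
orders = RightTimeFeed("orders", timedelta(hours=4), datetime(2006, 1, 16, 9, 0))

# Data is 3 hours old, which is within the initial 4-hour target.
print(feeds_to_refresh([orders], now))  # []

# A stricter requirement is a configuration change; the logic is untouched.
orders.max_latency = timedelta(minutes=15)
print(feeds_to_refresh([orders], now))  # ['orders']
```

In this sketch the move from a four-hour target to a fifteen-minute target changes one value and no application code. Whether the underlying infrastructure can actually deliver the tighter latency is a separate question that the architecture has to answer.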
Master Data Management
Another important growth area is MDM, which again causes much debate from a terminology perspective. Master data is reference data that defines and supports the key business objects that underpin the main business processes of a company. Examples include customers, employees, finances, products, brands, suppliers and partners.
Master data is typically managed (and often duplicated) by multiple applications, and an important component of any MDM solution is the data integration technology used to create a single and consistent view of the master data. The issue is that data integration often becomes the sole focus of some MDM products, and as a result, MDM is often thought of as a data integration technology rather than an application solution. It is important to emphasize, however, that an organization's data integration architecture should take into account the needs of MDM applications. Right-time data integration processing is often a key requirement.
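To make the data integration component concrete, the sketch below consolidates duplicated customer records from two applications into one consistent record. It is a deliberately simplified illustration under stated assumptions: the field names, the sample records, the use of a normalized e-mail address as the match key and the "most recent non-empty value wins" rule are all hypothetical choices. Real MDM products typically apply far richer matching and survivorship rules.

```python
def normalize_key(record):
    """Build a match key; assumed here to be a trimmed, lowercased e-mail."""
    return record["email"].strip().lower()


def consolidate(records):
    """Merge records sharing a match key into one consistent record each.

    Records are processed oldest first, so the most recently updated
    non-empty value for each field is the one that survives.
    """
    golden = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        key = normalize_key(rec)
        merged = golden.setdefault(key, {})
        for field, value in rec.items():
            if field == "source":
                continue  # contributing systems are tracked separately below
            if value not in (None, ""):
                merged[field] = value
        merged["email"] = key  # store the normalized form
        merged.setdefault("sources", []).append(rec["source"])
    return golden


# Hypothetical records for the same customer held by two applications.
crm = {"source": "crm", "email": "Pat@Example.com ", "name": "Pat Lee",
       "phone": "", "updated": "2006-01-10"}
billing = {"source": "billing", "email": "pat@example.com", "name": "P. Lee",
           "phone": "555-0100", "updated": "2005-11-02"}

golden = consolidate([crm, billing])
customer = golden["pat@example.com"]
print(customer["name"], customer["phone"])  # Pat Lee 555-0100
```

The consolidated record takes the newer name from the CRM record and keeps the phone number that only the billing record supplied. Deciding which source should win for each field is a business definition issue, not a purely technical one.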
There is more to MDM than data integration. MDM solutions frequently offer collaborative, BI and workflow capabilities. They also help with the analysis and definition of the business meaning of data and the solving of semantic data differences between applications. The ability to address specific business area issues provides the major benefits of most MDM solutions. This is why many MDM vendors are now marketing vertical business-area MDM offerings. These vendors realize that generic horizontal MDM products provide limited support for handling complex business definitional issues, and it is easier for customers to build their own MDM applications on top of the existing data integration infrastructure instead.
One of the biggest markets for MDM is in the management of customer master data. The objective is to create a single view of the customer. One of the more popular acronyms for this aspect of MDM is CDI, or customer data integration. This is not only a confusing term, but it is also a bad one. The term CDI reinforces the notion that MDM is a data integration technology. As I have pointed out, MDM involves more than data integration. Customer master data management (C-MDM) would be a better term to use instead of CDI.
The concept of MDM has its origins in operational processing, where there was a need to synchronize master data between different operational systems such as front-office and back-office packaged applications. With movement toward integrating operational, BI and collaborative processing to provide a complete operational business environment, MDM has evolved to cover all three types of processing. This is somewhat similar to the way customer relationship management (CRM) has evolved over the years. Therefore, it is important to look at MDM from the perspective of these three types of processing and how they support the different MDM business requirements in the organization.
In summary, I have reviewed four main areas that are likely to be key factors in data integration projects this year. You can see that all four areas are closely related to each other. They require organizations to take an enterprise view of data integration and to build a data integration architecture similar to that shown in Figure 1.
1. Colin White. "Data Integration: Using ETL, EAI, and EII Tools to Create an Integrated Enterprise." TDWI Report Series, November 2005.
2. "Application Integration ESBs and B2B Evolve." Gartner Research Note, November 2004.
3. John Schmidt and David Lyle. "Integration Competency Center: An Implementation Methodology." Published by Informatica in conjunction with the Integration Consortium.