The modern computing era has been characterized by an explosion of systems, applications and databases.1 In most enterprises, this has produced architectures with redundancy at the business process, application and data levels. As a result, resources are not used as efficiently as they could be, and there is often no single comprehensive, clear, consistent and accurate view of what is going on. Despite the abundance of technology resources, an enterprise's ability to adapt to changes in its environment may still be extremely limited. For these reasons, among others, many organizations stand to benefit from some degree of enterprise integration at the business process, application and data levels.
Data warehousing provides integration at the data level. It combines data from source systems to provide a complete and authoritative set of facts. The business intelligence (BI) and reporting tools that run on data warehouses provide an integrated and consistent view of the data in support of both operational and managerial decision-making.
The current architectural paradigm for delivering this integration solution includes source systems, a data staging area, presentation servers and a data warehouse. The data warehouse is likely to store data in a dimensional model that facilitates data warehouse browsing and report creation by the users.2 A data warehouse can be distinguished from other types of data stores based on the type of data it maintains and the intended usage of that data. Data warehouses typically store time series data for an enterprise's transaction systems along with metrics and facts of interest to the business. This information is available at all levels of granularity and all dimensions of interest to the business. The ultimate goal of a data warehouse is the creation of a single, logical view of an enterprise's data.
The data warehouse system is the data warehouse itself, and all the services and data stores necessary to create it and provide users with the required access. The data warehouse system is essentially a type of application that delivers decision support services. One question that can be raised is whether decision support capabilities can be delivered using a different architectural paradigm. In order to answer this question, it is helpful to first examine what services are provided in the current data warehouse architectural paradigm.
Data Warehouse Services
A number of services are provided in the technical architecture of a data warehouse system. These services are what ultimately create and deliver value in the form of decision support capabilities. Data warehouse services can be divided into data staging services and query services. Figure 1 provides a list of the major types of services within each of these categories.
Service-oriented architecture (SOA) has emerged as a new paradigm for solving the enterprise integration problems at the application level. SOA attempts to achieve enterprise integration by delivering application functionality as services to end-user applications and other services. SOA provides a standard way of representing and interacting with software assets, and it allows individual software assets to become building blocks that can be reused in developing other applications.3
SOA is a design style based largely on object-oriented programming with the following characteristics: modularity, encapsulation, loose coupling, separation of concerns, composability and single implementation. Examples of technologies that are at least partially service-oriented include CORBA, J2EE, and DCOM.4
Information as a Service
A service is an encapsulation of logic that provides a discrete function in an SOA. One of the goals of an SOA design is to ensure that each component of data can be used independently from its current implementation. For SOA to be effective at this most elementary level, data needs to be a reusable asset for the enterprise rather than simply an input or output of an application. Delivering information as a service means loosening the tight coupling between data and applications so that data can be controlled and shared across the enterprise.5 Metadata is used to facilitate the delivery of information as a service. The definitions, mappings, business rules, security information and other characteristics of the data are stored in a metadata repository. The idea is that information delivered as a service can be regarded as having been certified by the enterprise as trusted data. Service encapsulation means that the underlying data is provided to consumers only through the information service. No direct access to the underlying physical data store is permitted.
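As a minimal sketch of this encapsulation, the following Python example wraps a private data store behind a service and keeps definitions and mappings in a small metadata structure. The class names, field names and sample record are hypothetical and chosen only for illustration; a real information service would sit in front of an actual database and metadata repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Hypothetical metadata repository entry for one data element."""
    definition: str
    source_system: str
    certified: bool


class CustomerInformationService:
    """Hypothetical information service: consumers never touch the store."""

    def __init__(self):
        # Physical store, private to the service (a dict stands in for a database).
        self._store = {101: {"cust_nm": "Acme Corp", "rgn_cd": "NE"}}
        # Mapping from physical column names to business names.
        self._mapping = {"cust_nm": "customer_name", "rgn_cd": "region"}
        self._metadata = {
            "customer_name": FieldMetadata("Legal name of the customer", "CRM", True),
            "region": FieldMetadata("Sales region code", "CRM", True),
        }

    def get_customer(self, customer_id):
        """Return a new record keyed by business names, not physical ones."""
        row = self._store[customer_id]
        return {self._mapping[col]: value for col, value in row.items()}

    def describe(self, field_name):
        """Look up the metadata entry for a business field."""
        return self._metadata[field_name]


service = CustomerInformationService()
record = service.get_customer(101)
region_info = service.describe("region")
```

Because the consumer sees only business names such as `customer_name`, the physical column names could change without affecting callers, which is the loosened coupling the paragraph describes.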
A Paradigm Conflict Between Data Warehousing and SOA?
Some practitioners have asserted that there is a conflict between the architectural paradigms of data warehousing and SOA.6,7,8 Data warehousing requires an intimate understanding of the data. SOA wants to hide the data behind services. The data exposed through a messaging interface could be quite different from what is stored on a physical medium. Service-oriented applications exhibit loose coupling between their constituent services. This loose coupling is one of the fundamental principles of SOA and is necessary for an agile, flexible architecture. BI solutions are tightly coupled to the data sources that feed the data warehouse. The data warehouse paradigm is based on a batch-centric environment where extract, transform and load (ETL) processing is the method of consolidating large amounts of source system data on a regular schedule for population of the data warehouse.9
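To make the batch-centric ETL pattern concrete, here is a small Python sketch of one scheduled run. The two source extracts, their field names and the in-memory "warehouse" dictionary are assumptions made for illustration, not part of any particular product.

```python
from datetime import date

# Hypothetical extracts from two source systems with inconsistent formats.
crm_rows = [{"id": "101", "name": " Acme Corp ", "signup": "2008-01-15"}]
billing_rows = [
    {"cust": 101, "total": "1,250.50"},
    {"cust": 101, "total": "300.00"},
]


def transform(crm, billing):
    """Cleanse, convert types and integrate the two extracts."""
    totals = {}
    for row in billing:
        amount = float(row["total"].replace(",", ""))
        totals[row["cust"]] = totals.get(row["cust"], 0.0) + amount

    result = []
    for row in crm:
        customer_id = int(row["id"])
        result.append({
            "customer_id": customer_id,
            "customer_name": row["name"].strip(),
            "signup_date": date.fromisoformat(row["signup"]),
            "lifetime_billing": totals.get(customer_id, 0.0),
        })
    return result


def load(warehouse, rows):
    """Write the transformed rows into the target and report the row count."""
    for row in rows:
        warehouse[row["customer_id"]] = row
    return len(rows)


warehouse = {}
loaded = load(warehouse, transform(crm_rows, billing_rows))
```

The transform step has to know the layout of both sources, which is the tight coupling to data sources that the paragraph contrasts with SOA.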
Another key difference between the data warehouse paradigm and the SOA paradigm is that data warehouses make data available for ad hoc query analysis, whereas SOA is contract driven. A service will execute specific operations and only those operations; all services are predefined. BI, by contrast, is designed to allow ad hoc queries that are unknown at design time. To support this capability, the user must be aware of the underlying data model, which conflicts with the SOA principle that the service must abstract the underlying implementation.10 Furthermore, ad hoc queries might create performance problems if the environment is not specifically designed to accommodate them.
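The contrast can be sketched in a few lines of Python using the standard `sqlite3` module. The `sales_fact` table, its columns and the `SalesService` class are hypothetical; the point is only that the service exposes a fixed operation, while the ad hoc query requires knowledge of the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("NE", "widget", 100.0), ("NE", "gadget", 50.0), ("SW", "widget", 75.0)],
)


class SalesService:
    """Hypothetical contract: callers may use only the operations defined here."""

    def __init__(self, connection):
        self._conn = connection

    def total_sales_by_region(self, region):
        """Predefined operation; the caller never sees the table layout."""
        row = self._conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales_fact WHERE region = ?",
            (region,),
        ).fetchone()
        return row[0]


# Contract-driven access: one known operation with one known parameter.
service = SalesService(conn)
contract_result = service.total_sales_by_region("NE")

# Ad hoc access: the analyst must know the table and column names.
ad_hoc_result = conn.execute(
    "SELECT product, SUM(amount) FROM sales_fact GROUP BY product ORDER BY product"
).fetchall()
```

A question such as "sales by product" was not anticipated in the contract, so it can be answered only by going around the service to the data model itself.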
Reconciling SOA with BI
Three methods have been proposed by practitioners for reconciling SOA with BI. These are service contracts for BI, service-oriented business intelligence (SoBI), and event-driven architecture (EDA).
BI Service Contracts
BI service contracts provide a service-based approach to extracting data from authorized data sources. This would entail polling the service interfaces on a regular basis to obtain data. Services become the target from which extract processes read data. There are two primary modes in which a service can operate as a data source in a BI context: as the provider of data upon request, and as the publisher of events that are of interest. In both scenarios, the message sizes are small. Large-scale data transfer and transformation would still be handled through normal data warehouse import techniques such as ETL, because physically large messages are not the normal domain of SOA.11
However, this approach would likely encounter problems with network bandwidth. One way to address this would be to increase the interval between polling, but this may result in missing important events that occur during the interval.12
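The trade-off between polling frequency and missed events can be shown with a small simulation. The `OrderStatusService`, the tick-based timeline and the status values below are invented for illustration; the sketch assumes the source service exposes only its current state.

```python
class OrderStatusService:
    """Hypothetical source service that exposes only the current status."""

    def __init__(self, timeline):
        # Maps a tick to the status that takes effect at that tick.
        self._timeline = dict(timeline)

    def get_status(self, tick):
        effective = max(t for t in self._timeline if t <= tick)
        return self._timeline[effective]


def poll(service, interval, last_tick):
    """Poll every `interval` ticks; return distinct states seen and call count."""
    seen = []
    calls = 0
    for tick in range(0, last_tick + 1, interval):
        calls += 1
        status = service.get_status(tick)
        if not seen or seen[-1] != status:
            seen.append(status)
    return seen, calls


timeline = {0: "placed", 3: "backordered", 4: "shipped", 9: "delivered"}
service = OrderStatusService(timeline)

frequent_states, frequent_calls = poll(service, interval=1, last_tick=10)
sparse_states, sparse_calls = poll(service, interval=5, last_tick=10)
```

Polling every tick makes 11 calls and sees all four states. Polling every five ticks makes only 3 calls but never observes the short-lived "backordered" state.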
Event-Driven Architecture
Push technology is likely to become more popular as source system owners more fully understand their role in providing business information to the company. An EDA may provide a framework that facilitates this. In an EDA, events result in the publication of information and the triggering of other events, such as data loads. EDA can be used to build services that are more autonomous. For example, services can cache relevant data from other services and get notifications when that data changes. The consuming service is thus decoupled in time from the services with which it interacts and does not depend on their availability. By reducing the data load to only those data elements that have changed, it is conceivable that EDA may be able to support real-time insights.
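A minimal publish/subscribe sketch in Python illustrates the caching and change-notification idea. The `EventBus`, the `customer.changed` topic and the two classes are hypothetical, and the bus here is synchronous and in-process, whereas a production event bus would typically be asynchronous and durable.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process bus; handlers are called synchronously."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


class CustomerService:
    """Source service that publishes an event whenever its data changes."""

    def __init__(self, bus):
        self._bus = bus
        self._customers = {}

    def update_customer(self, customer_id, region):
        self._customers[customer_id] = region
        self._bus.publish("customer.changed", {"id": customer_id, "region": region})


class WarehouseLoader:
    """Consumer that caches data locally and loads only what has changed."""

    def __init__(self, bus):
        self.cache = {}
        self.rows_loaded = 0
        bus.subscribe("customer.changed", self._on_change)

    def _on_change(self, event):
        self.cache[event["id"]] = event["region"]
        self.rows_loaded += 1


bus = EventBus()
loader = WarehouseLoader(bus)
source = CustomerService(bus)

source.update_customer(1, "NE")
source.update_customer(2, "SW")
source.update_customer(1, "SE")
```

The loader never queries the source service. It answers from its own cache, and each load touches only the record named in the event.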
Service-Oriented Business Intelligence
SoBI is an attempt to combine the architectural paradigms of SOA and BI by integrating the two approaches at the most appropriate architectural level.13 The SoBI architecture makes BI data in the data warehouse available as a service to other applications within the architecture. This gives applications a clean way of accessing consolidated data to support the requirements of BI. In this way, the BI architecture becomes an integrated component of SOA. Some ETL processing will be replaced by event-based data subscription, which is itself a form of EDA.
From the SOA viewpoint, BI can be seen as a collection of services. A data source can be exposed as a service by introducing a layer that receives the service request from the service bus, calls the appropriate query and returns the results to the caller.14
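Such a layer might look like the following Python sketch, which uses the standard `sqlite3` module as a stand-in for the warehouse. The request format, operation names, `sales` table and `QueryServiceAdapter` class are assumptions for illustration; the service bus itself is not modeled, only the handler it would call.

```python
import sqlite3


class QueryServiceAdapter:
    """Hypothetical layer that maps service requests to predefined queries."""

    def __init__(self, connection):
        self._conn = connection
        # Each operation name in the contract maps to exactly one query.
        self._queries = {
            "sales_total_by_region": (
                "SELECT region, SUM(amount) FROM sales "
                "GROUP BY region ORDER BY region"
            ),
            "sales_for_region": (
                "SELECT product, amount FROM sales "
                "WHERE region = :region ORDER BY product"
            ),
        }

    def handle(self, request):
        """Receive a request, run the matching query and return the results."""
        operation = request["operation"]
        if operation not in self._queries:
            return {"status": "error", "reason": f"unknown operation: {operation}"}
        rows = self._conn.execute(
            self._queries[operation], request.get("params", {})
        ).fetchall()
        return {"status": "ok", "rows": rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("NE", "widget", 100.0), ("NE", "gadget", 50.0), ("SW", "widget", 75.0)],
)

adapter = QueryServiceAdapter(conn)
totals = adapter.handle({"operation": "sales_total_by_region"})
detail = adapter.handle({"operation": "sales_for_region", "params": {"region": "NE"}})
rejected = adapter.handle({"operation": "drop_everything"})
```

Keeping the SQL inside the adapter means callers depend only on operation names and parameters, at the cost of supporting only the queries that were anticipated.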
Both SOA and the data warehousing architectural paradigms are helpful in addressing the enterprise integration problem. While SOA may be efficient at the transactional level, data must still be integrated and aggregated to support higher-level management decisions.
The current architectural principles upon which the two paradigms are based may not be completely compatible. Several alternatives have been proposed for dealing with this apparent conflict, including SoBI, EDA and BI service contracts. The coexistence of SOA and BI is likely to involve elements of all three approaches.
SOA needs to mature further before a proper synergy can occur with data warehousing and BI. This maturation will take place gradually as the two paradigms seek a balance in the enterprise architecture. Ultimately, their synergy will significantly improve an enterprise's ability to provide decision support capabilities and extract greater value from data assets.
If SOA can live up to its promises, and if the SOA approach to delivering information as a service is implemented correctly, data should be a reusable enterprise information asset from the start of its existence. Many of the services performed in the creation of a data warehouse, such as identification of authoritative data sources, business rules, data mappings and security rules, will have already been performed as part of the SOA design process, and the information will exist in the metadata repository. Some transformation processes, such as cleansing, data type conversion and integration, may no longer be necessary, since the quality and consistency of data should improve significantly when information is delivered as a service.
There is another set of services associated with the data warehouse project that does not appear in Figure 1. These are the services provided in the design phase, such as determination of authoritative sources for data, business rules, semantic definitions, data architecture, design and development, and metadata population. Many of these services should also have been provided in the design and construction phase of services in an SOA environment. Some of the information acquired in the early phases of a data warehouse project should already exist and be discoverable in the metadata repository.
Thus, at least some of the services currently provided by data warehouse systems will now be provided in whole or in part by information services as part of their construction in an SOA environment. This should significantly reduce both the cost and risk associated with data warehousing and business intelligence projects in the future.
Objects are working their way into the mainstream for systems architectures, and objects can be used to support data warehousing in several ways. BI applications that are SOA compliant should be more usable and flexible. New types of services are likely to emerge as a result of the coexistence of the two architectural paradigms. The services already present in a data warehouse will be exposed and made available for reuse. This is likely to increase the value of the data warehouse. In an SOA environment, EDAs may bring us closer to delivering real-time BI. Both factors should lead to an increased reliance on BI services in decision-making.
1. W.H. Inmon. Building the Data Warehouse. John Wiley & Sons: 2005.
2. Ralph Kimball, Laura Reeves, Margie Ross and Warren Thornthwaite. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. John Wiley & Sons: 1998.
3. Mark Endrei. Patterns: Service-Oriented Architecture and Web Services. IBM Redbooks: 2004.
4. Bobby Woolf. Exploring IBM SOA Technology and Practice. Maximum Press: 2008.
6. Sean Gordon, Robert Grigg, Michael Horne and Simon Thurman. Service-Oriented Business Intelligence. The Architecture Journal, 2007.
7. Richard Skriletz. Business Intelligence and Data Delivery: Converging with SOA. Business Intelligence Network, 2007.
8. Robin Mulkers. Business Intelligence and SOA. ITtoolbox Blogs, 2006.
9. Gordon, et al.
11. Annika Grannebring and Peter Revay. Service Oriented Architecture Is A Driver For Daily Decision Support. Emerald Publishing Group: 2007.
12. Arnon Rotem-Gal-Oz. Bridging the Impedance Mismatch between Business Intelligence and Service-oriented Architecture. Microsoft Corporation: 2007.