If you have not noticed, the concepts of enterprise application integration (EAI) and the world of business intelligence and data warehousing are merging. The technologies intersect in their interest in information that exists in many different types of systems. Both paradigms require connector technology to account for the differences in how the information is physically presented. Both paradigms must transform the information. Data integration needs to transform the information moving between systems, one-to-one or many-to-many, in real time, accounting for the differences in application semantics during replication. Business intelligence (BI) transforms and aggregates the information in large batches, placing the information in a specialized data store for data mining activities. Moreover, both paradigms must provide mechanisms to make sense of the information; data integration must do so in real time (business activity monitoring or BAM), and BI typically does so with latency.
The largest intersection of data integration and BI is the need to understand the data that already exists within the source and target systems. To do this, many have been looking toward ontologies, which provide a sophisticated approach to accounting for, categorizing and drawing relationships between the thousands of ways data exists inside a typical enterprise or problem domain.
If you don't understand application semantics or, simply put, the meaning of data, then you have no hope of creating the proper data integration and/or BI solution. You must understand the data to define the proper integration flows and transformation scenarios, and to provide service-oriented frameworks (such as Web services), meaning levels of abstraction, to your data integration domain.
This is where many data integration projects collapse. You must always deal with semantics and how to describe semantics relative to a multitude of information systems. There is also a need to formalize this process, putting some additional methodology and technology behind the management of metadata, as well as the relationships therein.
To this end, many in the world of data integration have begun to adopt ontologies. Ontology is a term borrowed from philosophy that refers to the science of describing the kinds of entities in the world and how they are related. Ontologies are important to data integration and BI solutions because they provide a shared and common understanding of data (and, in some cases, services and processes) that exists within a data integration problem domain and how to facilitate communications between people and information systems. By leveraging ontologies, we can organize and share enterprise information, as well as manage content and knowledge, which allows better interoperability and integration of inter- and intra-company information systems. We can also layer common ontologies within verticals or domains with repeatable patterns.
The view of ontologies was best summarized by Quine, who claimed that the question ontology asks can be stated in three words: What is there? The answer is "everything."[1] In the context of data integration and BI, each information system is regarded as a "theory" that recognizes the existence of a set of objects, its own ontology.
At its essence, ontology is a conceptual information model.[2] Ontologies describe things that exist in a problem domain. This includes properties, concepts and rules, and how they relate to one another, which supports a standard reference model for information integration (the link to data integration) as well as knowledge sharing. We leverage ontologies in the science of data integration and BI because they support human understanding of information. This use is self-explanatory within the context of data integration. Ontologies also provide the ability to facilitate information-based access and information integration across very different information systems. We achieve this by formalizing the application semantics between intra- and inter-organizational information resources.
A Deeper Dive
As you know, data integration and BI involve much complexity. Ontologies help the data integration architect prepare generalizations that make the problem domain more understandable.[3] In contrast to abstraction, generalization ignores many of the details and results in general ideas. When generalizing, we start with a collection of types and analyze commonalities to generalize them.
Clearly, semantic heterogeneity and divergence hinder generalization: when the commonalities of two entities are represented in semantically different ways, they become more difficult to see. Ontological analysis clears the ground for generalization, making the properties of the entities much clearer. Indeed, ontological analysis for data integration encourages generalization. Thus we can say: "Within an ontological framework, integration analysis naturally leads to generalization."[4]
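To make the idea of generalizing from commonalities concrete, here is a minimal sketch. It assumes ontological analysis has already reconciled attribute names to common terms; the entity types, attribute names and helper functions are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical entity types from three systems. Attribute names are assumed
# to be already reconciled to common terms by ontological analysis.
ENTITY_TYPES: Dict[str, Set[str]] = {
    "crm_customer": {"name", "address", "phone", "loyalty_tier"},
    "billing_account": {"name", "address", "phone", "credit_limit"},
    "support_contact": {"name", "phone", "preferred_channel"},
}


def generalize(types: Dict[str, Set[str]]) -> Set[str]:
    """Return the attributes common to every type: the candidate general type."""
    attribute_sets = list(types.values())
    if not attribute_sets:
        return set()
    return set.intersection(*attribute_sets)


def specializations(types: Dict[str, Set[str]], general: Set[str]) -> Dict[str, Set[str]]:
    """Return what each type adds beyond the general type."""
    return {name: attrs - general for name, attrs in types.items()}


party = generalize(ENTITY_TYPES)
remainders = specializations(ENTITY_TYPES, party)
print(sorted(party))  # ['name', 'phone']
```

If the three systems had used three different names for "phone," the intersection would come back smaller, which is the sense in which semantic divergence hides commonalities.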
Considering that statement, it's also clear that the application independence of ontological models makes them candidates for reference models. Stripping the applications of the semantic divergences that were introduced to satisfy their requirements creates a common data integration foundation for use as the basis for a data integration or BI project.
Returning to the core problem we wish to solve within data integration domains, we are looking to achieve semantic interoperability between very different systems. The solution to this problem rests on our ability to leverage formal ontologies, and to account for the different types of ontologies a business may need. For instance, we can have resource ontologies that we leverage to define semantics used by our SAP systems, but we may also have personal ontologies defining the semantics of a user or a user group. In addition, we have the notion of shared ontologies, which are common semantics shared between any number of information systems.[5]
The best approach to developing an ontology is usually determined by the eventual purpose of the ontology. For example:
- When developing a resource ontology, it is best to adopt a bottom-up approach, defining the terms used by the resource and then making generalizations to the terms.
- When developing a shared ontology, it is best to use a top-down approach, defining the general concepts first and working down to the detail.
- When developing personal ontologies, we look at the essence of the user or user group, top down or bottom up.
Once we define the ontologies, we must account for the semantic mismatches that occur during translations between the various terminologies. Thus, we have the need for mapping.
Creating maps is significant work, but it lends itself to a great deal of reuse. Mapping requires the "ontology engineer" to modify and reuse existing mappings. Such mapping necessitates a mediator system that can interpret the mappings in order to translate between the different ontologies that exist in the problem domain. It is also logical to include a library of mapping and conversion functions, as there are many standard transformations employable from mapping to mapping.
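The mediator and its library of conversion functions can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular product: the Mediator class, the conversion names and the field names (cust_no, cust_nm, amt_cents) are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# A small library of reusable conversion functions (hypothetical examples).
CONVERSIONS: Dict[str, Callable[[Any], Any]] = {
    "identity": lambda v: v,
    "upper": lambda v: str(v).upper(),
    "cents_to_dollars": lambda v: v / 100,
}

# A mapping relates a source term to (target term, name of a conversion).
Mapping = Dict[str, Tuple[str, str]]

# Hypothetical mapping from one resource ontology to the shared ontology.
ERP_TO_SHARED: Mapping = {
    "cust_no": ("customer_id", "identity"),
    "cust_nm": ("customer_name", "upper"),
    "amt_cents": ("order_total", "cents_to_dollars"),
}


class Mediator:
    """Interprets registered mappings to translate records between ontologies."""

    def __init__(self, conversions: Dict[str, Callable[[Any], Any]]) -> None:
        self.conversions = conversions
        self.mappings: Dict[Tuple[str, str], Mapping] = {}

    def register(self, source: str, target: str, mapping: Mapping) -> None:
        self.mappings[(source, target)] = mapping

    def translate(self, source: str, target: str, record: Dict[str, Any]) -> Dict[str, Any]:
        mapping = self.mappings[(source, target)]
        translated: Dict[str, Any] = {}
        for term, value in record.items():
            if term not in mapping:
                continue  # unmapped terms are dropped in this sketch
            target_term, conversion = mapping[term]
            translated[target_term] = self.conversions[conversion](value)
        return translated


mediator = Mediator(CONVERSIONS)
mediator.register("erp", "shared", ERP_TO_SHARED)
shared_record = mediator.translate(
    "erp", "shared",
    {"cust_no": "C-17", "cust_nm": "acme", "amt_cents": 1250, "internal_flag": 1},
)
```

Because mappings refer to conversions by name, a second mapping (say, from a billing system) can reuse "cents_to_dollars" without redefining it, which is where the reuse comes from.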
Finding the Information
One of the benefits of leveraging ontologies is that, regardless of where the information resides, we can understand and map the information that is relevant to the data integration scenarios. Ontologies allow you to differentiate between resources, an especially useful feature when those resources have redundant data (e.g., customer information in nearly all enterprises). To make better sense of the data and represent it in a meaningful way, the terms defined in ontologies allow data integration architects to fully understand the meaning and context of the information. This, again, is ontology's value within data integration or BI applications.
When considering schemas local to remote source or target systems, ontologies are applied to define the meaning of the terms used in some domain. Although there are often some communications between a data model and the attributes, both schema and ontologies play key roles in data integration because of the importance of both semantics and data structures.
However, you must also take the time to define the relationship between the ontologies and the physical application or database schema; that is the purpose of the mapping mentioned earlier. Remember, regardless of how the information is structured physically or how the schema is represented, the mapping must occur to properly leverage ontologies.
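As a small illustration of relating ontology terms to a physical schema, the sketch below maps two terms onto the columns of one table and queries by term rather than by column name. The table, column and term names are hypothetical, and SQLite is used only because it runs in memory; a real mapping layer would also need to handle joins, types and multiple sources.

```python
import sqlite3
from typing import Dict, List, Tuple

# Ontology term -> (table, column) in one hypothetical physical schema.
TERM_TO_SCHEMA: Dict[str, Tuple[str, str]] = {
    "customer_id": ("cust", "cust_no"),
    "customer_name": ("cust", "cust_nm"),
}


def select_by_terms(conn: sqlite3.Connection, terms: List[str]) -> List[Dict[str, str]]:
    """Query the physical schema using ontology terms instead of column names."""
    tables = {TERM_TO_SCHEMA[term][0] for term in terms}
    if len(tables) != 1:
        raise ValueError("this sketch only handles terms that map to a single table")
    table = tables.pop()
    columns = ", ".join(f"{TERM_TO_SCHEMA[term][1]} AS {term}" for term in terms)
    order_column = TERM_TO_SCHEMA[terms[0]][1]
    # Identifiers come only from the trusted mapping above, never from user input.
    cursor = conn.execute(f"SELECT {columns} FROM {table} ORDER BY {order_column}")
    return [dict(zip(terms, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (cust_no TEXT, cust_nm TEXT)")
conn.executemany("INSERT INTO cust VALUES (?, ?)", [("C-17", "Acme"), ("C-42", "Globex")])
rows = select_by_terms(conn, ["customer_id", "customer_name"])
```

If the physical schema changes, only TERM_TO_SCHEMA needs to change; callers that ask for "customer_name" are unaffected.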
Another important notion of ontologies is entity correspondence. Ontologies that are leveraged in a business-to-business (B2B) environment must leverage data that is scattered across very different information systems and information that resides in many separate domains. Ontologies in this scenario provide a great deal of value because we can join information together, such as product information mapped to on-time delivery history mapped to customers' complaints and compliments. This establishes entity correspondence.
To gather information specific to an entity, we need to leverage different resources to identify individual entities, and the means of identification vary widely from one physical information store to the next. For instance, when leveraging a relational database, entities are identified using keys (e.g., customer number). Within the various information systems, many different terms are used for attributes. Thus, the notion of ontologies in this scenario allows us to determine whether entities from different applications and databases are the same or not, which is crucial to fusing information.
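A minimal sketch of entity correspondence follows. It assumes two hypothetical systems that identify customers with different keys and attribute names, and it uses a deliberately naive matching rule (normalized name plus postal code). Real-world matching typically needs far more robust rules; the point here is only that attributes must first be translated to shared terms before entities can be compared.

```python
from typing import Dict, List, Tuple

# Each system's local attribute names mapped to shared ontology terms (hypothetical).
CRM_TERMS = {"cust_no": "local_id", "cust_nm": "name", "zip": "postal_code"}
SUPPORT_TERMS = {"ticket_party": "local_id", "company": "name", "postcode": "postal_code"}


def to_shared(record: Dict[str, str], terms: Dict[str, str]) -> Dict[str, str]:
    """Rename a record's attributes to the shared ontology's terms."""
    return {terms[key]: value for key, value in record.items() if key in terms}


def identity_key(shared: Dict[str, str]) -> Tuple[str, str]:
    """Naive identifying key: case- and whitespace-normalized name plus postal code."""
    name = " ".join(str(shared["name"]).lower().split())
    return (name, str(shared["postal_code"]).strip())


def correspond(crm: List[Dict[str, str]], support: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (crm_id, support_id) pairs judged to be the same entity."""
    index = {}
    for record in crm:
        shared = to_shared(record, CRM_TERMS)
        index[identity_key(shared)] = shared["local_id"]
    matches = []
    for record in support:
        shared = to_shared(record, SUPPORT_TERMS)
        key = identity_key(shared)
        if key in index:
            matches.append((index[key], shared["local_id"]))
    return matches


crm_records = [
    {"cust_no": "C-17", "cust_nm": "Acme  Corp", "zip": "10001"},
    {"cust_no": "C-42", "cust_nm": "Globex", "zip": "60601"},
]
support_records = [
    {"ticket_party": "P-9", "company": "ACME Corp", "postcode": "10001 "},
    {"ticket_party": "P-3", "company": "Initech", "postcode": "73301"},
]
pairs = correspond(crm_records, support_records)
```

Once the pairs are known, information keyed by "C-17" in one system and "P-9" in the other can be fused into a single view of the customer.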
Part 2 of this article, which further explores the use and value of ontologies, will be featured as exclusive online content as part of the July issue of DM Review, available at www.dmreview.com on July 1, 2004.
1. Quine, W.V. "On What There Is," Review of Metaphysics, Vol. II, No. 5, 1948. Reprinted in From a Logical Point of View, 1961.
2. Akkerman. "What are Ontologies? - An Executive Summary," 01/15/2001.
3. Partridge, Chris. "The Role of Ontology in Semantic Integration," 2002.
4. Cui, Zhan, Jones, Dean, and O'Brien, Paul. "Issues in Ontology-based Information Integration," 2002.