The largest overlap between data integration and business intelligence (BI) is the need to understand the data that already exists in the source and target systems. To meet that need, many practitioners have turned to ontologies, which offer a structured way to account for, categorize and relate the thousands of forms data takes inside a typical enterprise or problem domain.
Approaching Ontologies
If you don't understand application semantics or, simply put, the meaning of data, then you have no hope of creating the proper data integration or BI solution. You must understand the data to define the proper integration flows and transformation scenarios, and to provide service-oriented frameworks (such as Web services) to your data integration domain (meaning levels of abstraction).
This is where many data integration projects collapse. You must always deal with semantics, and with how to describe those semantics across a multitude of information systems. There is also a need to formalize this process, putting additional methodology and technology behind the management of metadata and the relationships within it.
To this end, many in the world of data integration have begun to adopt ontologies. Ontology is a term borrowed from philosophy that refers to the science of describing the kinds of entities in the world and how they are related. Ontologies are important to data integration and BI solutions because they provide a shared, common understanding of the data (and, in some cases, the services and processes) that exists within a data integration problem domain, and because they facilitate communication between people and information systems. By leveraging ontologies, we can organize and share enterprise information, as well as manage content and knowledge, which allows better interoperability and integration of inter- and intra-company information systems. We can also layer common ontologies within verticals or domains that have repeatable patterns.
The view of ontologies was best summarized by Quine, who claimed that the question ontology asks can be stated in three words: What is there? The answer is "everything."1 In the context of data integration and BI, each information system is regarded as a "theory" that recognizes the existence of a set of objects, its own ontology.
At its essence, an ontology is a conceptual information model.2 Ontologies describe the things that exist in a problem domain: the concepts, their properties, the rules that govern them and how they relate to one another. This description supports a standard reference model for information integration (the link to data integration) as well as knowledge sharing. We leverage ontologies in data integration and BI because they support human understanding of information, a use that is self-explanatory within the context of data integration. Ontologies also facilitate information-based access and information integration across very different information systems. We achieve this by formalizing the application semantics between intra- and inter-organizational information resources.
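To make the idea of a conceptual information model concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Concept` and `Ontology` classes, the single validation rule and the customer/order domain are all hypothetical and not taken from any ontology standard or product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Concept:
    """A kind of thing in the problem domain, with its named properties."""
    name: str
    properties: Set[str] = field(default_factory=set)


@dataclass
class Ontology:
    """Concepts plus (subject, relation, object) triples that link them."""
    concepts: Dict[str, Concept] = field(default_factory=dict)
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_concept(self, name: str, *properties: str) -> None:
        self.concepts[name] = Concept(name, set(properties))

    def relate(self, subject: str, relation: str, obj: str) -> None:
        # Illustrative rule: a relation may only link known concepts.
        for name in (subject, obj):
            if name not in self.concepts:
                raise KeyError(f"unknown concept: {name}")
        self.relations.add((subject, relation, obj))

    def related(self, subject: str, relation: str) -> List[str]:
        """Sorted names of the concepts linked from `subject` by `relation`."""
        return sorted(
            o for s, r, o in self.relations if s == subject and r == relation
        )


# Hypothetical example domain: customers placing orders.
sales = Ontology()
sales.add_concept("Customer", "name", "email")
sales.add_concept("Order", "order_date", "total")
sales.relate("Customer", "places", "Order")
```

Asking `sales.related("Customer", "places")` returns the concepts a customer is linked to through that relation. A production ontology would carry far richer rules, but the shape is the same: concepts, properties, relations and constraints on how they combine.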
A Deeper Dive
As you know, data integration and BI involve much complexity. Ontologies help the data integration architect prepare generalizations that make the problem domain more understandable.3 In contrast to abstraction, generalization ignores many of the details and results in general ideas. When generalizing, we start with a collection of types and analyze commonalities to generalize them.
Clearly, semantic heterogeneity and divergence hinder generalization: when the commonalities of two entities are represented in semantically different ways, the differences are more difficult to see. Ontological analysis clears the ground for generalization by making the properties of the entities much clearer. Indeed, ontological analysis for data integration encourages generalization. Thus we can say: "Within an ontological framework, integration analysis naturally leads to generalization."4
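The effect of semantic divergence on generalization can be sketched in a few lines of Python. This is an illustration only: the `SYNONYMS` table stands in for the output of ontological analysis, and the property names and the two example types are assumptions made up for this sketch.

```python
# Assumed result of ontological analysis: local term -> common term.
SYNONYMS = {"cust_name": "name", "e_mail": "email", "tel": "phone"}


def normalize(properties):
    """Rewrite local property names into the common vocabulary."""
    return {SYNONYMS.get(p, p) for p in properties}


def common_properties(*property_sets):
    """Generalize: keep only the properties every type shares."""
    if not property_sets:
        return set()
    return set.intersection(*map(set, property_sets))


# Hypothetical types from two systems that describe similar entities
# with different terms.
crm_customer = {"cust_name", "e_mail", "tel", "segment"}
erp_supplier = {"name", "email", "phone", "payment_terms"}

# Without alignment the two types appear to share nothing.
raw = common_properties(crm_customer, erp_supplier)

# After alignment the shared core of a more general type emerges.
aligned = common_properties(normalize(crm_customer), normalize(erp_supplier))
```

Here `raw` is empty, while `aligned` contains `name`, `email` and `phone`, the properties of a more general type that both entities specialize.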
Considering that statement, it's also clear that application independence of ontological models makes these applications candidates for reference models. Stripping the applications of the semantic divergences that were introduced to satisfy their requirements creates a common data integration foundation for use as the basis for a data integration or BI project.
Returning to the core problem we wish to solve within data integration domains, we are looking to achieve semantic interoperability between very different systems. The solution rests on our ability to leverage formal ontologies, and to account for the different types of ontologies that a business may need. For instance, we can have resource ontologies that define the semantics used by our SAP systems, but we may also have personal ontologies that define the semantics of a user or a user group. In addition, we have the notion of shared ontologies, which are common semantics shared between any number of information systems.5
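One way to picture how resource and shared ontologies work together is as a pair of term mappings that meet at a common vocabulary. The Python sketch below is a deliberately simplified assumption: the field names, the shared terms and the helper functions are illustrative, and a real ontology would hold much more than flat name-to-name mappings.

```python
# Resource ontologies, reduced here to local field name -> shared term.
# All field names and shared terms are illustrative assumptions.
SAP_TO_SHARED = {"KUNNR": "customer_id", "NAME1": "customer_name", "ORT01": "city"}
CRM_TO_SHARED = {"id": "customer_id", "full_name": "customer_name", "town": "city"}


def to_shared(record, resource_ontology):
    """Re-key a source record with shared terms; unmapped fields are dropped."""
    return {
        resource_ontology[key]: value
        for key, value in record.items()
        if key in resource_ontology
    }


def from_shared(shared_record, resource_ontology):
    """Re-key a shared record with one system's local terms."""
    inverse = {shared: local for local, shared in resource_ontology.items()}
    return {
        inverse[key]: value
        for key, value in shared_record.items()
        if key in inverse
    }


def translate(record, source_ontology, target_ontology):
    """Move a record between systems by way of the shared vocabulary."""
    return from_shared(to_shared(record, source_ontology), target_ontology)


sap_row = {"KUNNR": "0000104", "NAME1": "Acme GmbH", "ORT01": "Berlin", "MANDT": "100"}
crm_row = translate(sap_row, SAP_TO_SHARED, CRM_TO_SHARED)
```

Because every system maps only to the shared terms, adding another system means writing one new mapping rather than one mapping per existing system.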
The best approach to developing an ontology is usually determined by the eventual purpose of the ontology. For example:
- When developing a resource ontology, it is best to adopt a bottom-up approach, defining the terms used by the resource and then making generalizations to the terms.
- When developing a shared ontology, it is best to use a top-down approach, defining the general concepts first and working down to the detail.
- When developing personal ontologies, we look at the essence of the user or user group, top down or bottom up.









