Leveraging Ontologies: The Intersection of Data Integration and Business Intelligence, Part 2

  • July 01 2004, 1:00am EDT

As you may remember from Part 1, ontologies have a wide range of applications. They fall into two broad categories: horizontal ontologies and vertical ontologies.

Horizontal ontologies are general in nature, such as space-time relationships. These are common ontologies that span multiple domains, are not applicable to any single vertical space and provide a mechanism to organize and standardize information content. We've employed this type of ontology for years in the form of object models, hierarchies, taxonomies and, in many cases, XML vocabularies.

Vertical ontologies, which also incorporate features from horizontal ontologies, are domain-specific, such as natural languages for healthcare or financial services. Vertical ontologies not only define data in terms of semantics native to a particular vertical industry, they also contain rules and formal computer languages that can perform certain types of runtime automated reasoning. This means we understand the meta data and have logic bound to the meta data as well.

The use of vertical ontologies, which extend the capabilities of horizontal applications, is where the most value exists. As we learn to define these ontologies as common frameworks for specific business requirements and define the reuse of such frameworks applicable across multiple like-domains, we also learn to apply languages and reasoning techniques. Ultimately, this provides repeatable information formats, rules and logic that, in turn, provide data integration architects with the ability to leverage existing solutions rather than form them from general-purpose middleware and application development technology.

Web-Based Standards and Ontologies

The use of languages for ontology is beginning to appear, all built on reasoning techniques that provide for the development of special-purpose reasoning services. In fact, the W3C is creating a Web standard for ontology language as part of its effort to define semantic standards for the Web. The Semantic Web is the abstract representation of data on the World Wide Web, based on the resource description framework (RDF) standards (see sidebar) and other standards still to be defined. It is being developed by the W3C, in collaboration with a large number of researchers and industrial partners.

Ontology and Mapping Servers

How do you implement ontologies in your data integration problem domain? In essence, some technology - an integration broker or applications server, for instance - needs to act as an ontology server and/or mapping server.

An ontology server houses the ontologies that are created to service the data integration problem domain.[4] There are three types of ontologies stored: shared, resource and application ontologies. Shared ontologies are made up of definitions of general terms that are common across and between enterprises. Resource ontologies are made up of definitions of terms used by a specific resource. Application ontologies are native to particular applications, such as an inventory application.

Mapping servers store the mappings between ontologies (stored in the ontology server). The mapping server also stores conversion functions, which account for the differences between schemas native to remote source and target systems. Mappings are specified using a declarative syntax that provides reuse.
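As a rough illustration of the mapping-server idea, the following Python sketch stores declarative mappings between two ontologies together with their conversion functions. Every name in it (`Mapping`, `MappingServer`, the `inventory` and `shared` ontologies, the field names and the pound-to-kilogram conversion) is hypothetical and chosen only for illustration; it does not describe any particular product.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


def identity(value: Any) -> Any:
    """Default conversion: pass the value through unchanged."""
    return value


@dataclass(frozen=True)
class Mapping:
    """Declarative link from a source term to a target term."""
    source_term: str
    target_term: str
    convert: Callable[[Any], Any] = identity


class MappingServer:
    """Hypothetical in-memory store of mappings between ontologies."""

    def __init__(self) -> None:
        # Keyed by (source ontology, target ontology), then by source term.
        self._mappings: Dict[Tuple[str, str], Dict[str, Mapping]] = {}

    def register(self, source: str, target: str, mapping: Mapping) -> None:
        self._mappings.setdefault((source, target), {})[mapping.source_term] = mapping

    def translate(self, source: str, target: str,
                  record: Dict[str, Any]) -> Dict[str, Any]:
        """Rename each mapped field and apply its conversion function."""
        rules = self._mappings.get((source, target), {})
        result: Dict[str, Any] = {}
        for term, value in record.items():
            if term in rules:  # unmapped terms are dropped in this sketch
                rule = rules[term]
                result[rule.target_term] = rule.convert(value)
        return result


server = MappingServer()
server.register("inventory", "shared", Mapping("item_no", "productId"))
server.register("inventory", "shared",
                Mapping("weight_lb", "weightKg", lambda lb: round(lb * 0.4536, 2)))

translated = server.translate("inventory", "shared",
                              {"item_no": "A-100", "weight_lb": 10})
```

Because each mapping is plain data plus a function, the same definitions can be reused wherever the same pair of ontologies meets, which is the reuse benefit the declarative approach is meant to deliver.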

RDF and Ontologies

RDF (resource description framework), a part of the XML story, provides interoperability between applications that exchange information. RDF is another Web standard that's finding use everywhere, including data integration. RDF was developed by the W3C to provide a foundation of meta data interoperability across different resource description communities, and is the basis for the W3C movement to ontologies.

RDF uses XML to define a foundation for processing meta data and to provide a standard meta data infrastructure for both the Web and the enterprise. The difference between the two is that XML is used to transport data using a common format, while RDF layers on top of XML defining a broad category of data. When the XML data is in the RDF format, applications are then able to understand the data without understanding who sent it.

RDF extends the XML model and syntax so that they can be used to describe either individual resources or collections of information. (XML points to a resource in order to scope and uniquely identify a set of properties known as the schema.)

RDF meta data can be applied to many areas, including data integration. RDF is also able to support new technology (such as intelligent software agents and exchange of content rating).

RDF itself does not offer predefined vocabularies for authoring meta data. However, the W3C does expect standard vocabularies to emerge once the infrastructure for meta data interoperability is in place. Anyone, or any industry, can design and implement a new vocabulary. The only requirement is that all resources be included in the meta data instances using the new vocabulary.

RDF benefits data integration in that it supports the concept of a common meta data layer that is shareable throughout an enterprise or between enterprises. Thus, RDF can be used as a common mechanism for describing data within the data integration problem domain.
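To make the idea of a common meta data layer more concrete, here is a minimal Python sketch that holds RDF-style statements as subject, predicate, object triples and queries them by pattern. It uses plain tuples rather than an RDF library, and the `http://example.org/` namespace, the `describes` and `sourceSystem` properties and the two source systems are all assumptions made up for the example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

EX = "http://example.org/meta#"  # hypothetical vocabulary namespace

# Two systems name their customer record differently, but both are
# described against the same shared concept.
triples: List[Triple] = [
    ("http://example.org/crm/customer", EX + "describes", EX + "Customer"),
    ("http://example.org/crm/customer", EX + "sourceSystem", "CRM"),
    ("http://example.org/erp/kunde", EX + "describes", EX + "Customer"),
    ("http://example.org/erp/kunde", EX + "sourceSystem", "ERP"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Find every resource, in any system, that describes a Customer.
customer_resources = [s for s, _, _ in
                      match(triples, p=EX + "describes", o=EX + "Customer")]
```

The query never mentions CRM or ERP, which mirrors the point that an application can interpret the description without knowing which system produced it.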

In order for the Semantic Web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. This notion is known as knowledge representation. To this end, and in the domain of the World Wide Web, computers will find the meaning of semantic data by following hyperlinks to definitions of key terms and rules for reasoning about data logically. The resulting infrastructure will spur the development of automated Web services such as highly functional agents.[1] What's important here is that the work now being driven by the W3C as a way to manage semantics on the Web is applicable, at least at the component level, to the world of data integration, much like XML and Web services.
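The combination of structured information and inference rules can be sketched in a few lines of Python. The example below is a deliberately simplified illustration, not an implementation of any W3C specification: it applies just two rules (subclass relationships are transitive, and an instance of a class is also an instance of its superclasses) until no new facts appear. The `ex:` terms are hypothetical, and the predicate names are written as plain strings.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer(facts: Iterable[Triple]) -> Set[Triple]:
    """Apply the two rules repeatedly until a fixed point is reached."""
    closure = set(facts)
    while True:
        new = set()
        for s, p, o in closure:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in closure:
                # Rule 1: A subClassOf B and B subClassOf C => A subClassOf C
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # Rule 2: x type A and A subClassOf B => x type B
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= closure:  # nothing new was derived
            return closure
        closure |= new


facts = {
    ("ex:SavingsAccount", SUBCLASS_OF, "ex:Account"),
    ("ex:Account", SUBCLASS_OF, "ex:FinancialProduct"),
    ("ex:acct42", RDF_TYPE, "ex:SavingsAccount"),
}
inferred = infer(facts)
```

After the call, `inferred` also contains the statement that `ex:acct42` is an `ex:FinancialProduct`, a fact that was never stated directly but follows from the rules.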

An example of the W3C contribution to the use of ontologies is the OWL Web Ontology Language. OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is derived from the DAML+OIL Web Ontology Language [DAML+OIL] and builds upon RDF. OWL assigns a specific meaning to certain RDF triples. The future Formal Specification, now in development at the W3C, specifies exactly which triples are assigned a specific meaning and a definition of the meaning. OWL only provides a semantic interpretation for those parts of an RDF graph that instantiate the schema. Any additional RDF statements resulting in additional RDF triples are allowed, but OWL is silent on the semantic consequences of such additional triples. An OWL ontology is made up of several components, some of which are optional, and some of which may be repeated.[2]

Using these Web-based standards as the jumping off point for ontology and data integration, it is possible to define and automate the use of ontologies in both intra- and inter-company data integration domains. Domains made up of thousands of systems, all with their own semantic meanings, bound together in a common ontology that makes short work of data integration and defines a common semantic meaning of data - this, indeed, is the goal.

Extending from the languages, we have several libraries available for a variety of vertical domains, including financial services and e-business. Many knowledge editors now exist to support the creation of ontologies, as well as the use of natural language processing methodologies. We have seen these in commercially available knowledge mapping and visualization tools using standard notations such as UML.

Types of Vertical Ontologies

Moving forward with the notion that the application of ontologies in the vertical domains is where the most value exists, it's feasible to further define types of ontologies, or architectural approaches. For our purposes we can define them as:

  • Information-based
  • Behavior-based
  • Process-based

Information-based ontologies are the most basic of the architectural approaches. They simply define common information properties, concepts and rules, and how they relate to one another, using standard reference models that support information integration as well as knowledge sharing for a vertical domain. Information-based ontologies are required in all domains, regardless of whether you leverage behavior-based or process-based ontologies. What is more, information-based ontologies typically require a repository.

Behavior-based ontologies define terminologies and concepts relevant to a particular application service that is repeatable across multiple vertical domains. An example of this is HIPAA processing, which is made up of common sets of functions as well as common sets of semantics. The purpose of this type of ontology architecture is to define standard semantic meaning around a service-oriented architecture (SOA), thus providing better reuse from problem domain to problem domain. It is interesting to note that the concept of semantics is missing from the current Web services-based standards, and the use of behavior-based ontologies is something that would fill that gap.

Process-based ontologies define terminologies and concepts around coordinating processes that are relevant to a vertical domain. This differs from behavior-based ontologies in that the process coordinates the use of both behavior (remote functions) and information (information passing between systems). However, as with behavior-based ontologies, we are again looking to define standard semantic meaning for common processes that are transferable among vertical domains, such as straight-through processing (STP). Moreover, process-based ontologies define inputs, outputs, constraints, relations, hierarchies, sequences, sub-processes and process control semantics.
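A small Python sketch may help show what it means for a process definition to carry inputs, outputs, sequence and constraints. The step names and data items below are invented for a simplified straight-through processing flow and are not taken from any standard; the only constraint checked is that each step's inputs are available before it runs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ProcessStep:
    """One step in a hypothetical process definition."""
    name: str
    inputs: List[str]
    outputs: List[str]


def unmet_inputs(steps: List[ProcessStep],
                 available: Iterable[str]) -> List[Tuple[str, str]]:
    """Walk the steps in sequence and report (step, input) pairs that
    are neither initially available nor produced by an earlier step."""
    known = set(available)
    missing: List[Tuple[str, str]] = []
    for step in steps:
        missing.extend((step.name, item) for item in step.inputs
                       if item not in known)
        known.update(step.outputs)
    return missing


# Hypothetical STP sequence: capture, then validate, then settle.
stp = [
    ProcessStep("capture", inputs=["order"], outputs=["trade"]),
    ProcessStep("validate", inputs=["trade"], outputs=["validated_trade"]),
    ProcessStep("settle", inputs=["validated_trade", "account"],
                outputs=["confirmation"]),
]

problems = unmet_inputs(stp, available=["order"])
```

Here `problems` flags that the `settle` step needs an `account` that nothing supplies, the kind of check that becomes possible once process semantics are stated explicitly rather than buried in application code.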

Abstraction and Ontologies

When dealing with abstraction, certain characteristics of the objects are coded in the databases in such a manner that the set of characteristics is representative of real-world objects.[3] Depending on the importance of the information or need for detail, the set of characteristics is defined as more or less detailed. This is, in essence, the notion of abstraction and ontologies.

To this end, in some instances object-oriented modeling may be employed to define ontologies by defining information at different levels of abstraction. We define this by suggesting a number of specializations. In each specialization, a number of additional characteristics are required, thus increasing the level of detail in the original object. Being an object-oriented model, each specialization inherits the characteristics of the more generic object class. Using this model, you can mix and match ontologies for use inside of your data integration problem domain.

When using this type of ontology model, ontologies are translated into classes, and all classes have special operations for navigation in the ontology tree. This model can support both single and multiple inheritance.
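The object-oriented approach described above can be sketched with ordinary Python classes. In this illustration, which is an assumption about how such a model might look rather than a description of any specific tool, each concept is a class, each specialization adds characteristics, and class methods provide navigation up and down the ontology tree. The concept names and characteristics are hypothetical.

```python
class Concept:
    """Root of a hypothetical ontology tree."""
    characteristics = ("identifier",)

    @classmethod
    def all_characteristics(cls):
        """Collect characteristics from every more generic class,
        most generic first, without duplicates."""
        seen = []
        for klass in reversed(cls.__mro__):
            for name in vars(klass).get("characteristics", ()):
                if name not in seen:
                    seen.append(name)
        return tuple(seen)

    @classmethod
    def generalizations(cls):
        """Navigate upward: the more generic concepts, nearest first."""
        return [k for k in cls.__mro__[1:] if issubclass(k, Concept)]

    @classmethod
    def specializations(cls):
        """Navigate downward: the direct specializations."""
        return cls.__subclasses__()


class Account(Concept):
    characteristics = ("owner", "balance")


class InsuredProduct(Concept):
    characteristics = ("insurer",)


class SavingsAccount(Account, InsuredProduct):
    """Multiple inheritance: a specialization of two concepts."""
    characteristics = ("interest_rate",)


detail = SavingsAccount.all_characteristics()
parents = SavingsAccount.generalizations()
```

Choosing `Account` or `SavingsAccount` as the working type selects the level of detail, while the inherited characteristics keep the more specific concept consistent with the generic one.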

Object abstraction and object-oriented modeling are helpful in creating ontologies for data integration. The support of inheritance is especially useful considering the opportunity for reuse as well as abstraction layers that offer various levels of detail.

Value of Ontologies

While there is no free lunch here, the use of the ontologies concept within modern data integration and business intelligence techniques and technologies seems to be a good match. Indeed, today we are already leveraging certain aspects of ontologies within most data integration projects, regardless of whether or not we understand the concept. The value here is to recognize ontologies as a concept that formalizes the management and integration of information, services and processes - formalizing something we are already doing informally.

The real significance of ontologies - leveraging the reusable aspects - is within vertical domains where the use of common meta data, services and processes has the most worth. Once we get semantics under control within vertical systems (more often, a collection of systems), data integration, or linking a common set of semantics to back-end systems, won't be as daunting as this process is today. What's more, the application of standards such as Semantic Web and OWL will make ontologies that much more attractive.


  1. Berners-Lee, Tim, James Hendler and Ora Lassila, "The Semantic Web." Scientific American, May 2001.
  2. Web Ontology Language (OWL) Guide Version 1.0.
  3. Fonseca, Frederico, Max Egenhofer and Clodoveu Davis, "Ontology-Driven Information Integration." 1999.
  4. From the DOME Project.
