What do service-oriented architecture (SOA), databases, program logic, business process and program management have in common? All of these elements within a single enterprise should share the same set of terminology and conventions. There ought to be a unifying foundation that logically binds all of these pieces together into one common picture. However, this is seldom the case, and that fact is one of the primary reasons why systems integration has traditionally been so costly, time-consuming and ultimately unsuccessful. There is, though, a practice area emerging specifically to deal with what may well be the most pivotal of all enterprise technologies – this new practice is semantic integration. The goal of this article is to provide a high-level overview of semantic integration as a community of practice and to explore its practical implications and impacts in the context of a variety of typical IT projects or issues.

 

Technology alone does not solve problems. People solve problems by leveraging the appropriate technologies within the context of communities of practice. Those communities of practice involve the utilization of one or more methodologies that have been tailored to suit the needs of the community, the problem space and the related technologies involved. It is important to keep this in mind because most of the expectation disconnects that occur around IT projects tend to be based upon the promise or potential of a particular technology before the appropriate community of practice has been established.

 

This situation is more commonly referred to as the “hype cycle.” We are familiar with the hype cycle surrounding SOA and somewhat familiar with the related term, “semantic Web.” In both cases, the promise of those technologies surpassed the maturity of the communities of practice that arose around them, and the first generation of implementations suffered greatly. Another thing to keep in mind is that to solve an enterprise technical issue, one often needs to exploit multiple communities of practice working together. It is in this context that we shall examine semantic integration as an emerging enterprise IT community of practice that will revolutionize data management as we know it.

 

Semantics is an often misunderstood term, even more so with regard to its technical applications. In philosophy, semantics refers to the study of meaning. The representation and dissemination of meaning, though, is what IT is all about. Every data element, every character in a string, every variable in an equation – they all express meaning in one form or another. Furthermore, that meaning is enhanced through frameworks of syntax and grammar as well as through countless explicit and implicit relationships. All system design is predicated upon a contract of shared understanding between stakeholders, developers and service providers; when something goes wrong, this is often the first place to look.

 

There are a number of specific standards and tools that have emerged over the past few years to support semantic integration; however, first we need to examine the problem space from a philosophical and business level. To understand how semantics can be used to facilitate enterprise integration, we must first understand how semantics relates to the practice of IT. Semantics is heavily focused upon hierarchies of meaning and relationships and, as one might expect, semantics has its own hierarchy.

 

Figure 1 no doubt contains some terms that are already familiar: vocabularies, taxonomies, ontologies and so forth. There is a new term depicted, though, and this term is critical to the successful application of the other concepts to enterprise IT. The new term is “semantic set.” The semantic set is born out of the recognition that no organization, regardless of how specialized it might be, is wholly dependent on one single vocabulary, taxonomy or ontology. Let’s also explore what these terms signify for us in their enterprise integration context.

  • Vocabulary – This is the atomic-level view and is analogous to a data entity.
  • Taxonomy – This includes the vocabulary, is a straightforward hierarchy and is analogous to earlier database management system (DBMS) design paradigms.
  • Ontology – This includes both vocabulary and taxonomy and represents a structure that expresses both a hierarchy and a set of relationships between vocabulary elements within that hierarchy. This is roughly analogous to the design paradigms involved in relational database technology, although a schema is not necessarily an ontology and tends to be restricted to the system level.
  • Semantic set – This is the recognition that data design (in fact all design) for the enterprise extends beyond the bounds or scope of any one system. The enterprise must deal with multiple ontologies, taxonomies, and vocabularies and reconcile them on an ongoing or evolutionary basis.
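
The layering described in this list can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names (`Taxonomy`, `Ontology`, `SemanticSet`), the `shared_terms` helper and the sales and billing sample terms are all hypothetical and do not come from any standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """A vocabulary arranged as a plain hierarchy (child term -> parent term)."""
    parent_of: dict = field(default_factory=dict)

    @property
    def vocabulary(self) -> set:
        # The vocabulary is simply every term the hierarchy mentions.
        return set(self.parent_of) | set(self.parent_of.values())


@dataclass
class Ontology:
    """A taxonomy plus named relationships between its terms."""
    name: str
    taxonomy: Taxonomy
    relations: set = field(default_factory=set)  # (subject, relation, object)


@dataclass
class SemanticSet:
    """Several ontologies that the enterprise must reconcile over time."""
    ontologies: list = field(default_factory=list)

    def shared_terms(self) -> dict:
        """Map each term used by more than one ontology to the ontologies using it."""
        owners = {}
        for onto in self.ontologies:
            for term in sorted(onto.taxonomy.vocabulary):
                owners.setdefault(term, []).append(onto.name)
        return {term: names for term, names in owners.items() if len(names) > 1}


# Hypothetical sample data: two systems that both talk about a "customer".
sales = Ontology(
    "sales",
    Taxonomy({"customer": "party", "order": "transaction"}),
    {("customer", "places", "order")},
)
billing = Ontology(
    "billing",
    Taxonomy({"customer": "account_holder", "invoice": "document"}),
    {("customer", "receives", "invoice")},
)
enterprise = SemanticSet([sales, billing])
overlap = enterprise.shared_terms()  # {"customer": ["sales", "billing"]}
```

Here the term "customer" sits under a different parent in each ontology, which is the kind of cross-system overlap a semantic set is meant to surface.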

So, how do these concepts or semantic constructs relate to enterprise integration in general and data integration in particular? The most important problem they begin to solve is abstraction. Even at this cursory level, we can see that we are already beginning to separate elements of design from individual systems and attributing them to a larger enterprise semantic model or framework. This starts to solve the most fundamental problem in all system integration – redundancy. Each time a system is designed outside of the context of the holistic knowledge ecosystem which spawned it, there is a great likelihood that capability will be reproduced that already exists, and then that capability will require reconciliation or integration back into the enterprise. In other words, if we don’t have a shared foundation by which to measure our designs, we are condemning ourselves to repeating them over and over again - a costly and wasteful exercise that every IT department is all too familiar with.

 

 
 

Where does semantic integration start? Unfortunately, if we view it through the lens of any one development lifecycle, we will likely miss the point of the exercise. We do, however, have a mechanism in many enterprises which can be modified slightly to provide the launching point and residence for semantic integration activities. That mechanism is the enterprise architecture (EA). Every EA is in fact its own semantic construct – in other words, the metamodels which underpin an EA have been developed specifically to characterize and help describe the nature of complex (families of) systems and data. EA frameworks include metamodels such as Unified Modeling Language (UML), Department of Defense Architecture Framework (DoDAF), Federal Enterprise Architecture Framework (FEAF), The Open Group Architecture Framework (TOGAF) and the Zachman Framework. One might argue that Information Technology Infrastructure Library (ITIL) can be grouped with them as another EA framework. All EA frameworks have something else in common – they are limited in their ability to fully characterize the idiosyncrasies of the environments they are chartered to describe. Attempts to build ever more robust EA metamodels to capture all possible variations of enterprise ecosystems have led to many if not most of the EA frameworks becoming unwieldy and overly complex. Once the tool becomes more complicated than the problem it was designed to solve, it is likely to be abandoned for more practical alternatives.

 

One of the more common problems associated with enterprise architecture or system design is the fact that most enterprises have to deal with multiple paradigms simultaneously. For example, the database developers may be using entity relationship diagrams, the agile development team may be using UML, the business process gurus may have employed Business Process Modeling Notation (BPMN), the EA folks may have adopted Zachman and the data center team may be using ITIL. How much of an enterprise’s integration woes can be directly attributed to this dilemma, and how can all of these approaches be mediated? The architecture and design efforts represent the logical home for all semantic development and management – from that vantage point, semantic integration can help correct the design integration issues as well as all follow-on implementations.

 

An enterprise semantic model is the one tool that can bind all of the diverse design paradigms together. This is much more than mere metadata. The enterprise semantic model allows for a common frame of reference across all aspects of the enterprise, including:

  • The architecture tiers.
  • The stakeholder communities (of interest).
  • The lifecycle phases, as well as lifecycles within lifecycles.
  • The evolution and historical continuity of the enterprise across time.

The enterprise semantic model allows for orchestrated, continuous evolutions rather than fits of massive transformation. It also provides the primary interface to the user community. The correlation of information within and across semantic models is referred to as semantic orchestration.

 

 
 

The connection with the user community is an important element to focus on. Over the past several years, the Department of Defense (DoD) has been employing a paradigm referred to as communities of interest (COIs) as part of the Netcentric Transformation Initiative. COIs are user groups divided by functional expertise or domains, and each group is charged with helping to define the data or vocabulary for all aspects of its domain. Increasingly, these types of groups have been capturing their information as semantic vocabulary as opposed to data entities. The vocabulary terms are human readable and thus translatable and transferable (using a variety of ontology standards such as Resource Description Framework (RDF) and Web Ontology Language (OWL)) across systems, across enclaves and across domains. The one problem that such groups have begun running into is analogous to issues encountered by most data standardization endeavors – domain boundaries tend to blur or overlap.
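
To make the idea of vocabulary terms captured in RDF and OWL more concrete, the sketch below models a few statements as plain subject, predicate, object triples. It deliberately uses no RDF library; the `logistics:` and `finance:` prefixes, the sample terms and the `match` helper are assumptions made up for illustration, while `rdf:type`, `rdfs:label`, `rdfs:subClassOf` and `owl:Class` are standard RDF/OWL names written here as simple strings.

```python
# Each statement is a (subject, predicate, object) triple.
triples = [
    ("logistics:Shipment", "rdf:type", "owl:Class"),
    ("logistics:Shipment", "rdfs:label", "Shipment"),
    ("logistics:Shipment", "rdfs:subClassOf", "logistics:Movement"),
    ("finance:Shipment", "rdf:type", "owl:Class"),
    ("finance:Shipment", "rdfs:label", "Shipment"),
    ("finance:Shipment", "rdfs:subClassOf", "finance:BillableEvent"),
]


def match(pattern, store):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        triple
        for triple in store
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Two communities publish a term with the same human-readable label.
same_label = match((None, "rdfs:label", "Shipment"), triples)
overlapping = sorted(subject for subject, _, _ in same_label)

# Their parent classes differ, which is where the domain boundary blurs.
parents = {
    subject: obj
    for subject, _, obj in match((None, "rdfs:subClassOf", None), triples)
}
```

In practice a community would more likely publish such statements through an RDF toolkit and a serialization such as Turtle; the plain tuples above only show the shape of the information.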

 

Most enterprises deal with multiple domains. This is often expressed by dozens or more systems (or services) with hundreds or more data entities crossing perhaps a dozen functional domains. Until now, there has been no real mechanism or methodology for deconflicting these boundary interfaces. Enterprise interoperability scenarios arise from the need to ask questions across these boundaries – the traditional way we’ve handled that task is through hard coding the question logic into our data architecture through extract, transform and load (ETL) and similar tools acting directly against databases within the enterprise. In other words, in order to answer even the most basic questions, we are forced into highly complex integration – worse yet, most of this integration is not abstracted or loosely coupled within our architecture. Thus, every time we need to modify our questions, we need to do some transformation close to the data source. Certain master data management (MDM) and business intelligence (BI) technologies have been working to reduce the need for this costly transformation, but these approaches have not and cannot eliminate it.

 

 

How can semantic technology be used to help solve these integration issues? One of the immediate applications is something referred to as context mapping. Context mapping is the analytical process of reconciliation across semantic boundaries. This reconciliation performs a sort of meta-integration function for the enterprise by allowing the COIs to deconflict terminology or data entities before they are built into systems. The contexts recognize the fluid nature of data interpretation and the evolutionary nature of data exploitation within the enterprise by allowing for what-if dynamic groups and relationships. The tools used by semantic integrators to accomplish this include visual ontology and vocabulary management suites and a new generation of ontology-driven wikis. Other specialized tools allow developers to extract taxonomies from existing relational database management systems (RDBMSs) and map them against existing entity relationship diagrams.
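
A minimal sketch of context mapping, assuming each community binds its local terms to a shared enterprise concept, might look like the following. The `ContextMap` class, its method names and the sample communities and terms are hypothetical; real vocabulary management suites will differ.

```python
from typing import Optional


class ContextMap:
    """Reconciles local terms from different communities via shared concepts."""

    def __init__(self) -> None:
        self._to_concept = {}  # (community, local term) -> shared concept
        self._to_term = {}     # (community, shared concept) -> local term

    def bind(self, community: str, term: str, concept: str) -> None:
        """Record that a community's local term denotes a shared concept."""
        self._to_concept[(community, term)] = concept
        self._to_term[(community, concept)] = term

    def translate(self, term: str, source: str, target: str) -> Optional[str]:
        """Translate a term between communities, or None if no mapping exists."""
        concept = self._to_concept.get((source, term))
        if concept is None:
            return None
        return self._to_term.get((target, concept))

    def unmapped(self, community: str, terms: list) -> list:
        """List the terms a community uses that still need reconciliation."""
        return [t for t in terms if (community, t) not in self._to_concept]


cmap = ContextMap()
cmap.bind("sales", "client", "Customer")
cmap.bind("billing", "account holder", "Customer")

billing_term = cmap.translate("client", "sales", "billing")
todo = cmap.unmapped("billing", ["account holder", "invoice"])
```

Because the bindings live outside any one system, they can be revised as interpretations change without touching the databases that use the terms.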

 

 

Perhaps the best illustration of how and where semantic integration can play a critical role is in support of complex SOA-focused projects. One of the key problems with SOA projects has been the lack of a consistent approach toward data architecture. In fact, many SOA purists tend to dismiss the importance of data design within SOA implementations completely, assuming that data support for services requires consideration only of the one service in question. What these folks don’t realize, of course, is that SOA points to the need for a unified data architecture in order to fulfill its primary objectives of loosely coupled or abstracted architecture layers. If one takes into consideration several of the factors missing from the current SOA paradigm that are necessary for such projects to work, it becomes clear that the true context of most of these projects is enterprise integration. Semantic integration is one of the missing elements and represents the most effective way to interject data design considerations into the front end of any SOA implementation. If one adds this missing piece, and perhaps other elements of enterprise integration not traditionally involved with SOA, one begins to see a new practice area emerging that could be referred to as services-oriented integration (SOI).

 

 

 

Another area which will become highly dependent upon semantic integration is the fusion of structured and unstructured data sources. When one thinks of the term semantic Web, what often comes to mind is the ability to more effectively discover and manage complex, unstructured data resources (e.g., documents, Web pages and other rich media). This hasn’t really materialized yet, because traditional search engines cannot easily index non-HTML resources and cannot effectively index HTML-based content. Parsing every word of content within a Web page is not an effective way to determine what should or shouldn’t be relevant in a search. Every unit of content can contain and ought to exhibit its own taxonomy. These document or content unit taxonomies can begin to allow more effective exploitation of unstructured data. Reconciling those taxonomies with one or more semantic models can then allow fusion with structured data sources, which can also be mediated against semantic models. This is the core foundation for structured and unstructured data fusion, and it is as applicable on the Web as it is within any given enterprise environment.
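
The fusion described above can be sketched with a toy example: documents carry their own taxonomy terms, database tables carry column names, and both are mediated against one semantic model so that a single question reaches both kinds of source. Every name and value below (the `semantic_model` mapping, the sample documents and tables, and the `find` helper) is an assumption for illustration only.

```python
# Hypothetical semantic model: local label -> enterprise concept.
semantic_model = {
    "cust_id": "Customer",
    "client": "Customer",
    "inv_no": "Invoice",
    "billing statement": "Invoice",
}

# Unstructured sources, each tagged with its own content taxonomy.
documents = [
    {"title": "Onboarding guide", "taxonomy": ["client", "training"]},
    {"title": "Q3 dispute memo", "taxonomy": ["billing statement", "client"]},
]

# Structured sources, described by their column names.
tables = [
    {"table": "crm.accounts", "columns": ["cust_id", "region"]},
    {"table": "erp.invoices", "columns": ["inv_no", "cust_id", "amount"]},
]


def concepts(labels):
    """Resolve local labels to enterprise concepts, ignoring unknown labels."""
    return {semantic_model[label] for label in labels if label in semantic_model}


def find(concept):
    """Return every document and table that refers to the given concept."""
    docs = [d["title"] for d in documents if concept in concepts(d["taxonomy"])]
    tabs = [t["table"] for t in tables if concept in concepts(t["columns"])]
    return {"documents": docs, "tables": tabs}


invoice_sources = find("Invoice")
customer_sources = find("Customer")
```

The question is asked once, in terms of the concept, rather than being hard-coded separately against each document store and database.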

 

 

The practice of semantic integration is new and will continue to evolve, yet it holds tremendous promise. Like other technologies and practice areas, though, it must be viewed in context and exploited in concert with a variety of other complementary practice areas and technologies. Pragmatism, not hype, is the path to success.
