Semantic Meta Data for Enterprise Information Integration
Information Management Magazine, July 2003
The challenges for today's enterprise information integration systems are well understood. To manage and use information effectively within the enterprise, three barriers that add to the complexity of managing information have to be overcome: the diverse formats of content, the disparate nature of content and the need to derive "intelligence" from this content. Current software tools that structure content using syntactic search, or even syntactic meta data, are not sufficient to handle these problems. What is needed is actionable information from disparate sources that reveals non-obvious insights and allows timely decisions to be made.
A new concept known as semantic meta data is paving the way to finally realizing the full value of information. Indeed, Tim Berners-Lee's vision for the next generation of the Web is termed the "Semantic Web," and semantic meta data plays the pivotal role in it. By annotating or enhancing documents with semantic meta data, software programs can automatically interpret the context and meaning of each document and make better-informed decisions about who can use the documents and how they should be used. This article looks at how semantic meta data is created and used within the enterprise.
Definition: Semantic Meta Data
Semantic meta data is meta data that describes contextually relevant or domain-specific information about content, based on an industry-specific or enterprise-specific custom meta data model or ontology. For example, if the content is from the business domain, the relevant semantic meta data could be company name, ticker symbol, industry, sector, executives, etc., whereas if the content is from the intelligence domain, the relevant semantic meta data could be terrorist name, event, location, organization, etc. Meta data elements that offer greater depth and more insight "about the document" fall under the semantic meta data category.
In contrast, syntactic meta data focuses on elements such as the size, location or creation date of a document, which provide no understanding of what the document says or implies.
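To make the contrast concrete, here is a minimal Python sketch of a document carrying both kinds of meta data. The class names, field names, file path and attribute keys are hypothetical and chosen only for illustration; the business-domain values reuse the eBay example discussed in this article.

```python
from dataclasses import dataclass, field


@dataclass
class SyntacticMetadata:
    # Properties of the file itself; they say nothing about its content.
    size_bytes: int
    location: str
    created: str  # ISO 8601 date


@dataclass
class SemanticMetadata:
    # Domain-specific facts about what the document is about, keyed by
    # attributes from a (hypothetical) business-domain meta data model.
    domain: str
    attributes: dict = field(default_factory=dict)


@dataclass
class AnnotatedDocument:
    text: str
    syntactic: SyntacticMetadata
    semantic: SemanticMetadata


def is_about(doc, attribute, value):
    """Answerable only from semantic meta data, not from size, path or date."""
    return value in doc.semantic.attributes.get(attribute, [])


# Hypothetical example document from the business domain.
doc = AnnotatedDocument(
    text="...",
    syntactic=SyntacticMetadata(2048, "/archive/2003/07/story.txt", "2003-07-01"),
    semantic=SemanticMetadata(
        domain="business",
        attributes={"company": ["eBay"], "executive": ["Meg Whitman"]},
    ),
)
```

A question such as `is_about(doc, "executive", "Meg Whitman")` can be answered only because the semantic layer exists; the syntactic fields alone could not answer it.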
Requirements for Next-Generation Enterprise Information Integration
Let us view the value of semantic meta data from the perspective of deriving business value via enterprise information integration. Semantic meta data can play a critical role in satisfying a number of requirements that customers are seeking from the next generation of information integration and analysis software:
- Extract, organize and standardize (or normalize) information from many disparate and heterogeneous content sources (including structured, semi-structured and unstructured sources) and formats (database tables, XML feeds, PDF files, streaming media, internal documents), and static and dynamic (e.g., database-driven) sources that may be internal or external to the organization (including deep Web and open Web).
- For a domain of choice, identify interesting and relevant knowledge (entities such as people's names, places, organizations, etc., and relationships between them) from heterogeneous sources and formats.
- Analyze and correlate extracted information to discover previously unknown or non-obvious relationships between documents and/or entities based on semantics (not syntax) that can help in making business decisions.
- Enable high levels of automation in the processes of extraction, normalization and maintenance of knowledge and content for improved efficiencies of scale.
- Make efficient use of the extracted knowledge and content by providing tools that enable fast and high-quality (contextual) querying, browsing and analysis of relevant and actionable information.
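The first requirement, normalizing the same kind of information from differently structured sources, can be sketched in Python as follows. The two input layouts (a CSV export with `Name,Symbol,Sector` columns and an XML feed of `<company>` elements), the target field names and the de-duplicate-by-ticker rule are all assumptions made for illustration; the companies and tickers are fictional placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET


def from_csv(text):
    """Database export with a 'Name,Symbol,Sector' header (assumed layout)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"company": row["Name"].strip(),
               "ticker": row["Symbol"].strip().upper(),
               "sector": row["Sector"].strip().lower()}


def from_xml(text):
    """XML feed of <company ticker="..."><name/><sector/></company> (assumed layout)."""
    root = ET.fromstring(text)
    for node in root.iter("company"):
        yield {"company": node.findtext("name", "").strip(),
               "ticker": node.get("ticker", "").strip().upper(),
               "sector": node.findtext("sector", "").strip().lower()}


def normalize(*sources):
    """Merge records from all sources; the first record seen for a ticker wins."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["ticker"], record)
    return merged


# Fictional sample inputs in two different formats.
csv_text = "Name,Symbol,Sector\nAcme Corp ,acme,Technology\n"
xml_text = ('<companies>'
            '<company ticker="glbx"><name>Globex</name><sector>Energy</sector></company>'
            '<company ticker="ACME"><name>Acme Corporation</name><sector>technology</sector></company>'
            '</companies>')
records = normalize(from_csv(csv_text), from_xml(xml_text))
```

Both sources end up in one schema with consistent casing, and the duplicate "ACME" entry from the XML feed is dropped in favor of the CSV record. A real system would need a more careful entity-resolution rule than "first one wins."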
Semantic meta data is a key enabler of text analytics to derive business value from information.
Creating Semantic Meta Data
In order to extract optimal value from a document and make it usable, it needs to be effectively tagged by analyzing and extracting relevant information of semantic interest. Many techniques can be used to achieve this based on extracting syntactic and semantic meta data from documents. These include:
Dictionaries and thesauri: Match words, phrases or parts of speech against a static or periodically maintained dictionary and thesaurus. Lexical databases such as WordNet can be used to identify and match related terms in several directions, finding words that mean the same (synonyms), are more general (hypernyms) or are more specific (hyponyms).
Document analysis: Look for patterns and co-occurrences, and apply predefined rules to find interesting patterns within and across documents.
Ontologies: Capture domain-specific (application or industry) knowledge, including entities and relationships, both at a definitional level (e.g., a company has a CEO) and at an instance or assertional level that records real-world facts (e.g., Meg Whitman is the CEO of eBay). If the ontology deployed is "one size fits all" rather than domain-specific, the full potential of this approach cannot be exploited.
The last option, also known as ontology-driven meta data extraction, is the most flexible (assuming the ontology is kept up to date to reflect changes in the real world) and comprehensive (since it allows modeling of fact-based domain-specific relationships between entities that are at the heart of semantic representations).
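The dictionary-and-thesaurus technique can be sketched with a small, hand-maintained thesaurus standing in for a resource such as WordNet. The `THESAURUS` entries and the `expand` and `match_terms` helpers are hypothetical; a production system would load a real lexical database instead.

```python
import re

# Hypothetical thesaurus: term -> related terms in each direction.
THESAURUS = {
    "car": {
        "synonyms": {"automobile", "auto"},
        "broader": {"motor vehicle"},
        "narrower": {"sedan", "coupe"},
    },
}


def expand(term, directions=("synonyms",)):
    """Return the term plus its related terms in the requested directions."""
    entry = THESAURUS.get(term.lower(), {})
    expanded = {term.lower()}
    for direction in directions:
        expanded |= entry.get(direction, set())
    return expanded


def match_terms(text, term, directions=("synonyms",)):
    """Find which expanded terms occur in the text as whole words or phrases."""
    lowered = text.lower()
    return sorted(
        candidate
        for candidate in expand(term, directions)
        if re.search(r"\b" + re.escape(candidate) + r"\b", lowered)
    )


text = "The new sedan became the best-selling automobile of the year."
```

Searching for "car" with synonyms alone finds "automobile"; adding the "narrower" direction also finds "sedan", even though the word "car" never appears in the text.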
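Document analysis by co-occurrence can likewise be sketched: count how many documents mention two known entities in the same sentence, then apply a predefined rule. The sentence splitter, the plain substring match, the fictional company names and the "at least two documents" threshold are all assumptions for illustration.

```python
import itertools
import re
from collections import Counter

ENTITIES = {"Acme Corp", "Globex", "Initech"}  # fictional entity list


def sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def cooccurrences(documents, entities):
    """For each entity pair, count documents where both share a sentence."""
    counts = Counter()
    for doc in documents:
        pairs_in_doc = set()
        for sentence in sentences(doc):
            present = sorted(e for e in entities if e in sentence)
            pairs_in_doc.update(itertools.combinations(present, 2))
        counts.update(pairs_in_doc)  # each pair counted at most once per document
    return counts


def apply_rule(counts, min_docs=2):
    """Predefined rule (assumed): pairs seen together in >= min_docs documents."""
    return sorted(pair for pair, n in counts.items() if n >= min_docs)


docs = [
    "Acme Corp and Globex signed an agreement. Shares rose.",
    "Globex said Acme Corp will supply parts.",
    "Initech and Globex met briefly. Nothing was announced.",
]
counts = cooccurrences(docs, ENTITIES)
```

Here only the Acme Corp and Globex pair clears the two-document threshold; lowering `min_docs` to 1 also surfaces the Globex and Initech pair.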
Definition: Ontology
An ontology is a shared conceptualization of the world as seen by the enterprise. Ontologies consist of definitional aspects, such as high-level schemas, and assertional aspects, such as entities, attributes, interrelationships between entities, domain vocabulary and factual knowledge, all connected in a semantic manner. Ontologies and meta data provide the specific tools to organize and usefully describe heterogeneous content. This description incorporates, and extends, the approach of organizing content in a taxonomy with the help of automatic classification.
In addition to the hierarchical relationship structure of typical taxonomies, ontologies enable cross-node horizontal relationships between entities, thus enabling easy modeling of real-world information requirements.
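The difference between a taxonomy and an ontology can be sketched in Python: a definitional schema, assertional facts checked against it, a hierarchical taxonomy, and a query that needs both the hierarchy and a horizontal `ceo_of` edge. The relation name, class names and industry labels are hypothetical; only the Meg Whitman and eBay fact comes from the example earlier in this article.

```python
# Definitional level: which relation may link which classes.
SCHEMA = {"ceo_of": ("Person", "Company")}

# Assertional level: instances with their classes, plus facts.
INSTANCE_CLASS = {"Meg Whitman": "Person", "eBay": "Company"}
FACTS = set()

# Hierarchical taxonomy (child -> parent); labels are hypothetical.
INDUSTRY = {"eBay": "E-commerce", "E-commerce": "Technology"}


def conforms(subject, relation, obj):
    """True if the triple respects the schema's classes for this relation."""
    if relation not in SCHEMA:
        return False
    subject_class, object_class = SCHEMA[relation]
    return (INSTANCE_CLASS.get(subject) == subject_class
            and INSTANCE_CLASS.get(obj) == object_class)


def assert_fact(subject, relation, obj):
    """Add a horizontal (cross-node) fact after checking it against the schema."""
    if not conforms(subject, relation, obj):
        raise ValueError(f"{subject!r} {relation} {obj!r} violates the schema")
    FACTS.add((subject, relation, obj))


def ancestors(node):
    """Walk the taxonomy upward, e.g. eBay -> E-commerce -> Technology."""
    chain = []
    while node in INDUSTRY:
        node = INDUSTRY[node]
        chain.append(node)
    return chain


def executives_in_sector(sector):
    """Combine a horizontal edge (ceo_of) with the vertical hierarchy."""
    return sorted(person for person, relation, company in FACTS
                  if relation == "ceo_of" and sector in ancestors(company))


assert_fact("Meg Whitman", "ceo_of", "eBay")
```

A taxonomy alone could file eBay under Technology, but it has no place to record who runs eBay; the `ceo_of` edge is what lets a question like "which executives lead technology companies?" be answered.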
Semantic Meta Data Extraction and Enhancement
Once the ontology is built and a document is classified into its domain, intelligent agents automatically extract semantic meta data from the document. Based on the document's classification, contextually relevant semantic meta data (entities such as Microsoft and BEA Systems in Figure 1) is drawn from the ontology to enhance the existing meta data.
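The classify-then-extract flow described above can be sketched as follows. The keyword-count classifier, the per-domain entity lists standing in for an ontology and the merge into existing meta data are simplifying assumptions, not the behavior of any particular product; the sample sentence is invented.

```python
import re

# Hypothetical per-domain ontologies: entity name -> entity type.
ONTOLOGIES = {
    "business": {"Microsoft": "Company", "BEA Systems": "Company",
                 "eBay": "Company", "Meg Whitman": "Executive"},
    "intelligence": {"Example Organization": "Organization"},
}

# Hypothetical keyword lists for a naive domain classifier.
DOMAIN_KEYWORDS = {
    "business": {"earnings", "shares", "revenue", "acquisition", "ceo"},
    "intelligence": {"threat", "attack", "surveillance", "informant"},
}


def classify(text):
    """Pick the domain whose keywords occur most often in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {d: sum(w in kws for w in words) for d, kws in DOMAIN_KEYWORDS.items()}
    return max(scores, key=scores.get)


def extract_entities(text, domain):
    """Look up that domain's ontology entities as whole-word matches."""
    found = {}
    for name, entity_type in ONTOLOGIES[domain].items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.setdefault(entity_type, []).append(name)
    return found


def enhance(text, existing_metadata):
    """Return a new dict: existing (syntactic) meta data plus semantic meta data."""
    domain = classify(text)
    enhanced = dict(existing_metadata)
    enhanced["domain"] = domain
    enhanced.update(extract_entities(text, domain))
    return enhanced


existing = {"size_bytes": 512}
sample = "Shares of Microsoft and BEA Systems moved after the earnings reports."
meta = enhance(sample, existing)
```

Classifying first restricts extraction to the ontology of the matching domain, so only business entities are considered for this sample; the original meta data dictionary is left unchanged.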