The Enterprise Data Problem
The need to manage enterprise data has been coming into increasingly sharp focus for some time. Years ago, data sat in silos attached to specific applications. Then came the network, with data becoming available across applications, departments, subsidiaries and enterprises. Throughout these developments, one underlying problem has remained unsolved: data resides in thousands of incompatible formats and cannot be systematically managed, integrated, unified or cleansed.
Although multiple data technologies are in use, such as various legacy formats, relational and XML, the most challenging incompatibility arises from semantic differences in the structure, or schemas, of data. Every data asset has its own taxonomy of business entities, such as different ways of segmenting products or customers, and its own vocabulary for describing these entities. In total, a typical medium or large enterprise will often count thousands of data formats among its possessions. Moreover, the rules relating these formats to one another are recreated manually and hard-coded time and again.
The problem is growing; enterprises continue to acquire subsidiaries, reengineer processes, outsource operations, integrate with supply chain partners and implement regulations. In the meantime, developers are continuing to write new applications and to create new databases, without worrying about overall data management issues or long-term consequences.
In the 1990s, there was excitement about the prospect of moving the enterprise to a single enterprise resource planning (ERP) application. However, this promise is turning into a threat, with Gartner recently predicting that, "Through 2007, monolithic software architecture will remain the largest technical obstacle to broad-scope real-time enterprises."1
Therefore, the imperative of this decade is to manage a flexible and changing environment by introducing a common business understanding or semantics for an environment that will continue to contain multiple data formats.
Impact of the Data Problem
The enterprise data problem has a strong measurable impact on a company's bottom line. Experience shows that the following are some of the ways in which the pain is typically manifested:
- Information Quality: The fragmented data environment inevitably leads to information quality problems such as mishandled customer relationships and internal operations. It has been estimated that information quality issues cost U.S. businesses $600 billion annually.2
- Business Agility: The data problem creates an environment which all but prevents the flexibility that is critical to a modern enterprise responding to a constantly changing environment.
- IT Costs: IT remains unnecessarily inefficient so long as it lacks a strategic approach to data management. IT must deal with the frustrating and costly challenges of administering databases (some of which are redundant) one at a time, mapping each database multiple times during its life cycle and writing semantic translation scripts manually in point-to-point fashion.
Semantics: Doing it with Meaning
Companies will always struggle with a large number of physically different data formats. While a common data format may never be achieved, the key to efficiently managing data lies in establishing a common understanding. This is the promise of semantics: bridging terminological inconsistencies to comprehend underlying business meaning in a unified manner. Data semantics can be achieved formally by relating physical data schemas to concepts in an agreed-upon model of the entire business. This process is also known as rationalization of data.
This central model of the business is becoming known as an information model. The information model does not reflect any specific data model, but rather reflects the agreed-upon business view, business vocabulary and business rules which will provide a common basis for understanding data.
Figure 1: Data Semantics
Semantics is an emerging discipline, which builds upon traditional informal meta data and captures the formal meaning of data. For example, the information model might capture the official business concepts of a "customer" as well as the more specific concepts of a "business customer" and an "individual customer." A semantic mapping will then relate physical data schemas to this information model. For instance, a semantic mapping might capture the fact that the official concept of "individual customer" in the information model is called "client" by a relational database table, "customer" by an XML Schema and "CUST3" by a COBOL copybook. Semantic mapping is therefore responsible for formally capturing the meaning of the data by referring to the agreed-upon business terminology of the information model.
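The mapping in this example can be pictured as a simple lookup table. The following is a minimal sketch in Python, not a description of any particular product; the asset labels and the concept name IndividualCustomer are illustrative assumptions that merely echo the example above.

```python
# Hypothetical semantic mapping: (data asset, local name) -> information-model concept.
# All identifiers are illustrative assumptions mirroring the example in the text.
SEMANTIC_MAPPING = {
    ("relational_table", "client"): "IndividualCustomer",
    ("xml_schema", "customer"): "IndividualCustomer",
    ("cobol_copybook", "CUST3"): "IndividualCustomer",
}


def concept_for(asset, local_name):
    """Return the information-model concept that a local name refers to."""
    return SEMANTIC_MAPPING[(asset, local_name)]


def local_names_for(concept):
    """List every (asset, local name) pair that has been mapped to a concept."""
    return sorted(key for key, value in SEMANTIC_MAPPING.items() if value == concept)


if __name__ == "__main__":
    print(concept_for("cobol_copybook", "CUST3"))
    print(local_names_for("IndividualCustomer"))
```

Because each asset is mapped to the model once, a question asked in business terms ("where is individual customer data held?") can be answered without knowing any asset's local vocabulary.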
Enterprises should consider capturing data semantics for two main reasons. Tactically, semantics saves time by capturing the meaning of data once. Without semantics, each data asset will be interpreted multiple times by different developers as it is designed, implemented, integrated, extended and decommissioned. This independent interpretation will be time-consuming and error-prone. With semantics, the data asset is mapped and interpreted only once. Moreover, any new assets can be generated from the information model so that they use official business terminology from the outset.
The second and most significant benefit of semantics is a strategic one. Semantics can turn hundreds of data sources into a single coherent body of information. This single body can then provide a common understanding showing where data is located, what it means and how it can be managed systematically. This keeps the data consistent and well defined and removes redundancy. Privacy and security policies may be applied uniformly based on the business content of the data.
Interestingly, semantics is not only relevant to managing enterprise data. The World Wide Web Consortium (W3C) is currently working to extend the Web using semantics, as its Semantic Web Activity demonstrates.3
In summary, while informal data semantics has existed for decades in homegrown data dictionary solutions and ETL tools, a more formal semantics holds the promise of a modern enterprise information architecture enabling a strategic approach to data management.
Semantic Information Architecture (SIA): Core Principles
The core elements of a semantic information architecture may be summarized as follows:
Know Your Data: Meta Data. Before data assets can be understood, they must be cataloged. The meta data used for this cataloging should include the asset's schema, as well as information about the asset's location, usage, origin, relationship to other assets, rules associated with it and assignment of ownership and responsibility. Some of this meta data may be scanned automatically from assets such as relational databases or from existing sources of meta data.
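As an illustration only, a catalog entry holding the kinds of meta data listed above might be sketched as follows in Python. The class name AssetMetadata, the register helper and all sample values are hypothetical and do not refer to any specific meta data repository.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    """One hypothetical catalog entry; the fields follow the list in the text."""
    name: str
    schema: dict          # column or element name -> declared type
    location: str
    usage: str
    origin: str
    owner: str            # assignment of ownership and responsibility
    related_assets: list = field(default_factory=list)
    rules: list = field(default_factory=list)


catalog = {}


def register(asset):
    """Add an asset to the in-memory catalog, keyed by its name."""
    catalog[asset.name] = asset
    return asset


# Sample values are invented for illustration.
register(AssetMetadata(
    name="crm_db.client",
    schema={"client_id": "INTEGER", "client_name": "VARCHAR(80)"},
    location="CRM production database",
    usage="customer service application",
    origin="entered by call-center staff",
    owner="CRM data steward",
))
```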
Know Your Business: Information Model. The information model is a rich central model of the business. A traditional data model may serve as the basis for an information model, but data models in practice should be extended to a full ontology. An ontology is, literally, a formal description of what exists. An ontological information model is therefore typically richer and more objective than a data model in its view of the business, including different levels of generalization/specialization (inheritance), a layer of business rules and the traditional entities and relationships currently used. This richness allows the information model to serve as an authoritative reference by which meaning is given to multiple data assets, regardless of format or technology.
The information model may originate from off-the-shelf industry standard models, existing data models in the company, reverse-engineering of data schemas and/or dedicated modeling.
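To make the idea of generalization/specialization and a business rules layer more concrete, here is a minimal Python sketch of a tiny information model. It is an assumption-laden toy rather than a real ontology language; the Concept class, the concept names and the two rules are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A business concept with an optional, more general parent concept."""
    name: str
    parent: Optional["Concept"] = None
    rules: list = field(default_factory=list)  # (description, predicate) pairs

    def is_a(self, other):
        """True if this concept is `other` or a specialization of it."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def all_rules(self):
        """Rules defined here plus those inherited from more general concepts."""
        inherited = self.parent.all_rules() if self.parent else []
        return inherited + self.rules


# Hypothetical concepts and rules, for illustration only.
customer = Concept(
    "Customer",
    rules=[("has a customer id", lambda record: bool(record.get("id")))],
)
business_customer = Concept(
    "BusinessCustomer",
    parent=customer,
    rules=[("has a tax number", lambda record: bool(record.get("tax_number")))],
)
individual_customer = Concept("IndividualCustomer", parent=customer)
```

In this sketch a rule stated once on Customer applies automatically to both specializations, which is one sense in which an ontological model can be richer than a flat data model.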
Understand Your Business Data: Semantics. Semantics captures the formal meaning of data. It is achieved by mapping (or rationalizing) the data's schema to the information model.
Any database or message format with a schema can be mapped, including relational databases, XML, older hierarchical and network databases and COBOL copybooks. Data that is structured without a schema (e.g., EDI messages and flat files) must be parsed and then mapped.
Computers may aid the mapping process using type information, foreign keys and even name similarities to suggest matches and to provide an efficient graphical environment. However, mapping will never be totally automatic; only a database administrator or other expert will know how to interpret data accurately.
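As a rough illustration of name-based suggestions, the sketch below uses the approximate string matching in Python's standard difflib module to propose candidate concepts for a column name. The concept list, the normalization step and the cutoff value are assumptions chosen for the example; a real tool would also weigh type information and foreign keys, and an expert would still confirm each match.

```python
import difflib

# Hypothetical concept names from an information model.
CONCEPTS = ["customer", "individual_customer", "business_customer", "product", "order"]


def suggest_concepts(column_name, concepts=CONCEPTS, limit=3, cutoff=0.5):
    """Suggest concepts whose names resemble a physical column name.

    These are only suggestions ranked by string similarity; a person who
    knows the data must accept or reject each one.
    """
    normalized = column_name.lower().replace("-", "_")
    return difflib.get_close_matches(normalized, concepts, n=limit, cutoff=cutoff)


if __name__ == "__main__":
    for column in ["CUSTOMER_ID", "PROD", "qqq"]:
        print(column, "->", suggest_concepts(column))
```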
Utilizing the Semantic Information Architecture: Data Management, Integration and Quality
Semantics will only create tangible value if it is used to support the activities around data: managing data assets, integrating data and systematically improving data quality. These goals may be achieved while addressing the entire body of information using standard business terms, rather than grappling with hundreds of specific data formats.
Figure 2: Data Management
Data Management. A semantic information architecture may contribute to systematic data management in the following ways:
- Data Standards: Creating standard database schemas and XML schemas for canonical message formats.
- Data Discovery: Helping developers find and reuse data sources.
- Eliminating Redundancy: Eliminating overlaps between data assets that cause waste and quality problems.
- Influencing Developers: Encouraging developers to find the most appropriate data sources and interpret them correctly.
- Impact Analysis: Identifying what a proposed change will affect, in support of change.
- Security: Systematically imposing security and privacy policies and regulations.
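Several of the items above (data discovery, eliminating redundancy and impact analysis) reduce to queries over the semantic mappings. The following Python sketch shows the idea under simplifying assumptions: each asset is mapped to a set of concepts, and the asset and concept names are invented for illustration.

```python
from itertools import combinations

# Hypothetical record of which information-model concepts each asset carries.
ASSET_CONCEPTS = {
    "crm_db.client": {"IndividualCustomer"},
    "orders.xsd": {"IndividualCustomer", "Order"},
    "billing_copybook": {"IndividualCustomer", "Invoice"},
    "catalog_db.items": {"Product"},
}


def assets_using(concept):
    """Discovery and impact analysis: which assets carry this concept?"""
    return sorted(
        asset for asset, concepts in ASSET_CONCEPTS.items() if concept in concepts
    )


def overlapping_assets():
    """Redundancy candidates: pairs of assets sharing at least one concept."""
    pairs = []
    for first, second in combinations(sorted(ASSET_CONCEPTS), 2):
        shared = ASSET_CONCEPTS[first] & ASSET_CONCEPTS[second]
        if shared:
            pairs.append((first, second, sorted(shared)))
    return pairs
```

An overlap reported here is only a candidate for review; whether two assets are truly redundant remains a business decision.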
Figure 3: Data Integration
Data Integration. A semantic information architecture can support the following forms of data integration.
- Application Integration: Semantic analysis can be used to automate the coding and maintenance of the data translations necessary in integrating applications, whether these are run directly or within enterprise application integration (EAI) products.
- ETL/BI: On the informational side of IT, mature tools exist to load warehouses and to perform business intelligence analysis of the warehouse. However, when designing a warehouse and associated data marts and reports, a semantic information architecture can be invaluable in designing the warehouse schemas, choosing sources and designing data transformations. It can also be important in supporting ongoing change.
- Corporate Portals: Nowhere is quality information and correct business vocabulary more important than in the corporate portal. Several good portal products will provide the portal runtime, but a semantic information architecture can be used to identify the data sources, generate the data translations ensuring accurate data and provide meta data regarding the data's meaning and source.
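The data translations mentioned above can, in principle, be derived from the mappings rather than hand-written for each pair of formats. The Python sketch below illustrates this with two hypothetical field maps; the field names and model attributes are assumptions, and real translations would also need type conversion and structural handling.

```python
# Hypothetical field maps: local field name -> information-model attribute.
CRM_TABLE = {"client_name": "customer.name", "client_tel": "customer.phone"}
ORDER_XML = {"CustomerName": "customer.name", "Phone": "customer.phone"}


def to_model(record, field_map):
    """Express a record in information-model terms."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


def from_model(model_record, field_map):
    """Express an information-model record in a target format's local terms."""
    inverse = {attribute: local for local, attribute in field_map.items()}
    return {
        inverse[attribute]: value
        for attribute, value in model_record.items()
        if attribute in inverse
    }


def translate(record, source_map, target_map):
    """Translate between two formats by way of the shared information model."""
    return from_model(to_model(record, source_map), target_map)


if __name__ == "__main__":
    print(translate({"client_name": "Ada", "client_tel": "5550100"}, CRM_TABLE, ORDER_XML))
```

Routing every translation through the model means each format needs one mapping to the model, instead of a separate script for every pair of formats.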
Figure 4: Data Quality
Data Quality. A semantic information architecture provides a systematic approach to data quality.
- Data Definition: Ensuring all data formats are interpreted accurately by everyone who works with them.
- Cleansing: Validating multiple data formats against one central set of business rules which are part of the information model.
- Consistency: Using semantics to automatically translate between different data formats, in order to allow consistency to be checked.
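The cleansing and consistency items can be sketched together: records from different formats are first expressed in information-model terms, then checked against one shared rule set and against each other. The Python below is a minimal illustration; the field maps, the two rules and the sample attribute names are hypothetical.

```python
# Hypothetical field maps: local field name -> information-model attribute.
CRM_TABLE = {"client_name": "customer.name", "client_tel": "customer.phone"}
ORDER_XML = {"CustomerName": "customer.name", "Phone": "customer.phone"}

# One central, illustrative rule set, written against model attributes only.
RULES = [
    ("customer.name must not be empty",
     lambda record: bool(record.get("customer.name"))),
    ("customer.phone must contain only digits",
     lambda record: str(record.get("customer.phone", "")).isdigit()),
]


def to_model(record, field_map):
    """Express a record in information-model terms."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


def violations(record, field_map):
    """Cleansing: list the central rules that a record breaks."""
    model_record = to_model(record, field_map)
    return [description for description, check in RULES if not check(model_record)]


def inconsistencies(record_a, map_a, record_b, map_b):
    """Consistency: model attributes on which two records disagree."""
    a, b = to_model(record_a, map_a), to_model(record_b, map_b)
    return sorted(attribute for attribute in a.keys() & b.keys() if a[attribute] != b[attribute])
```

The rules never mention a physical field name, so adding a new data format requires only a new field map, not a new copy of the rules.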
A semantic information architecture requires an initial investment in tools, mapping and modeling, as well as an adjustment to more organized ways of working with data. This adjustment requires training and attention to governance. Whether implementing an architecture in support of a partial project or an entire enterprise, this investment should be weighed against quantifiable value. Experience from early adopters shows that a compelling ROI is often obtained by considering the following sources of value.
Higher Quality Business Information
A systematic approach to data quality creates value by increasing the worth of customer data, inventory tracking and other operational data and of key performance indicators that executives use to maximize revenues and reduce costs.
A majority of IT projects fail or miss their objectives. Data confusion is often the culprit. A common understanding of data is an important basis for deploying new business applications, integrating applications and introducing more flexible process models into the enterprise, as well as for handling more drastic changes in the environment such as mergers and acquisitions. Data semantics supports business flexibility by providing a coherent understanding of the data environment, by automatically computing the impact of any change and by reflecting new business rules in updated transformation logic.
Lower and More Predictable IT Costs
Data management will also produce tactical value by promoting the reuse of data assets and by identifying and supporting the decommissioning of redundant systems. Data semantics software can automate the generation and maintenance of accurate data transformation scripts and queries, gradually replacing costly manual work. The use of consistent data ensures the success of modern IT's other architectural layers, including the operational stack of packaged applications, enterprise application integration, business process integration, Web services and the informational stack of ETL, data warehouses and business intelligence.
Cultural issues need attention so that they do not become a barrier to the complete adoption of a more structured approach to managing data. Experience shows that the following principles are helpful.
- Focus on Value: The architecture should show tangible value both for the enterprise and for each set of users (e.g., developers, business analysts and knowledge workers). Track and measure how the information architecture is being used.
- Support Different Roles: Different users have different uses for meta data and data semantics. Business users will not be interested in data schemas, and data managers may have limited use for some types of code generation. Ensure your data management portal is customizable to different roles.
- Accept Alternative Vocabularies: The information model can be used to create unambiguous business terminology, but it need not limit users to a single business vocabulary. Instead, capture multiple synonyms within the information model and document their usage by different groups.
Semantics inspires a vision in which data carries unambiguous business meaning that can be accurately found, aggregated and used without prior knowledge of the data's specific format. A semantic information architecture offers a structured way for forward-thinking enterprises to start to realize the benefits of semantics in their IT.
Information is the lifeline of the modern enterprise. Yet decades of business and technological evolution have left enterprises with too many data formats and not enough information. A semantic information architecture aims to address the core of the problem by capturing the precise meaning of data in common agreed-upon business terms.
While an initial investment in training and organizational adjustments in culture may be required, a semantic information architecture allows everyone to speak the same business language, enables data to carry unambiguous business meaning and achieves a data environment that is managed and integrated almost at will. These benefits drive value by providing consistently high-quality information to the business and by allowing the enterprise to adapt itself in real time.
1. 0.8 probability. Gartner, used with permission. From "The Integrated Enterprise From 2003 to 2012," by Jackie Fenn et al.
2. Wayne Eckerson, "Data Quality and the Bottom Line," The Data Warehousing Institute, http://www.dw-institute.com/research/display.asp?id=6028.