Data models have been with us since Ted Codd described normalization in 1970 and Peter Chen published his paper on entity relationship diagrams in 1976. Ontology as a discipline in philosophy can trace its roots to ancient Greece. As applied to data management, it is much more recent than data modeling and has only appeared in the past few years. But just what is the difference between ontologies and data models? If they are both about data, do they not boil down to the same thing?
I think they are different and have different practical applications. That is, you need both of them to do data management well, rather than one being able to eliminate the other. Given that many people practice either one or the other, it is appropriate to ask what evidence exists that supports this opinion, and to answer that we need to first consider what data models and ontologies are.
Design of Data Stores
When we talk about data models, we really mean the models that 1) support relational databases, 2) are based on the theory of normalization and 3) use standard notations such as IDEF1X. Of course, flat files, XML documents, Hadoop and non-relational databases are also designed, and hence have "data models" - except these are not normalized and cannot be represented in a diagram using a standard notation like IDEF1X.
Data stores will always be constrained by the technology that is used to build them, and by how well this technology supports the requirements that the stores are intended to satisfy. These factors must influence the designs of the data stores. For instance, data marts built using star schemas are subject-oriented and provide fast response to a particular set of queries as well as an easy way to navigate the database for this set of queries. By contrast, fully normalized designs for the same data typically run those queries much more slowly and are more difficult for users to navigate - but they are less prone to update anomalies and can accommodate new technical requirements more easily.
Thus data models are design, not analysis, artifacts. Data modeling for relational databases serves to provide a diagram from which the database can be built, yet it is also claimed that, when done correctly, these diagrams represent business reality. However, as we have just seen, their purpose is to provide blueprints for data stores, and the design process involves making choices about how to satisfy requirements.
What Are Ontologies?
The common definition of an ontology is: a specification of a conceptualization.
I find this a very difficult definition to apply in practice. To me an ontology is a view of the concepts, relations and rules for a particular area of business information, irrespective of how that information may be stored as data.
Suppose I am working in a business process outsourcing enterprise that analyzes accounts receivable data for fraudulent activity. A customer sends us their accounts receivable files; we apply algorithms to detect possible fraud and then send a report back to the customer. It is vital for such a company to have a good ontology of data sources. Figure 1 shows a very simple ontology for sources of data, and Figure 2 shows a very simple ontology for principal types of data for this company.
We can begin to see the usefulness of these ontologies when we think about what the term "Customer Data" means inside this enterprise. It could mean many things, including:
- The BPO data that customers provide for processing (BPO Data in Figure 2)
- Data that the customers provide about themselves (Administrative Data in Figure 2)
- Data in the Customer Master database (Administrative Data in Figure 2)
- Any data that is sourced from customers, irrespective of type (Customers in Figure 1)
When someone discusses "Customer Data," the ontologies help reveal what they are really talking about. Without the ontologies, discussions can be riddled with misinterpretations or turn into a frustrating waste of time as participants try to analyze what is meant by "Customer Data."
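To make the "Customer Data" example concrete, here is a minimal sketch in Python of how the four readings listed above could be recorded against ontology concepts. It uses only the three concept names mentioned in the text (the figures themselves contain more); the `Concept` class, the `READINGS` glossary and the function names are assumptions made for illustration, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    ontology: str  # which ontology the concept belongs to


# Only the concepts named in the article text (assumed subset of the figures).
CUSTOMERS = Concept("Customers", "Sources of Data (Figure 1)")
BPO_DATA = Concept("BPO Data", "Types of Data (Figure 2)")
ADMIN_DATA = Concept("Administrative Data", "Types of Data (Figure 2)")

# Hypothetical glossary: each reading of a business term points to a concept.
READINGS = {
    "Customer Data": [
        ("data that customers provide for processing", BPO_DATA),
        ("data that customers provide about themselves", ADMIN_DATA),
        ("data in the Customer Master database", ADMIN_DATA),
        ("any data sourced from customers", CUSTOMERS),
    ],
}


def candidate_concepts(term):
    """Return the distinct concepts a term could refer to, in first-seen order."""
    seen = []
    for _reading, concept in READINGS.get(term, []):
        if concept not in seen:
            seen.append(concept)
    return seen


def is_ambiguous(term):
    """A term is ambiguous when it maps to more than one concept."""
    return len(candidate_concepts(term)) > 1


if __name__ == "__main__":
    for concept in candidate_concepts("Customer Data"):
        print(f"{concept.name} - {concept.ontology}")
```

Even in this toy form, the glossary shows that one everyday term resolves to three different concepts spread across two ontologies, which is exactly the confusion the ontologies are meant to expose.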
Thus ontologies are analysis, not design, artifacts.
Bringing Data Models and Ontologies Together
Using ontologies to drive clarity in business discussions is beneficial. Figures 1 and 2 are clearly not data models, so a data model could not have done this job in the example. However, ontologies also play a vital role in conjunction with data models.
As noted above, data models are sometimes held to represent business reality, as well as provide a design for a data store. Yet, a data model of any kind has to be a generalization across many different business views. For example, I worked with an investment bank that had a Customer Master for the companies that were its clients. This Customer Master used the Party Model, which generalized across all the different types of businesses and their relations. For instance, there were brokers, hedge funds, trusts and mutual funds; there were different hierarchies for risk, sales, legal ownership and control. Actually, it was much more complex than this. If, for instance, information on a trust was needed, the trustee and the custodian for the trust (different legal entities from the trust) also had to be returned. If information on a hedge fund was requested, the hedge fund manager (again a different legal entity) also had to be returned.
Figure 3 summarizes this situation. A data model was used to design the Customer Master, which used a generalized Party Model approach. Users saw information in business views on screens and reports, and these were designed based on ontologies. Data was translated from the data store into the business views using semantic adapters. Each semantic adapter was designed using the data model and the relevant ontology.
I submit that the situation in Figure 3 is what always happens, except it is not formalized. For instance, a temporary table with unique associated SQL statements serves as the semantic adapter, a report (with additional processing) serves as the business view, and the ontology is "tribal knowledge" in the heads of business analysts and SMEs. Formalizing it can create much greater efficiencies through documentation and reuse.
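A formalized semantic adapter might look something like the following Python sketch. It assumes a very small Party Model store (parties plus typed relationships) and an ontology rule stating which related roles belong in the business view for each type of party. All names, rows and the `VIEW_ROLES` rule table are hypothetical and stand in for what was a far more complex real system.

```python
# Generalized Party Model store (hypothetical rows).
PARTIES = {
    1: {"name": "Alpha Trust", "type": "trust"},
    2: {"name": "Beta Trustees Ltd", "type": "company"},
    3: {"name": "Gamma Custody Bank", "type": "company"},
    4: {"name": "Delta Fund", "type": "hedge fund"},
    5: {"name": "Epsilon Management", "type": "company"},
}

# Each relationship is (from_party_id, role, to_party_id).
RELATIONSHIPS = [
    (2, "trustee", 1),
    (3, "custodian", 1),
    (5, "manager", 4),
]

# Assumed ontology rule: related roles that a business view must include
# for each type of party.
VIEW_ROLES = {
    "trust": ["trustee", "custodian"],
    "hedge fund": ["manager"],
}


def business_view(party_id):
    """Translate a generalized party record into its business view."""
    party = PARTIES[party_id]
    view = {"name": party["name"], "type": party["type"]}
    for role in VIEW_ROLES.get(party["type"], []):
        # Collect the names of every party that plays this role for party_id.
        view[role] = [
            PARTIES[source]["name"]
            for source, rel_role, target in RELATIONSHIPS
            if rel_role == role and target == party_id
        ]
    return view


if __name__ == "__main__":
    print(business_view(1))  # the trust, with its trustee and custodian
    print(business_view(4))  # the hedge fund, with its manager
```

The design point is that the store stays generic while the rule about what a "trust" or a "hedge fund" means to the business lives in one documented, reusable place rather than in scattered SQL.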
An additional - and very important - function of ontologies is to facilitate knowledge discovery in databases and beyond. The concepts and relations in an ontology can be enriched with tags, which today usually means hashtags. Data assets of all kinds can then be tagged with these hashtags. This includes the content of regular relational databases but extends to documents, files and so on. Thus, ontologies provide support for a much richer form of search, in which a user identifies one concept but gets back information on related concepts via the ontology. Space does not permit a more detailed discussion here, but it can immediately be seen that ontologies support knowledge discovery, whereas data models cannot do this to the same extent.
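As a rough illustration of tag-based search, the Python sketch below expands a hashtag to the concepts related to it in an ontology and then returns every data asset carrying any of those tags. The hashtags, the `RELATED` links and the asset names are all invented for the example; a real ontology would hold richer relations than this simple mapping.

```python
# Hypothetical ontology: each hashtag maps to the hashtags of related concepts.
RELATED = {
    "#CustomerData": {"#BPOData", "#AdministrativeData"},
    "#BPOData": set(),
    "#AdministrativeData": set(),
}

# Hypothetical data assets of different kinds, each tagged with hashtags.
ASSETS = {
    "customer_master.db": {"#AdministrativeData"},
    "ar_files.csv": {"#BPOData"},
    "fraud_report.pdf": {"#BPOData"},
    "office_lease.docx": {"#Facilities"},
}


def expand(tag):
    """Return the tag plus every tag reachable from it through the ontology."""
    found, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(RELATED.get(current, ()))
    return found


def search(tag):
    """Return the names of assets tagged with the concept or a related one."""
    wanted = expand(tag)
    return sorted(name for name, tags in ASSETS.items() if tags & wanted)


if __name__ == "__main__":
    print(search("#CustomerData"))
```

Here no asset is tagged "#CustomerData" directly, yet a search on that concept still finds the database, the file and the report, because the ontology supplies the links between concepts.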
From this discussion it can be seen that data models and ontologies both have useful roles. Perhaps in the past we expected too much from data models, but ontologies now provide a much needed addition to the toolset of data management.