APR 10, 2012 5:40am ET

column

Big Data and the Coming Conceptual Model Revolution

Conceptual modeling, or semantic modeling if you like, is a rather nebulous area in data management. There seems to be a lot of agreement that it is needed, some disagreement about what it is, and little understanding of how to do it. Yet I believe we are now at a point where we will be forced to deal with it in a far more serious way than we have in the past.

Definitional Problems

I define a conceptual model as "a model of business information purely as information without any concern as to how it might be stored as data."

To me a conceptual model is not a data model in any sense because it is not part of any effort to design a data storage solution. It is a model that captures information used in a particular area of the business.

Other definitions of "conceptual model" exist. Confusingly, the ANSI/SPARC definition of "conceptual schema" is something that describes "... all the data items and relationships between them, together with integrity constraints (later). There is only one conceptual schema per database." This is essentially what is commonly called a "logical data model." 

Then there is the "conceptual data model," defined by Tom Haughey as "a high level or coarse data model which is preliminary in structure, possibly abstract in content and sparse in attributes, that is intended to represent a business area. It is preliminary in structure because it may contain many-to-many relationships."  

I do believe that a conceptual data model has a place in data management, as a preliminary to a logical data model. However, it lacks the detail I would expect of a real conceptual model and suffers from being oriented to a data storage design rather than a full description of a business reality.

Data Models and the Relational Paradigm

There is strong evidence that conceptual models are becoming more important today than they have ever been. Essentially, conceptual models are becoming divorced from traditional data models, and the divorce is likely to be a messy one because of the way that data models and data modelers have grown up since the 1970s.

Data modeling as we know it today is inextricably linked with the relational database paradigm - the way in which the columns of database tables are all "related" together. The relational paradigm is so ubiquitous that data modelers do not realize just how much data modeling presupposes it. And the relational paradigm has been enormously successful. It has been tempting to think, therefore, that a logical data model can truly represent the business - to think that a logical data model is the same as a conceptual model.

Enter Big Data

But now things are changing. The success of columnar databases in ultra-large-scale data environments has presented a challenge to the relational paradigm. Of course there is enormous hype about big data, but it is also enough of a reality to demand attention. To use the columnar databases successfully you have to unlearn the relational paradigm. I have seen this on a petabyte-scale project I worked on, and it can be ugly. Once the relational paradigm is jettisoned, data modeling as we have known it goes out the window, too.

Yet the need to understand what to make of the data in business terms remains. The challenge of managing big data is to distill it into forms that fit the models that business users have of their information requirements - to distill it into conceptual models. Of course it is also true that data models are needed to design a big data dataspace, but these are also nonrelational and must come after detailed conceptual models. The reason is that in big data the logical data model cannot stand in as an approximation of the conceptual model, as it can in the relational paradigm.

And so it was, too, in the era before relational. ISAM, VSAM, IMS, ADABAS, IDMS and the other prerelational data stores could not be designed using ER-based data modeling techniques built on the foundation of the relational paradigm.

And, if truth be told, this is also true when relational databases use generic patterns to hold data. For instance, I recently spent several weeks producing conceptual models for different types of institutional customer housed in a "party model" generic database. My conceptual models bore no resemblance to the design of the relational data store.

What Data Modeling Cannot Do

If we truly model business information in full detail and compare it to what we find in typical data models, there are significant divergences. Some things we need to represent in conceptual models either are not represented or cannot be represented in traditional data models. These include:

  • Relationships between non-key attributes in an entity. For example, Total Sales Amount is related to Total Sales Amount Currency by the relationship "is denominated in," but the relationship cannot be expressed in a data model.
  • The use of code tables. Wherever a code table is used, the physical records it contains represent business concepts that have not been captured and defined in the data model used to design the database in which the code tables are housed. I submit that half or more of the business concepts required for a database can easily exist as code table records and thus be missing from the corresponding data model.
  • Levels of abstraction. Total Sales Amount as a column in a database represents a piece of business information. But Date-Time of Last Update is metadata about a record. If an entity contains both attributes that are data and attributes that are metadata, I cannot represent them as being at different levels of abstraction. At best I can use devices such as naming conventions, but these are not really satisfactory.
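
As a rough illustration only, the three gaps listed above can be sketched in a few lines of Python. This is not a method proposed in the column: the class and function names (`Concept`, `ConceptualModel`, `Level`, `build_example`) are hypothetical, and the "Shipping Mode" code table with its "Air" and "Ground" values is an assumed example. The sketch simply records attribute-to-attribute relationships, treats code table values as concepts in their own right, and tags each concept with a level of abstraction.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """Level of abstraction a concept sits at (hypothetical two-level scheme)."""
    BUSINESS = "business information"
    METADATA = "metadata about a record"


@dataclass(frozen=True)
class Concept:
    name: str
    level: Level = Level.BUSINESS


@dataclass
class ConceptualModel:
    concepts: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (subject, verb, object)

    def add(self, name, level=Level.BUSINESS):
        self.concepts[name] = Concept(name, level)

    def relate(self, subject, verb, obj):
        # Both ends must already be defined concepts.
        for name in (subject, obj):
            if name not in self.concepts:
                raise KeyError(f"unknown concept: {name}")
        self.relationships.append((subject, verb, obj))

    def related(self, subject, verb):
        return [o for s, v, o in self.relationships if s == subject and v == verb]

    def at_level(self, level):
        return sorted(n for n, c in self.concepts.items() if c.level is level)


def build_example():
    model = ConceptualModel()

    # 1. A relationship between two non-key attributes.
    model.add("Total Sales Amount")
    model.add("Total Sales Amount Currency")
    model.relate("Total Sales Amount", "is denominated in",
                 "Total Sales Amount Currency")

    # 2. Code table values captured as concepts (assumed example values).
    model.add("Shipping Mode")
    for value in ("Air", "Ground"):
        model.add(value)
        model.relate(value, "is a kind of", "Shipping Mode")

    # 3. Metadata kept at a different level of abstraction.
    model.add("Date-Time of Last Update", level=Level.METADATA)
    return model


if __name__ == "__main__":
    model = build_example()
    print(model.related("Total Sales Amount", "is denominated in"))
    print(model.at_level(Level.METADATA))
```

Nothing here says how the information would be stored; the structure only names concepts and the statements that connect them, which is the distinction the list above is drawing.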

Comments (8)
Good thought provoking article - thanks Malcolm.

@Ben

3) Appearing on the same record only reveals an indirect relationship between the two columns through the concept represented by the table itself. It does not define a direct relationship between the two items. Consider two further cases: a) other columns on the same record, such as Item or Sales Order Notes, likely have no relationship to Total Sales Amount Currency, other than being indirectly related via each column's relationship to the table concept (e.g. Order); b) there may be more than one currency column to cover other amounts in different currencies (e.g. Shipping, Cost, Commission, etc.), and being on the same record provides no discrete relationship between the amounts and the currencies that apply to them.

Having said that, I disagree with the statement that this could not be modeled in a data model. One could model this example such that the amount is a discrete record with its currency, and this record related to the containing transaction. Since we are talking conceptual/logical models, not physical, performance is not a concern.

4) I think the author means that the records within the code table have business meaning themselves, but since they are treated as just "data" they are not individually modeled. For example, if you have a Shipping Mode table, it could contain "A - AIR" and "G - GROUND". I understood the author to be stating a conceptual model would be an improvement since it could model AIR and GROUND as model concepts whereas in a data model they would not exist - they are simply data/records (i.e. instances of Shipping Mode).

Again I see a number of ways to model this in existing data models, such as subtypes. I'm not sure where the author draws the line between "concept" and "data". In his envisioned conceptual model would he have each of an organization's customers, vendors and products modeled as separate concepts? Sounds like a Big Model, not Big Data.

I see methods in existing logical data modeling around each of the issues the author raises. In particular, I would be interested to hear how UML would not fit the bill here.

-- Chris

Posted by Chris C | Thursday, April 12 2012 at 3:13PM ET
Hi Malcolm,

You speak my mind.

I agree that there is a need for modeling that represents the business, that is not tied to the relational paradigm, and that has no regard for how it will be implemented later on (ISAM, VSAM, Relational, Big Data, Modeling for BI, ...).

Granted, building these models is time consuming and requires interaction with several business users, and most businesses won't support it as part of technology projects (not time or $$). I believe the task of building these conceptual models needs to be taken up as part of EIM initiatives and should form the foundation for all technology initiatives related to Big Data, MDM, Analytical and Transactional projects.

I have been advocating these models more in the context of MDM and learned interesting aspects of the business in building these conceptual models that I call "Semantic Business Models". I primarily built these models for Customer Domain as part of MDM initiatives. Some of these observations are:

1. There exist aspects of the business that are not handled by any of the existing IT applications. (At one client, approximately 40% of the customer information that the business is looking for, and is using through other means, is not found in any of the IT applications that exist at the firm.)
2. The business will use whatever means to get the information it is looking for. This often means important data buried in documents like Word and Excel files, subscriptions to third-party content providers, etc.
3. Business users loved these models, which are representations of their business activities and the information needed to support them. They can easily understand them with very little explanation of how to read these models.
4. These models were used creatively for other purposes. To quote a few examples: at one client, the Semantic Business Model has become part of the training curriculum used to onboard new sales representatives. At another client, this document was used to introduce the business to the new CIO on day one on the job.
5. If the information represented in these models had to be described in text, it would take volumes of writing to represent and a lot of time to read and understand.
6. POCs built on these models were well received by both business and IT.

Well, I can go on and on ....

The moral I learned is that there is a need to do a conceptual modeling exercise that is different from the conceptual modeling that is carried out today, with its strong dependency upon the relational paradigm.

Thanks, Mani

Posted by Mani Kumar M | Friday, April 13 2012 at 6:41AM ET