One benefit of meta data is that it abstracts the description of the data elements within a data set, which in turn lets us abstract the details of how that data set is instantiated. Conceptually, we can distinguish the data elements and their corresponding attributes from the formats in which sets of those data elements are collected, stored, exchanged, presented or grouped together. Consider an example using a simplistic meta data description of a customer:

  • Customer ID: a 10-digit number
  • First Name: variable length character string not to exceed 25 characters
  • Middle Name: variable length character string not to exceed 25 characters
  • Last Name: variable length character string not to exceed 30 characters
  • Date of Birth: fixed-length formatted field using the format MMDDYYYY

Each data element is given a simple data type and size description. While this description accurately characterizes a grouping of data elements, it does not specify whether we are talking about records in an RDBMS (relational database management system), rows in a flat file, a grouping of elements in an XML document, a row in a spreadsheet or any number of other possible materializations. However, any business rules that apply to the elements within a single instance, to a set of data instances, or to a set of these records compared with some other described data set still apply, regardless of the actual physical representation.

This raises an interesting question: If our business rules apply to the abstraction as described by the meta data, can we abstract the application of business rules as well? I have discussed business rules as meta data in previous columns, but this month I am interested in a more basic question regarding data access: How do we manipulate data instances that may be represented in different ways? If each representation is accessed directly, the problem is complex. From a programming point of view, however, using an object-oriented approach to develop an interface provides a way to mirror, and consequently to exploit, the meta data abstraction.
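To make the customer description concrete, here is a minimal Python sketch that encodes it as data and checks an instance against it. The names ElementSpec, CUSTOMER_META and violations are hypothetical, and so is the choice to express each type and size constraint as a regular expression and to allow empty name fields. The point is only that the check consumes a plain mapping of element names to values, whatever physical source produced it.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ElementSpec:
    """One meta data entry: an element name plus a pattern for valid values."""
    name: str
    pattern: str  # assumption: constraints are expressed as regular expressions


# Hypothetical encoding of the customer description from the column.
CUSTOMER_META = [
    ElementSpec("Customer ID", r"\d{10}"),    # a 10-digit number
    ElementSpec("First Name", r".{0,25}"),    # at most 25 characters
    ElementSpec("Middle Name", r".{0,25}"),   # at most 25 characters
    ElementSpec("Last Name", r".{0,30}"),     # at most 30 characters
    # MMDDYYYY; checks field ranges only, not calendar validity
    ElementSpec("Date of Birth", r"(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])\d{4}"),
]


def violations(instance: Dict[str, str]) -> List[str]:
    """Return the names of elements whose values do not match the meta data."""
    bad = []
    for spec in CUSTOMER_META:
        value = instance.get(spec.name, "")
        if re.fullmatch(spec.pattern, value) is None:
            bad.append(spec.name)
    return bad
```

A record read from a database row, a flat file line or an XML element could be handed to violations() in the same way once it has been mapped to element names.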

Figure 1: A Sample Class Hierarchy

Let's focus solely on the question of data access. Presume that we have a set of records and that we want a means of successively accessing those records so that we can (in practice) apply a predefined business rule to each record. We know that each collection of data elements constitutes a single instance. We also know that whatever we access the data through is likely to hold a set of data instances. Therefore, regardless of the physical representation, we can define a simple interface that expects to be able to:

  • Instantiate a pointer to the beginning of the data set
  • Determine if there are still data instances in the set
  • Access the next data instance in the set
  • Apply some business rule to the data instance
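The four capabilities listed above can be sketched as a small abstract interface. This is an illustration under stated assumptions, not the column's own implementation: the names RecordSource, InMemorySource and apply_rule are hypothetical, a data instance is simplified to a dictionary, and a business rule is assumed to be a function that returns True when a record conforms.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Record = Dict[str, str]  # assumption: a data instance is a name-to-value mapping


class RecordSource(ABC):
    """Hypothetical interface mirroring the capabilities listed above."""

    @abstractmethod
    def reset(self) -> None:
        """Point at the beginning of the data set."""

    @abstractmethod
    def has_next(self) -> bool:
        """Report whether data instances remain in the set."""

    @abstractmethod
    def next_instance(self) -> Record:
        """Return the next data instance and advance the pointer."""


class InMemorySource(RecordSource):
    """Trivial implementation over a list, used only to exercise the interface."""

    def __init__(self, records: List[Record]) -> None:
        self._records = list(records)
        self._pos = 0

    def reset(self) -> None:
        self._pos = 0

    def has_next(self) -> bool:
        return self._pos < len(self._records)

    def next_instance(self) -> Record:
        record = self._records[self._pos]
        self._pos += 1
        return record


def apply_rule(source: RecordSource, rule: Callable[[Record], bool]) -> List[Record]:
    """Apply a business rule to every instance; return those that fail it."""
    source.reset()
    failures = []
    while source.has_next():
        record = source.next_instance()
        if not rule(record):
            failures.append(record)
    return failures
```

Because apply_rule depends only on the three abstract methods, it would not need to change if the records came from a database cursor or a file reader instead of a list.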

I am an object-oriented programmer at heart, so my inclination is to describe everything in terms of classes, objects, attributes and methods. In this month's column, I will provide some high-level descriptions of classes that can be used; in next month's column, I will provide more detail on a sample code implementation, purely as a guideline for understanding approaches to enterprise information integration. There will be a class representing data instances and a hierarchy of classes representing collections of data instances.

A data instance should be able to publish the names and data types of the elements composing the instance, as well as present the value of any of the data elements it contains. Should we wish to allow modification of the values, we might provide a method for updating a data element. The value of defining a standard data instance interface is that we can program our data set classes to always deliver records as objects of the same class, which in turn simplifies the application.
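As a rough illustration of such a data instance class, the Python sketch below publishes element names and types, returns values and optionally updates them. The class name DataInstance, its method names and the use of Python types to stand in for the meta data types are all assumptions made for this example.

```python
from typing import Any, Dict, List


class DataInstance:
    """Hypothetical standard representation of one record."""

    def __init__(self, schema: Dict[str, type], values: Dict[str, Any]) -> None:
        missing = set(schema) - set(values)
        if missing:
            raise KeyError(f"missing elements: {sorted(missing)}")
        self._schema = dict(schema)
        # Keep only the elements the schema describes, in schema order.
        self._values = {name: values[name] for name in schema}

    def element_names(self) -> List[str]:
        """Publish the names of the elements composing the instance."""
        return list(self._schema)

    def element_type(self, name: str) -> type:
        """Publish the data type of one element."""
        return self._schema[name]

    def get(self, name: str) -> Any:
        """Present the value of one element."""
        return self._values[name]

    def set(self, name: str, value: Any) -> None:
        """Update one element, rejecting values of the wrong type."""
        expected = self._schema[name]
        if not isinstance(value, expected):
            raise TypeError(f"{name} expects {expected.__name__}")
        self._values[name] = value
```

An application written against get() and element_names() does not need to know whether the instance was built from a database row or a line in a file.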

A data set should have high-level summary information, such as the number of records in the set and the maximum and minimum data instance sizes, as well as some kind of iterating pointer that can be reset to the first data instance in the set. One should be able to determine whether there are still more data instances in the set and, if so, to access the next data instance available. Lastly, each data set class should transparently deliver data instances in the standard object representation described in the previous paragraph.
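The summary information might look like the following sketch. The class name InMemoryDataSet is hypothetical, records are simplified to dictionaries of strings, and "instance size" is assumed here to mean the total number of characters across an instance's values; a real implementation would pick whatever size measure suits its storage format.

```python
from typing import Dict, List

Record = Dict[str, str]  # assumption: simplified data instance


class InMemoryDataSet:
    """Hypothetical data set with summary information and a resettable pointer."""

    def __init__(self, records: List[Record]) -> None:
        self._records = list(records)
        self._pos = 0

    @staticmethod
    def _size(record: Record) -> int:
        # Assumed size measure: total characters across all values.
        return sum(len(value) for value in record.values())

    def record_count(self) -> int:
        return len(self._records)

    def min_instance_size(self) -> int:
        return min((self._size(r) for r in self._records), default=0)

    def max_instance_size(self) -> int:
        return max((self._size(r) for r in self._records), default=0)

    def reset(self) -> None:
        """Move the iterating pointer back to the first data instance."""
        self._pos = 0

    def has_next(self) -> bool:
        return self._pos < len(self._records)

    def next_instance(self) -> Record:
        record = self._records[self._pos]
        self._pos += 1
        return record
```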

Conceptually this is great, until you recall the many different ways that we store or exchange data sets. By developing an application interface based on the functionality described, we can structure a class hierarchy that implements that interface and still remains transparent to the application; this is the magic of the object-oriented approach. We might see data sets in RDBMSs, flat files or XML documents. Those flat files might be separated-element files, such as comma-separated value (CSV) files, or they may be fixed-format files; we may see a variety of database systems as well. Yet we can derive classes within a logical hierarchy that allows us to target each potential data source without requiring a significant code implementation, as is seen in Figure 1.
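A small slice of such a hierarchy, covering only the flat file branch, might be sketched as follows. It is not a reproduction of Figure 1: the class names DataSet, FlatFileDataSet, SeparatedDataSet and FixedFormatDataSet are hypothetical, the input is passed as a string to keep the example self-contained, and records are simplified to dictionaries. Splitting on line breaks also means quoted fields that contain newlines are not handled.

```python
import csv
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]  # assumption: simplified data instance


class DataSet(ABC):
    """Shared iteration logic; subclasses only say how to load their source."""

    def __init__(self) -> None:
        self._records: List[Record] = self._load()
        self._pos = 0

    @abstractmethod
    def _load(self) -> List[Record]:
        """Read every data instance from the underlying source."""

    def reset(self) -> None:
        self._pos = 0

    def has_next(self) -> bool:
        return self._pos < len(self._records)

    def next_instance(self) -> Record:
        record = self._records[self._pos]
        self._pos += 1
        return record


class FlatFileDataSet(DataSet):
    """One record per non-blank line; subclasses parse a single line."""

    def __init__(self, text: str) -> None:
        self._lines = [line for line in text.splitlines() if line.strip()]
        super().__init__()

    def _load(self) -> List[Record]:
        return [self._parse(line) for line in self._lines]

    @abstractmethod
    def _parse(self, line: str) -> Record:
        """Turn one line into a record."""


class SeparatedDataSet(FlatFileDataSet):
    """Separated-element files such as CSV."""

    def __init__(self, text: str, names: Sequence[str], delimiter: str = ",") -> None:
        self._names = list(names)
        self._delimiter = delimiter
        super().__init__(text)

    def _parse(self, line: str) -> Record:
        fields = next(csv.reader([line], delimiter=self._delimiter))
        return dict(zip(self._names, fields))


class FixedFormatDataSet(FlatFileDataSet):
    """Fixed-format files described by (element name, width) pairs."""

    def __init__(self, text: str, layout: Sequence[Tuple[str, int]]) -> None:
        self._layout = list(layout)
        super().__init__(text)

    def _parse(self, line: str) -> Record:
        record, start = {}, 0
        for name, width in self._layout:
            record[name] = line[start:start + width].strip()
            start += width
        return record
```

Each leaf class contributes only a short parsing method, while reset(), has_next() and next_instance() are inherited unchanged. A database-backed or XML-backed class would sit beside FlatFileDataSet and implement _load() in its own way.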

Next month, we will look at these classes a little more carefully and begin to see how creating standardized class representations simplifies information exchange as well as application of business rules. 
