Malcolm would like to thank Fabio Corzo for his contribution to this article.
Trying to deal with enterprise information architecture is not easy. Most data practitioners have a hard enough time getting through one project at a time. Quite often they are working on several projects simultaneously as well as helping out with maintenance activities. Enterprise architecture is clearly bigger than any of this, but with so much busywork, it is difficult even to think about it. Furthermore, we are confronted with environments where the architecture is pretty much set, having evolved through the implementation of sets of legacy applications and the acquisition of vendor-supplied package solutions. Unfortunately, while none of this may be easy to deal with, architecture really does matter. It emerges as an issue in many endeavors, but it is especially important where data sharing or exchange happens. The past decade or so has seen an explosion in the implementation of business intelligence (BI), which can be viewed as extracting useful information from the data generated by the operational systems of an enterprise, possibly augmented by data brought in from outside.
A lot of people are unhappy about the results of their BI experiences. One of us (MC) teaches extensively about master data management, and a large segment of the audience consists of project staff working on data warehouse or data mart projects whose outputs are unusable by the business. These BI applications are no doubt technically sound, but what they deliver has serious deficiencies. Because these deficiencies are inherent in the data being brought into the marts, the project staff try to fight their way upstream, toward where the data comes from, to correct it. This heroic effort resembles the early European explorers trying to find the source of the River Nile. Actually, it is a lot more difficult. Even if our intrepid data mart builders can map out the terra incognita of the data landscape, they will probably come across great seas of polluted data that they are powerless to clean up, but which are still sources they must use. Perhaps they can partially decontaminate the stuff as it goes into the marts, and to that extent the knowledge gained from their explorations can be useful. However, this is far from solving the whole problem.
Architecture is the Culprit
One of the issues about enterprise information architecture is being able to think about it clearly - to conceptualize it or visualize it. We try to employ useful analogies, like the one just presented about exploring the data landscape. The reality is that enterprise information architecture is complex and consists of many different dimensions, each of which is a valid concept in its own right. This means it maps partially to a wide range of analogies. Unfortunately, no single one of these corresponds to the complete reality of architecture, and we are just going to have to build out our understanding of it one faltering step at a time.
That said, it is fair to ask whether enterprise information architecture bears significant responsibility for failures in BI. Is it to blame when rollups will not roll up, when monthly reports that seem stable suddenly show weird blips of unbelievable data, when ad hoc queries can never be relied on, and for all the other BI misfortunes that we do not like even to name for fear we will conjure them into existence in our projects?
We would answer that it is. Figure 1 shows what we term the Abstraction-Translation Paradigm of enterprise information architecture. In a nutshell, it visualizes the processes by which applications are created as passing through successive layers of abstraction and translation that ultimately enable business information to be stored as a binary representation of facts in implemented databases. This data can then be moved into data marts for use in BI applications. However, it is still a binary representation of facts. To be used, it has to be transformed and translated back through the same layers required in the creation of the operational application that produced it. Finally, it emerges at the business level as information again. Given the complexity of this round trip for information, we should be amazed that BI applications ever produce anything useful, rather than disappointed that they have shortcomings.
The Abstraction-Translation Paradigm
The leftmost stack in Figure 1 represents the layers of analysis and design needed to build an operational application, and the roles played by the individuals who are responsible for what happens in each layer. It is divided up into a column for data and another column for business rules. These end up as physically implemented databases and application logic, respectively.
The process of implementing an operational application begins in the business, where a business analyst gathers business requirements, describes the current business processes and identifies the data used in these processes. The artifacts that are produced ideally include use cases, workflows and a conceptual data model with a glossary of business terms. These represent the business in what is termed the conceptual layer in Figure 1. Unfortunately, conceptual has all kinds of definitions, but here it is taken to mean a direct representation of the business.
Next, on the data side, a data analyst produces a data model. This contains an element of design. Even if it is for a current state, data modelers will always say they are producing a picture of how the business truly sees the data. Whatever truth there is in this assertion, we see business concepts such as vendors, customers and employees abstracted into party entities, inventions of surrogate keys, awkwardly named association entities and so on. We have definitely left the conceptual level and entered the logical level here.
On the business rules side, a systems analyst will decompose what the business analyst has produced into minute detail and come up with a specification. Again, this is not pure analysis and contains an element of design.
The next layer we meet with is the physical level. This is the true realm of designers. On the data side, the objective is to produce a physical data model, and on the business rules side, it is to produce an application design. If a vendor-supplied package is being purchased, it is commonly thought that all of this has been done in advance. However, such packages normally have to be configured to make them work in any given client environment, and this is the equivalent of working with a very high-level programming language and a set of database design patterns. Ideally, the logical data model and system specification are inputs to this process. Points of departure may be a better term.
Eventually, the physical database and application are produced and handed over to the database administrators (DBAs) and production control in the implementation layer. These technicians can influence the architecture by deciding where to locate the database and application, how queries will physically be executed and so on. They can be expected to have close to no understanding of the business environment that uses the solution they look after. However, that solution works to combine the database with the application to move services and data back through all the layers of abstraction used to develop them and deliver usable functionality to the business users, often in the form of automation.
Eventually, the data from the application can be transported to other places in the enterprise for reuse. Data marts are an obvious example. In this data transport layer, there is heavy emphasis on transport concepts, such as XML and middleware. The technicians involved are usually indifferent to the data and feel like they have no responsibility for it. Like truck drivers delivering containers, they concern themselves with their vehicles, the roads they must travel and how to offload at their destination. They have little idea of the significance of what is in the containers.
The BI Side
All of this may be difficult, but today there is an expectation that BI can simply be added to the architecture. The right-hand stack in Figure 1 shows a parallel process for developing a BI application based on a data mart. Here it is the business's information requirements that drive the development process. The technical actors tend to be specialists in data marts, data warehouses, BI tools and so on.
However, right at the bottom layer is the principal driver: IT staff who have powerful extract, transform and load (ETL) or service-oriented architecture/middleware tools that enable them to capture physical data and push it into a data mart. Technology makes it just too easy.
Move the data into a data mart, slap an end-user query tool on top of it and you are done. Today, things are improving, and people are implementing reporting metadata layers that explain what the data means, where it came from, etc. This architectural component is still fairly rare, and often only partially functional, but we have nevertheless included it in Figure 1.
There are bigger problems. In BI solutions, the data is typically placed in a star schema format, which correlates to the main query set that defines the business users' information needs. That dictates the same levels of development as for the operational system, passing through conceptual, logical, physical and finally implemented layers. Thus, the design finally meets the data transported from the producing operational systems.
The problem is that in the BI application, the captured data has to be transformed again back through all these layers until it makes business sense, just as it had to be in the operational system. There are two difficulties:
- The BI environment tends to be constructed based on an understanding of the information needs of the BI users, with little or no knowledge of the transformations that have taken place in the development of the operational solution.
- The data in the operational system's database derives part of its semantic properties from the transformations that occurred in the levels above it and part from the constraints inherent in the application logic, which has traveled along a parallel path. These properties are thus not inherent in the data itself and cannot be transported with it.
Little wonder then that a column called GROSS_SALES or a table called CUSTOMER may not mean exactly what a data mart designer imagines them to mean.
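The second difficulty can be illustrated with a minimal sketch. Everything in it is hypothetical: the OrderRow structure, the status values and the rule that voided orders do not count as sales are assumptions made up for illustration, not taken from any real system. The point is only that the rule lives in the application logic, so a copy of the column alone arrives in the mart without it.

```python
from dataclasses import dataclass


@dataclass
class OrderRow:
    """Hypothetical row in an operational ORDERS table."""
    order_id: int
    gross_sales: float
    status: str  # assumed values: "OPEN" or "VOID"


ROWS = [
    OrderRow(1, 100.0, "OPEN"),
    OrderRow(2, 250.0, "VOID"),
    OrderRow(3, 50.0, "OPEN"),
]


def operational_gross_sales(rows):
    """Application logic: the (assumed) business rule excludes voided orders."""
    return sum(r.gross_sales for r in rows if r.status != "VOID")


def naive_mart_gross_sales(rows):
    """A mart load that moved the GROSS_SALES column but not the rule."""
    return sum(r.gross_sales for r in rows)


if __name__ == "__main__":
    print("operational view:", operational_gross_sales(ROWS))
    print("naive mart view: ", naive_mart_gross_sales(ROWS))
```

Both functions read the same stored values, yet they report different totals, because the meaning of GROSS_SALES in this sketch is partly defined by code that never travels with the data.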
Each level in Figure 1 represents a different level of abstraction. In philosophy, the term abstraction usually means the separating of ideas from objects. In Figure 1, it means extracting the concepts from the prior level that map to the components which have to be manipulated in the current level. For instance, a data analyst will have to take the concepts presented in the conceptual data model - which may be a text document - and put them into a data model in a CASE tool. Each layer has its own set of components and concepts. Thus, the individuals who function at each layer tend to speak a different language than the individuals who function in the other layers. The process of abstraction - mapping concepts in the prior layer to components in the current layer - is matched by a process of translation, in which the jargon used to express ideas in the prior layer is translated into the jargon used to express ideas in the current layer. Hence the business information becomes the conceptual data element, then the logical attribute, then the physical column and finally a data value in a database and maybe an XML document. This chain illustrates how the same thing gets mapped to different concepts that are described in different technical languages.
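The chain described above can be written down as a small data structure. This is a sketch only: the LayerTerm type, the example names such as ORDERS.GROSS_SALES and the trace_to_business helper are all invented for illustration and are not part of the paradigm as presented in Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerTerm:
    """One link in the chain: what the same thing is called in one layer."""
    layer: str      # layer of the paradigm
    component: str  # kind of component used in that layer
    name: str       # hypothetical name given to it there


# Hypothetical chain for a single piece of business information.
CHAIN = [
    LayerTerm("business", "business information", "Gross sales"),
    LayerTerm("conceptual", "data element", "Gross Sales Amount"),
    LayerTerm("logical", "attribute", "Order.grossSalesAmount"),
    LayerTerm("physical", "column", "ORDERS.GROSS_SALES"),
    LayerTerm("implementation", "data value", "1234.50"),
]


def trace_to_business(chain, name):
    """Walk from a named component back up the layers to the business term."""
    for index, term in enumerate(chain):
        if term.name == name:
            return list(reversed(chain[: index + 1]))
    raise KeyError(name)


if __name__ == "__main__":
    for term in trace_to_business(CHAIN, "ORDERS.GROSS_SALES"):
        print(f"{term.layer:<12} {term.component:<22} {term.name}")
```

A data mart builder who starts from the physical column can only make this walk if somebody recorded the chain when the operational system was built.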
A major problem is that the individuals who work in each layer focus heavily on the idiosyncrasies of the components they have to deal with and the tools that help them do their work. Hence a logical data modeler will be concerned about optionality and cardinality, what notation to express them in and which CASE tool to use. Such issues are of no concern to a DBA trying to implement a database. There is some overlap between the layers, but a lot more is distinct and unique about each one of them. Realization of this, however, tends to reinforce the technical specialist's allegiance to his or her own technical sphere of competence. That allegiance tends to crowd out the preservation of an understanding of the business data and rules.
The OSI Reference Model
The Abstraction-Translation Paradigm closely resembles the Open Systems Interconnection (OSI) reference model for the exchange of information among open systems. This is illustrated in Figure 2 and detailed in ISO/IEC Standard 7498-1. The OSI model describes how data to be communicated passes through a number of distinct layers, each governed by its own standards. Each layer is intended to handle a different level of abstraction in handling the data, e.g., morphology, syntax and semantics. Ultimately, the data ends up in physical media where transport occurs to the receiving system.
Messages containing the data are thus passed from one instance of an open system to another, where they pass up through the same layers of abstraction until they get to the level at which they started out (the application). Here they can be acted on in terms of their semantic content. This model includes peer-to-peer protocols between the same levels in the different systems. There is no equivalent of peer-to-peer protocols in the Abstraction-Translation Paradigm because operational systems may be designed and implemented years or decades before BI applications that consume their data are built. The individuals who designed the operational system are very unlikely to be around when a consuming BI application is built, and it can be pretty much guaranteed that no trustworthy documentation is available either.
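A toy sketch may help show why the peer-to-peer protocols matter. It is deliberately simplified and is not how real OSI implementations work: headers are modeled as bracketed text, the physical layer is treated as plain transmission with no header, and the send and receive functions are hypothetical names. What it shows is that the receiver can unwrap a message only because each of its layers knows exactly what its peer on the sending side added.

```python
# The six OSI layers above the physical layer, listed top-down.
HEADER_LAYERS = [
    "application",
    "presentation",
    "session",
    "transport",
    "network",
    "data link",
]


def send(message):
    """Pass a message down the stack; each layer wraps it in its own header."""
    frame = message
    for layer in HEADER_LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def receive(frame):
    """Pass a frame up the stack; each layer strips its peer's header."""
    payload = frame
    for layer in reversed(HEADER_LAYERS):
        header = f"[{layer}]"
        if not payload.startswith(header):
            raise ValueError(f"no shared protocol at the {layer} layer")
        payload = payload[len(header):]
    return payload


if __name__ == "__main__":
    wire = send("gross sales = 1234.50")
    print(wire)
    print(receive(wire))
```

In the Abstraction-Translation Paradigm there is no counterpart to the agreed headers, so a data mart builder is in the position of calling receive on data whose wrapping was never documented.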
Another concept that the Abstraction-Translation Paradigm resembles is the Zachman Framework. This well-known matrix represents a different number of conceptual levels correlated to data, function, network, people, time and motivation. It is intended to provide a framework for planning enterprise information architecture activities. Graeme Simsion has noted that there has been little evidence of demonstrable success attributed to the Framework in the almost 20 years or so that it has been around.1 Simsion agrees that there is some evidence that it has been adopted, but little proof that it has produced better outcomes.
Yet the Zachman Framework has an appeal because it seems to correspond to reality. If the reality that the Framework is documenting actually represents the same problem domain described by the Abstraction-Translation Paradigm, then it is possible to explain Simsion's criticism.
Rather than try to work within the Zachman Framework and perpetuate the problems of abstraction and translation, we should be doing everything we can to disintermediate the different layers in the Abstraction-Translation Paradigm. In effect, this stands Zachman on his head: he has presented us with a view of enterprise information architecture from which we must escape, rather than try to work within. Unfortunately, simply not doing the work at the different levels of abstraction is no escape. The layers in the Abstraction-Translation Paradigm and the Zachman Framework represent what we must do to build information systems. When we simply miss out abstracting and translating in one layer, the work has to be done in another layer. In such cases, the work is usually done partially and imperfectly, and the quality of the overall product suffers.
It will probably be some time before we can figure out a way to truly escape from the Abstraction-Translation Paradigm and go straight from the business to an implementation that can provide reusable data to the rest of the enterprise. Until then, we will just have to do things better within the paradigm, on the basis of understanding that it is defining a problem set rather than providing ways to focus our efforts. One area where the OSI model suggests we can do better is the peer-to-peer protocols within the same layer. Another area for improvement may be truly capturing the abstraction and translation processes, and their corresponding artifacts, when building an operational system, and having them available - perhaps years later - for a data mart builder. We need to start facing these kinds of challenges, or we will be eternally trapped in a self-limiting world of information management.
1. Graeme Simsion. What's Wrong With The Zachman Framework? TDAN.com, January 2005.