It's no secret that chaos prevails in the average IT environment. Ambiguity, redundancy, and incompatibility add crippling complexity to our IT worlds, throwing practitioners off course from their long-term strategic objectives -- if these existed in the first place. If we broaden our definition of data integration to that of a "data integration framework," we can solve these perennial IT challenges. A data integration framework greatly simplifies architectures and creates a structure within which data-centric applications can be built the "right" way today.

The Current Reality: Chaos in Your Universe

IT organizations exist in a state of constant change and ever-increasing complexity: new people, legacy applications, time-to-market pressures, financial constraints -- a steady barrage of urgent demands. Together these create disorder and confusion, and valiant effort goes into reacting to the latest fire drill. When technologists cannot reconcile short-term and long-term demands, complexity grows and comprehensibility erodes. Must we resign ourselves to this continuous chaotic state? No.

The Ideal State: Conceptual Purity

The inverse of chaos is comprehensibility. Fortunately, the two extremes can and do coexist. Comprehensible enterprises look toward the long term and make the appropriate investments in it. They have "one version of the truth" and enable practitioners and business decision-makers to see the whole picture. Before designing new systems, information technologists comprehend how current systems are built and what answers they were designed to deliver. It is understood that this level of analysis is the foundation for developing a clear, long-term IT strategy to serve as a guidepost for future activities, ultimately helping an organization move from a reactive nature to a proactive one.

Complexity, therefore, is the obvious enemy of comprehensibility. One need not look too hard in the typical IT organization for evidence that complexity routinely increases over time. Yet an antithesis of complexity exists: an alternate universe governed by conceptual purity, a utopian state where business problems are fixed the "right" way.

Consider four universal IT dimensions: organization, data, processes and platforms. For every complexity-inducing element within these dimensions, a conceptually pure version exists. For an IT organization, this translates into organizational efficiency and meta-information.

Reining in Chaos: An Essential Ingredient

Meta-information lies at the core of achieving organizational efficiency. An accurate and complete meta-information repository never grows stale or inconsistent with physical implementation (data, process, platform), nor does it become inconsistent with the functional business view of the applications these system implementations serve.

It contains historical information about previous system versions -- for example, what the system looked like last year, what it cost to operate, how often it failed and how fast it performed. Above all else, the meta-information repository completely describes the current implementation and facilitates maintenance and new development efforts.

Bridging the Gap Between Two Worlds

We know that complexity is not a desired environment, but IT practitioners can rest easier knowing that neither is conceptual purity. Why? Because a purely long-term, strategic approach would entail a significant investment of resources, thereby ignoring the immediate needs of the business. This would place the business at a disadvantage against its competitors.

Achieving a balance between the two extremes is the goal. By moving pieces of the chaotic puzzle into a more desired state of comprehensibility, organizations are more likely to stay on course with their overall navigational path. Therefore, it is possible to meet imminent business requirements and ultimately build IT systems the "right" way.

The Physical Layer: The "What Is"

One can envision layers, where the bottom layer constitutes physical applications. IT directs the vast majority of its attention and resources here. At this layer, objects can very clearly be classified as data, process or platform; they are our working systems. For example, relational databases, dimensional databases and flat-file staging areas occupy the data dimension. Programs, jobs, scripts and services occupy the process dimension. Storage, servers, networks and vendor software occupy the platform dimension.

One consequence of so much attention at the physical layer is increasing complexity because nobody can see the complete picture. The complete picture means not just a logical or conceptual view of a single application, but also a conceptual view of all systems that participate in the data, process or platform dimension.

The Conceptual Layer: Where We'd Like To Be

At the opposite end of the stack from the physical layer is the conceptually pure layer.

There is one universal "database in the sky" for all enterprise data. There is one universal business rule repository. Platforms are represented by abstract, interconnected resource pools. The conceptual view is where we all want to be. We may never actually get there, but it is what we would like our IT world to look like.

The Integration Layer: Bridging the Two

The conceptual layer must then be "connected" to the physical layer. This is the role of the integration layer. The integration layer says, "Here's what we have today and how it maps into a clean, simple view of our enterprise." The standards and business rules contained in this simple view are very similar to the conceptual view.

The trick is to be able to migrate systems from the physical layer into the integration layer over time. For example, the integration layer has a unified view of "customer." For a data mart to move up into the integration layer, it must first conform to this unified view. If it cannot, it stays in the physical layer, and elements of it that map to the unified view are connected to the integration layer. Nonconforming elements do not appear in the integration layer. Over time, more dimensions map to the integration layer, bringing us closer to the conceptual view.
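
The conformance test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unified view of "customer" is modeled as a mapping of column names to types, and every name here (UNIFIED_CUSTOMER, split_by_conformance, can_move_up) is hypothetical rather than part of any real product.

```python
# Hypothetical unified view of "customer" held by the integration layer.
UNIFIED_CUSTOMER = {
    "customer_id": "string",
    "name": "string",
    "postal_code": "string",
}


def split_by_conformance(mart_columns, unified_view):
    """Partition a mart's columns into conforming and nonconforming sets.

    Assumption: a column conforms when the unified view defines the
    same name with the same type.
    """
    conforming = {}
    nonconforming = {}
    for name, col_type in mart_columns.items():
        if unified_view.get(name) == col_type:
            conforming[name] = col_type
        else:
            nonconforming[name] = col_type
    return conforming, nonconforming


def can_move_up(mart_columns, unified_view):
    """A mart moves into the integration layer only if every column conforms."""
    _, nonconforming = split_by_conformance(mart_columns, unified_view)
    return not nonconforming
```

A mart that fails can_move_up stays in the physical layer; only the conforming columns returned by split_by_conformance would be connected to the integration layer.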

Without this integration layer, one lacks the compass by which to navigate changes in the physical layer. For example, a business user requests access to certain types of information. IT determines that this information can only be delivered with a new "temporary" data mart. It's temporary because we intend to extend the data warehouse to include the subject area, and at some future date we'll propagate that subject area into a mart that already exists. Bypassing the warehouse and reaching directly into source systems to collect the data for this temporary mart creates complexity. Unfortunately, not unlike a "temporary" tax, the "temporary" data mart turns out to be not very temporary. Not only does it still exist after the warehouse has been extended; its definition also grew while the warehouse was being extended.

The business users gain benefit from the "temporary mart" and are not willing to fund its replacement. They would rather spend that money on another project they feel is more important. Many enterprise IT systems are collections of such "temporary" systems.

In this example, the warehouse may fit our definition of conceptually pure data. The data warehouse moves into the data integration layer. The new "temporary" mart stays in the physical layer. Business rules that transform and cleanse data bound for the warehouse also move into the integration layer. The platform upon which the warehouse runs may also move into the integration layer.

The Integration Layer as Data Integration Framework

The integration layer is a framework. More specifically, it is a data integration framework containing information about data, process and platform. Meta-information is the glue that binds data, process and platform together.

Generally, data integration is regarded as a connector of data consumers and producers in the physical layer. Data moves among transactional systems and then to operational data stores or data warehouses. From warehouses, data moves to marts -- or back to transactional systems. Data integration traditionally implements interapplication cohesion. The interfaces between applications are expressed in terms of data. For example, one system will get daily customer transactions from another system. It is data that we're exchanging.

The data integration framework is something more comprehensive than this traditional view of data integration. It certainly includes all traditional data integration functions; however, because the data integration framework is the facilitator for our integration layer, which bridges the conceptually pure view with the current reality of the physical view, the framework must include within it all elements of these layers. Therefore, the data integration framework relates meta-information, data, process and platform to each other. Furthermore, so as not to morph into yet another chaotic architecture not unlike the physical layer, the data integration framework implements standards for meta-information, data, process and platform.

Let's take a look at how these four elements fit within the data integration framework.

META-INFORMATION: As stated earlier, meta-information is data about data, data about process and data about platform. Within the data integration framework, meta-information not only describes data, process and platform, but also ties the conceptual layer to the applications that have moved from the physical layer into the integration layer.

In other words, meta-information not only explains how applications are built in the integration layer, but also describes how these applications fit within the conceptually pure view. For example, if there is data such as customer demographic information in the integration layer, how does it fit in the "big database in the sky" that the conceptually pure layer describes? If there is processing such as data quality remediation rules in the integration layer, how does it fit into the conceptually pure universal business rule repository? If there are storage and servers and interconnecting networks in the integration layer, how does each of these resources fit into the larger pools described by the conceptual layer?

Furthermore, meta-information ties data, process and platform to each other. Our definition for meta-information includes meta data; thus, meta-information includes data model and data domain information. In addition, meta-information describes a hierarchy of business rules and connects these rules to data. Finally, meta-information maps applications comprising these business rules and using the data just mentioned to platforms.

Therefore, meta-information starts with a description of a customer, includes descriptions of how the customer data is used in business rules, maps these business rules to applications and maps these applications to platforms.
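
To make that chain concrete, here is a minimal Python sketch. It assumes meta-information is held as simple lookup tables; the rule, application and platform names are invented for illustration and do not come from any particular repository product.

```python
# Hypothetical meta-information: which business rules use which data ...
RULES_USING_DATA = {
    "customer": ["calculate_balance", "cleanse_address"],
}

# ... which applications implement each rule ...
APPS_IMPLEMENTING_RULE = {
    "calculate_balance": ["billing_app"],
    "cleanse_address": ["warehouse_load", "billing_app"],
}

# ... and which platform each application is mapped to.
PLATFORM_FOR_APP = {
    "billing_app": "server_pool_a",
    "warehouse_load": "server_pool_b",
}


def trace(data_entity):
    """Follow meta-information from a data entity down to platforms.

    Returns one (data, rule, application, platform) tuple per path.
    """
    paths = []
    for rule in RULES_USING_DATA.get(data_entity, []):
        for app in APPS_IMPLEMENTING_RULE.get(rule, []):
            paths.append((data_entity, rule, app, PLATFORM_FOR_APP[app]))
    return paths
```

Calling trace("customer") walks from the customer description through the rules and applications to the platforms, which is the kind of impact question a meta-information repository is meant to answer.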

Meta-information includes one other important dimension: time. None of the meta-information just described is static. Meta-information changes as data, process and platform change. Therefore, the data integration framework manages meta-information changes.
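
One way to picture the time dimension is a meta-information entry that keeps every version with its effective date. The Python sketch below is an assumption about how such history might be stored, not a description of any specific repository; the class name VersionedEntry and its methods are hypothetical.

```python
from bisect import bisect_right
from datetime import date


class VersionedEntry:
    """Keeps every version of one meta-information entry, ordered by date."""

    def __init__(self):
        self._dates = []   # effective dates, ascending
        self._values = []  # the description in force from each date

    def record(self, effective, value):
        # Assumption: versions are recorded in chronological order.
        if self._dates and effective < self._dates[-1]:
            raise ValueError("versions must be recorded in date order")
        self._dates.append(effective)
        self._values.append(value)

    def as_of(self, when):
        """Return the version in force on the given date, or None."""
        index = bisect_right(self._dates, when)
        return self._values[index - 1] if index else None
```

With this shape, "what did the system look like last year" becomes a call to as_of with last year's date.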

DATA: To the extent that an IT organization delivers multiple service level categories, data within the data integration framework might physically repeat in each category. Typical service level categories include at least transactional and analytic. Therefore, customer data might exist redundantly, once for transactional needs and once for analytic needs. More elaborate service level categorizations will allow for further redundancy.

Redundancy is not chaos, however. The data integration framework ties all instances of customer together via meta-information. Thus from a conceptually pure view, if a new application needs to interact with customer data, that new application will have only one place to go, depending on that application's service level category.
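
A small Python sketch of that idea follows, assuming the framework registers each physical instance under an (entity, service level category) key. The table contents and function names are hypothetical.

```python
# Hypothetical registry tying redundant physical instances together.
INSTANCES = {
    ("customer", "transactional"): "orders_db.customer",
    ("customer", "analytic"): "warehouse.dim_customer",
}


def locate(entity, service_level):
    """Return the single place an application of this category should go."""
    try:
        return INSTANCES[(entity, service_level)]
    except KeyError:
        raise LookupError(
            f"no {service_level} instance registered for {entity}"
        ) from None


def instances_of(entity):
    """List every physical instance of an entity, across all categories."""
    return sorted(loc for (name, _), loc in INSTANCES.items() if name == entity)
```

The data is stored twice, but an application never has to guess: its service level category selects exactly one instance, and instances_of shows how the copies relate.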

The data integration framework supports multiple physical representations of data as well. Different service level categories might require data to be structured differently or to be stored in different technologies.

PROCESS: As with data, a business rule -- such as the rules that calculate a customer's balance -- may need to be implemented distinctly to serve different service level categories, but all logically redundant rules must be tied together with meta-information. Redundant rules may be tied to each other directly, or indirectly via their relationships with data.

The data integration framework holds all distinct implementations of a business rule, and, ideally, it contains features that allow one implementation of a business rule to serve all service level categories. Whether such unification is possible depends on the data integration framework's maturity and the nature of the business rule.

PLATFORM: In the data integration framework, platforms -- comprising storage, processing servers, networks and third-party software -- are pooled and decoupled from data and process.

Being pooled means any resource can be expanded and contracted relatively easily without expanding or contracting other resource types. For example, more processing servers can be added without affecting storage mechanisms.

Data and process -- applications, in other words -- are decoupled from platforms in the data integration framework. The meta-information in the data integration framework maps an application to the platform. Anyone who needs to increase the computing resources allocated to an application should be able to do so easily.
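
Pooling and decoupling can be sketched as follows. This Python example is a simplified assumption: each resource type is a single counter of units, and the application-to-platform mapping is a table of allocations. The class ResourcePools and its methods are hypothetical.

```python
class ResourcePools:
    """Pooled resources, with applications mapped to them by allocation."""

    def __init__(self, capacity):
        self.capacity = dict(capacity)  # resource type -> total units
        self.allocated = {}             # (application, resource) -> units

    def used(self, resource):
        """Units of one resource type currently allocated to applications."""
        return sum(
            units for (_, res), units in self.allocated.items() if res == resource
        )

    def expand(self, resource, units):
        """Grow one pool without touching any other resource type."""
        self.capacity[resource] = self.capacity.get(resource, 0) + units

    def allocate(self, app, resource, units):
        """Give an application more of a resource, if the pool has room."""
        free = self.capacity.get(resource, 0) - self.used(resource)
        if units > free:
            raise ValueError(f"only {free} units of {resource} are free")
        key = (app, resource)
        self.allocated[key] = self.allocated.get(key, 0) + units
```

Expanding the compute pool leaves storage untouched, and raising an application's allocation changes only the mapping, not the application itself.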

Thus, the data integration framework facilitates a conceptually pure "big computer in the sky" ideology.

Getting Started

The first step is to develop a conceptual layer. The conceptual layer may contain only pictures and diagrams of your ideal environment, but it needs to be designed for longevity and simplicity. This step requires a strategic and long-term view. Step back and envision the future.

Next, the integration layer, or the data integration framework, needs to be created. Initially the data integration framework is "empty," meaning no application (data, process or platform) yet lives in it, but standards are established for how meta-information will be stored and managed.

Further standards are added the first time they are needed by applications built within or migrated into the integration layer. These standards cover the processes and criteria for adding new applications to the data integration framework, the interoperability rules between the data integration framework and systems in the physical layer, and the technologies that will implement data, process and platform in the integration layer.

This process enables a pragmatic incremental approach to realizing your ideal environment. Only those dimensions in a new application that conform to standards you have created in the data integration framework, or decide to create, reside there; the balance stays in the physical environment. Over time, more dimensions of your data, process and platforms will map to standards in the data integration framework until finally most do. Over time, today's chaos yields to a more conceptually pure environment.

The Reality

Real-world enterprise information technology systems cannot directly follow conceptually pure models for a whole host of legitimate reasons. Traditionally, these legitimate encumbrances to purity have discouraged any real attempts to simplify existing architectures. Without some path to conceptual purity, IT complexity increases without limit. As complexity increases, IT systems decline in comprehensibility and thereby generally increase the cost of doing business.

To break this chaotic and unproductive cycle, a data integration framework offers a pragmatic approach that allows the organization to move incrementally toward an ideal, conceptually pure environment. In fact, some very large enterprises have initiatives underway to achieve conceptual purity via a data integration framework.
