Delving into the Mysteries of the Logical and Physical Worlds

Once upon a time in our industry, in the day and age of structured analysis and design, there arose logical design and physical design. There were numerous debates as to what was logical and what was physical. Soon, the arguments and discussions grew to be so esoteric that the subject bordered on the metaphysical. Few were wise enough to know where logical left off and physical began. I know that I was never smart enough to understand the many nuances. Therefore, it is with a great deal of trepidation that I enter the logical and physical debate.

The other day, I was describing the corporate information factory (CIF) to a group of clients. Concerning the data warehouse that sits within the CIF, someone in the audience asked me if the diagram that I had on the screen showed a logical or a physical data warehouse. The subject that I had been avoiding for all these years had finally caught up to me.

The answer to the question depends on the business being served by the data warehouse. If the business being served is a business whose focus is singular or near singular, then the data warehouse may indeed be a single physical entity. Consider a business such as Taco Bell. While there are many Taco Bells throughout the world, the truth is that Taco Bell has a very linear business. Taco Bell makes tacos and burritos. They have customers that walk or drive in. Their physical buildings are virtually indistinguishable. The business is very straightforward. It is entirely conceivable that Taco Bell has a single large data warehouse to serve its needs.

Now consider the large diverse business of a multinational company such as IBM. IBM has a software business. IBM has a hardware business (actually, IBM has several hardware businesses). IBM has a services business. IBM has a research business. IBM is in the U.S., in Europe, in Asia and Australia. IBM is the essence of a diverse, multinational company. Is it possible for IBM to do its business on a single data warehouse? No way. For a thousand reasons, if IBM is to do its business properly, it must have multiple physical data warehouses.

There will be one data warehouse for a certain type of hardware. There will be another data warehouse for other types of hardware. There will be a data warehouse for software. There will be a set of data warehouses for Europe, another set for Asia, another set for North America, and so on.

Therefore, if data warehouses must be linked - if multiple physical data warehouses must be created in order to build a single logical data warehouse - what are the rules of the road? If you have multiple data warehouses that need to be linked, how do you go about doing that?

The first rule is to operate at the lowest level of granularity. Worrying about summarized or derived data is simply overwhelming. To make the task workable, focus on the relationship of one physical data warehouse to another at the most granular level. Relationships between physical data warehouses are at the lowest level of granularity.

Suppose the term quarterly revenue is found. Quarterly revenue consists of many granular transactions. Quarterly revenue can be found in a logical data warehouse in many places with no inconsistency or violation of the structure of a logical data warehouse consisting of multiple physical data warehouses.

The second rule is that linkages between data warehouses are made by means of a common key. The key must technologically link from one environment to another. However, the link must be semantically compatible as well as technologically compatible.

Suppose the key structure is customer. If the format of the key structure is char(25), then there may be technological compatibility between the data warehouses. However, if the key structure means only current customers in one physical data warehouse and all customers and prospects in another data warehouse, then there is semantic incompatibility.

The third rule is that non-key data in the physical data warehouses must be non-redundant. In other words, for data not participating in a linkage between two or more physical data warehouses, the unit of data must reside in one data warehouse, not more than one. This enforces the integrity of data across the logical data warehouse.

Interestingly, the non-redundancy of data can be supported at more than one level. Suppose one data warehouse contains data from North Carolina, and another data warehouse contains data from Texas. That's one form of non-overlapping redundancy. Or, suppose one data warehouse contains sales data and another data warehouse contains engineering data. This is another form of non-overlapping data.

The fourth rule is that meta data must be used to communicate between the different physical stores of data.

That's about it. A logical data warehouse can be made up of multiple physical data warehouses if the business circumstances warrant.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access