Just when you think there is nothing new under the sun, out pops a jackrabbit and teaches you a lesson in being shortsighted. Just the other day, I learned something new about database design. After spending a professional lifetime in the database design arena and being a participant in the industry's database debates, I thought there was nothing new left to see. But I was wrong.
In order to appreciate what's new in database design, a review of the past is in order. Figure 1 is representative of the different types of database design which have been popular over the years.
Database design began with Ed Yourdon and Tom DeMarco and structured analysis and design. In the very early days of the computer industry, it was thought that database design was a byproduct of functionality. Get the process right and the database design will follow. For the systems that were being built in the early days, this "master file" approach worked just fine. Given the chaos of development that preceded structured analysis and design, this technique for database design was a significant improvement. However, once very large systems appeared, the master file approach soon proved to be inadequate.
The world was not a finite place with a fixed set of requirements. The minute the structured analysis approach was extended beyond a finite set of requirements, the system design yielded very complex and questionable results. In short, the "database as a byproduct of function/master file" approach went by the wayside.
Figure 1: Different Approaches to Database Design Over the Years
Taking an integrated look across the enterprise were James Martin, Clive Finkelstein and others with the ER (entity-relationship) approach. The ER techniques yielded a subject-oriented approach to database design in which data was integrated across the enterprise into subject-oriented databases. The ER approach was elegant and solved many of the shortcomings of the master file approach.
Closely related to the subject-oriented ER approach was the relational approach, fostered by Ted Codd. The relational approach was an academic approach in which data was organized according to the "natural" relationships that occurred among different attributes of data. The relational approach coincided quite nicely with the ER approach of Martin. In some respects, the ER approach was the same as the relational approach except that the ER approach was a top-down approach while the relational approach was a bottom-up approach. There was indeed much affinity between the two approaches, and today for all practical purposes the two approaches are intellectually merged.
Concurrent with the ER and relational approaches to database design was the design work for online transaction processing (OLTP) systems. The epitome of the OLTP database design was the "root only" database. The term "root only" refers to the fact that data is compacted into a single physical structure so that it is very efficient to access. Any given unit of data in a root-only database can be accessed in a single I/O. The result is very high-performance transaction processing. Data is denormalized and compacted into a simple structure so that all the data that is needed is in the structure when accessed. Root-only databases are hyper-efficient to access, but the database design of a root-only database is quite different from the database design of anything else. Lots of different kinds of data are held tightly in anticipation of a need for minimal I/O upon access. The result is very good transaction performance. If there is an original proponent of the root-only database design approach, it would have to be Jim Sheetz.
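The root-only idea can be sketched in a few lines of Python. This is an illustrative sketch only, not Sheetz's actual design; all record and field names here are invented:

```python
# Root-only sketch: everything a transaction needs is denormalized into one
# self-contained record, so a single lookup (one I/O in a real DBMS)
# retrieves it all. In a normalized design the same transaction would need
# separate reads of account, customer and activity tables.
root_only = {
    "acct-1001": {
        "balance": 2500.00,
        "customer_name": "A. Jones",        # copied in from the customer record
        "customer_phone": "555-0101",       # copied in from the customer record
        "recent_activity": [-40.0, 120.0],  # copied in from the activity log
    }
}

def process_withdrawal(accounts, acct_id, amount):
    """One lookup fetches everything the transaction needs."""
    record = accounts[acct_id]  # the single "I/O"
    if record["balance"] >= amount:
        record["balance"] -= amount
        record["recent_activity"].append(-amount)
        return True
    return False

process_withdrawal(root_only, "acct-1001", 500.0)
print(root_only["acct-1001"]["balance"])  # 2000.0
```

The trade-off the article describes is visible here: the copied-in customer fields make access fast but must be kept in sync whenever the customer record changes.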
After the world discovered data warehousing and the need for decision support systems (DSSs), another form of database design was advanced. That form of design is called "star join" design. Star join design is optimal for the flexible access of data. Star joins entail structures called fact tables and dimensions. Unlike a root-only design that is very streamlined and simple, a star join design is very ornate, with many different complex facets. On occasion, the dimensions of a star are normalized into further tables, creating what is known as a snowflake. The way that the designer knows to create a star join is by the gathering and assimilation of requirements. The requirements of how data is to be used and accessed shape the star join and determine the optimal design. Ralph Kimball is undoubtedly the driving force behind the star join approach to database design.
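A minimal star join can be sketched with an in-memory SQLite database in Python. This is an illustration only, not Kimball's published schemas; all table and column names are invented:

```python
# Star join sketch: a central fact table of measurements, joined to
# dimension tables that describe the measurements. DSS queries join the
# fact table to its dimensions and aggregate.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
    -- The fact table holds measurements keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'crude oil'), (2, 'diesel');
    INSERT INTO dim_region  VALUES (1, 'Peru'), (2, 'Mexico');
    INSERT INTO fact_sales  VALUES (1, 1, 100.0), (1, 2, 250.0), (2, 1, 75.0);
""")

# A typical DSS query: total sales by product across all regions.
rows = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # [('crude oil', 350.0), ('diesel', 75.0)]
```

The flexibility the article describes comes from the shape itself: any combination of dimensions can be joined to the fact table, so new questions can be asked without restructuring the data.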
Each of the approaches to database design has an optimal structure for the purpose for which it was intended. Each was designed to fulfill a unique function in an efficient manner. Whether they realize it or not, each design approach represents a trade-off.
By optimizing one aspect of their environment, other aspects are de-optimized. A brief summary of what is optimized and what is de-optimized follows:
- Master File Design
- Optimizes functional design
- De-optimizes integration, transaction performance
- Integrated, Subject-Oriented Design
- Optimizes integration of data, flexibility
- De-optimizes online transaction processing
- Relational Design
- Optimizes rules for structuring of data, flexibility
- De-optimizes online transaction processing
- Root-Only Design
- Optimizes online transaction processing
- De-optimizes flexibility needed for DSS
- Star Join Design
- Optimizes flexibility and accessibility of DSS
- De-optimizes update and OLTP access
Now there is something new under the sun. I first saw this approach at a startup company called KALIDO on a recent trip to London. In order to frame the context for KALIDO, consider that you are building a data warehouse for a global corporation. There are two very basic problems you face:
- Data warehouses contain data over time. This means that, over time, change is inherent to your data and your data structure. This is true for any data warehouse, global or otherwise.
- Given that a global warehouse represents data from many countries and that those countries are each different, change is constant in a global data warehouse.
These two factors are simply a statement of reality. The data warehouse designer for a global data warehouse must address these issues if there is to be any hope of success.
What is wrong with using a traditional database design? Can't you build a global database using a traditional approach? The answer is that you can build a global data warehouse using traditional database design techniques, but if you do, change will eat you alive.
Consider that you have built and populated a global data warehouse using traditional database design techniques, and you learn that the government of Peru has reorganized. There is a new method for taxing oil, and the government of Peru issues an edict that classifications of oil be changed. The data that is already in your global data warehouse has to be taken out of the warehouse, reclassified and reentered. This is a lot of work and causes your global data warehouse administrators to spend a lot of weekends at the office. At approximately the same time, the state of Yucatan in Mexico decides that oil exploration needs to be encouraged and creates an oil depletion allowance. In order to remain consistent, all the data in the global data warehouse that has been entered must be revisited to determine if the oil depletion allowance applies to it. The global data warehouse administrators spend more sleepless nights pulling data out of the global data warehouse, massaging it and putting it back in. About a week later, OPEC decides that there is to be a slowdown in production for Europe. The legal quotas that have been in place for three years all need to be changed. Then a new oil tanker comes into existence that can carry oil at a cheaper rate. Oil transport rates need to be changed. Then there is a revolution in Peru, and the rules change again. Then oil is discovered in the Red Sea. When will it all end? (The answer is it won't.)
There is a never-ending series of changes around the world. With classical database design techniques, this means that the database administrators and database designers for global data warehouse environments have a fully employed future. In fact, the company that is serious about building and maintaining a global data warehouse has its hands more than full.
Enter an innovative approach to database design taken by KALIDO. The database designers at KALIDO have taken the approach that classical database design techniques just won't work for a global data warehouse, and they are right. What the database designers at KALIDO have done is to optimize the design of data on a different set of parameters. Recognizing that change is the beast to slay for a global data warehouse, they have done essentially two things:
- Separated unstable data from stable data. Data that is stable once it is recorded properly, such as transaction data, is separated from data that is unstable, such as reference data and organization chart data.
- For the unstable data, KALIDO has added a dimension of time variancy. Time variancy means that unstable data does not have to be converted every time there is a change in the world. Instead, with time variancy, the change is noted by adding a new definition at a newer moment in time. Going back in time is still accurate and a viable thing to do.
The net result is a global data warehouse that does not require an eternal maintenance effort by an army of database administrators, accommodates new changes simply and with little fanfare, and allows older data to remain intact and as valid as it was the day it was first recorded.
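The time-variancy idea can be sketched as a list of definitions, each stamped with the date it takes effect. This is an illustrative sketch, not KALIDO's actual implementation; the classification names and dates are invented:

```python
# Time-variant reference data sketch: instead of rewriting old rows when a
# classification changes, a new definition is appended with its effective
# date. A lookup for any moment in time remains accurate.
from datetime import date
from bisect import bisect_right

# Each entry: (effective_from, classification), kept sorted by date.
oil_classification = [
    (date(1998, 1, 1), "class A"),  # original classification
    (date(2001, 6, 1), "class B"),  # government edict reclassifies oil
]

def classification_as_of(history, when):
    """Return the definition in force on a given date."""
    dates = [d for d, _ in history]
    i = bisect_right(dates, when) - 1  # last definition effective on `when`
    if i < 0:
        raise LookupError("no definition in force on that date")
    return history[i][1]

print(classification_as_of(oil_classification, date(2000, 3, 15)))  # class A
print(classification_as_of(oil_classification, date(2002, 1, 1)))   # class B
```

When the next edict arrives, one row is appended and nothing already in the warehouse is touched; queries against older data still see the classification that was in force at the time.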
In order to achieve this rather remarkable state of affairs, KALIDO has had to reinvent database design. The old structures of database design simply did not suffice. One of the interesting things about the KALIDO approach is that it is not just a theory; it is a practicality, with many installations around the world.
The next time someone says that the last book has been written on database design, introduce that individual to the problems of a global data warehouse.