Myths and Truths about Data Warehousing

  • June 01 1999, 1:00am EDT

Over the past several years, I'd like to think that we've all learned quite a bit about the field of data warehousing. We've learned new ways of using our warehouses, we've learned how to integrate them more tightly into our ways of doing business, and we've learned new approaches regarding how to build these warehouses. In fact, we've also learned that some of what we originally believed to be true regarding building and deploying data warehouses really isn't true. But, like old wives' tales, many of these old teachings persist and are still being promoted as fundamental truths about warehousing. In reality, they need to be exposed as myths.

One of the most persistent myths I still see about building a data warehouse is that the initial design is the most important part. People spend enormous amounts of time planning what the warehouse needs to look like to make sure it meets everyone's needs on the day it is rolled out to the end users. Warehouse designers labor over the original design and ask questions such as: Where will we get the data we need? How clean is the data? What does our table structure need to look like to hold all this data and cover all the potential ways that people will want to use it? But this has turned out to be a flawed way of thinking, because it implicitly assumes that your business is static. It implies that we build a warehouse thinking that it must be perfect initially, because once it's built, that's it. We're done. Finito.

But no one's business is static, and no one's warehouse is static either. First of all, since warehouses are designed to store historical records of all the events that occur within your enterprise, this history of events will naturally grow larger and larger as time goes on. As the nature of the events changes (due to changing business conditions, including new competitive pressures and new market opportunities), the nature of what is stored in your data warehouse will also need to change. But by focusing only on the initial design of the warehouse and not on how that design will need to change over time, we end up building static solutions that collapse under their own weight within 12 to 18 months after deployment.

The truth? We must focus on building our warehouses to be "organic," so that they can grow as our needs grow and adapt as our needs change. This means we need to focus on the underlying architecture of how data will move into the warehouse, how it will be stored, and how it will be accessed. The architecture needs to be generic enough and flexible enough to handle the fact that the specifics can all change over time: which data will be moving into the warehouse, which tables will be needed to store the data, and which access methods and tools will be used. The first rollout of the data warehouse should be considered only the initial design, not the ultimate design.
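To make the idea of a generic architecture a little more concrete, here is a minimal sketch in Python of a loading step that does not care which sources exist on any given day. It is only an illustration under stated assumptions: the names SourceRegistry, add, remove, and load_all are hypothetical and do not come from any particular warehouse product, and a real load process would also handle cleansing, scheduling, and storage.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
# An extractor is any zero-argument callable that yields rows (an assumption
# made for this sketch; real sources would be files, databases, or feeds).
Extractor = Callable[[], Iterable[Row]]


class SourceRegistry:
    """Hypothetical registry of data sources feeding the warehouse.

    Sources can be added or removed over time without changing the
    loading loop itself.
    """

    def __init__(self) -> None:
        self._sources: Dict[str, Extractor] = {}

    def add(self, name: str, extractor: Extractor) -> None:
        # Register a new source, or replace one with the same name.
        self._sources[name] = extractor

    def remove(self, name: str) -> None:
        # Retire a source; removing an unknown name is a no-op.
        self._sources.pop(name, None)

    def load_all(self) -> List[Row]:
        # Pull rows from every registered source and tag each row
        # with the name of the source it came from.
        rows: List[Row] = []
        for name, extract in self._sources.items():
            for row in extract():
                rows.append({**row, "source": name})
        return rows
```

With a structure along these lines, adding a new feed next year is one call to add, and retiring an old one is one call to remove; the loading loop stays the same in both cases.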

We must plan for tremendous amounts of scalability and flexibility in our warehouse design, because what our warehouse looks like initially will usually be very different from what it looks like a year later. Certainly, there needs to be a lot of attention paid to ensuring that the initial set of users' requirements will be met by the data warehouse, but we must realize that these needs will change; and if our warehouse is not designed to be easily modified, then soon the warehouse's capabilities will no longer meet users' needs.

So, let me define a few ground rules (that is, the various truths) to counteract the myth that the initial design must be the "perfect" design if a warehouse is to be truly successful.

1) Focus on the process first and the data second. Some experts will tell you that the data is the most important part of your data warehouse. They'll tell you that the data must be clean and consolidated. If the data is bad, then no matter how good your warehouse design is, it will ultimately be useless because the results you get from your analyses will be meaningless. This is true, but it misses the point that the types of data and the quantity of data will change over time. If you don't focus on defining a process that will allow you to add more data sources and remove old ones, your sparkling clean data will slowly become less useful to your users because it will no longer be the type of data they need. Focus first on designing a process that allows different data sources to be added and removed, and only then focus on ensuring that the data is clean.

2) Define the goal to be "adding incremental value to the users." If you try to build the complete solution for your users as the initial design, you will end up taking far too long to finish building it (such "complete solutions" easily take more than a year to build). Even worse, once you finish building it and deliver it, chances are extremely high that what you have delivered is no longer exactly what the users wanted, because their requirements changed while you were building it. Instead, the goal should be to give the users something that provides more value than what they had before. Focus on making their jobs incrementally easier and on giving them a system that is incrementally more valuable than what they had previously, rather than on initially giving them 100 percent of what they requested. Deliver incremental iterations every three to six months, and just make sure that each one provides more value than the previous iteration.

3) Rid yourself of the notion of the perfect solution. Finally, as I wrote in a recent column ("The Fallacy of Perfecting the Warehouse," April 1999), understand that there is no perfect warehouse design. First, I'm not even sure what the best definition of a "perfect" data warehouse would be (though it would probably be based on how well it meets users' needs). Even if such a definition existed, I have no idea how to measure it. But, more importantly, even assuming that we had some way of determining that your warehouse was perfect, it would only be perfect for a short time. Pretty soon, users' needs will change, and the warehouse will no longer be perfect. It, too, will have to change. As I stated in point #2, don't strive for perfection; instead, strive for incremental value.

In the early days of medicine, it was thought that bloodletting was a good way to cure sick people of various ailments. Clearly, that has turned out to be a myth. As the medical profession advanced, improved methods for treating illnesses have been developed. The data warehousing field is also highly dynamic, and new techniques and approaches will continually be replacing older ones. We must continually challenge what we know to ensure that we extract the maximum benefit from our data warehouses.
