For the past few years, most of the articles, books and seminar presentations on the technical aspects of data warehousing have focused on the design and initial implementation of the warehouse. Seems pretty natural, right? If you're going to build a data warehouse, then you need to learn about how to do it correctly. However, many people have recently begun to realize that the true success of a data warehouse is only partially dependent on the initial design and implementation, because data warehouses begin to rapidly grow and change as soon as they're implemented. Therefore, the other factor that significantly affects a data warehouse's success is how these subsequent changes are managed after the initial implementation is complete.

The Static Data Warehouse

Unfortunately, most people are still focused on just the initial implementation. Once the warehouse is built, most organizations feel that the warehouse is "done," and the only thing left to do is to learn how to use it effectively. But if the warehouse remains constant over time, then you are guaranteed to eventually end up with a mismatch between the capabilities of the warehouse and the needs of the organization that created it. In my work, I see this all the time--a warehouse that used to be the twinkle in some IT manager's eye has become woefully out of date. Traditionally, when this happened to older systems, we euphemistically labeled them "legacy systems." This label could be applied to a surprising number of data warehouses, even those that are only a few years old.

Why does this happen? How does a warehouse become a legacy system? Honestly, it all boils down to how we think about the process of building a data warehouse (or any strategic system, for that matter). Traditionally, we're all familiar with the notion of having a well-defined start date and end date for any IT project we undertake. In fact, we often lay out our project plans in some form of a visual project time line so we can quickly see how much more work there is to do before we're done. While this is extremely useful in keeping projects focused (and can help keep them on time as well), it is unfortunate that people focus on the end date. Why? Because people interpret this date as the date when the warehouse will be "done." Any notion that the warehouse is finished implicitly pushes us toward a static image of the data warehouse. And it's only a matter of time before a static warehouse becomes your next legacy system.

The Organic Data Warehouse

Instead, a different mind-set is necessary. The question, "When will my warehouse be done?" should be viewed as equivalent to the question, "When will I be done managing my organization?" In other words, you're never really done. Your organization is a collection of people and, therefore, has an organic nature to it. The group will change over time in a very organic fashion, according to changing internal and external factors.

Fully understanding the implications of the organic nature of an organization means coming to the realization that a warehouse must also be viewed as organic. It simply makes sense. As an organization's needs increase and change, the warehouse that services those needs must also be able to grow and adapt organically. If you only want your warehouse to be successful on the day you roll it out to end users, then you only need to focus on the initial design. But, for it to be successful over time, you must view your warehouse as a living, breathing and constantly changing creature.

Evolution and the Organic Data Warehouse

There are a few interesting implications that go hand in hand with the notion of an organic data warehouse. First, all organic things in the world go through a process of evolution. Mother Nature constantly makes small modifications to each species, and this evolution allows these species to adapt and thrive in a constantly changing environment. Those species that aren't able to evolve will eventually die off as their surrounding conditions change to the point where they are no longer suited for survival in the new environment. The same is true for data warehouses. Since they are organic, they too must evolve. As the corporate environment around them changes, they too must change. If they don't, they will quickly be headed down the path toward the IT industry's equivalent of extinction: they will become static legacy systems.

The second implication of an organic data warehouse is that you must toss the notion of trying to build the perfect warehouse. Biologically speaking, evolution does not imply a progression toward perfection. We'd all like to think that as humans we are more highly evolved than other animals and are, therefore, closer to some universal concept of "perfection." But, in reality, evolution is just about being well suited to survive in your current environment. With data warehousing, we are not looking for perfection in the warehouse. (In fact, I doubt anyone could really describe what defines a "perfect" data warehouse.) We are just looking for something that is well suited to the current business environment.

I need to make one last cautionary comment regarding the evolution of organic data warehouses. We do have to be careful about taking biology's notion of evolution too far with data warehouses. In biology, change happens via random mutation, and those changes that make the organism more viable are propagated via natural selection. The changes made to a data warehouse, by contrast, are not random. They are intentional and come from a careful analysis of how the needs of the organization have changed. But, as mentioned earlier, the goals for both biological evolution and data warehouse evolution are the same: to make the next generation/iteration well suited to the changing environment.

Building an Organic Data Warehouse

To build an organic warehouse, you need to focus on three things:

Scalability--First, you must ensure that the hardware and software components of your warehouse are scalable and that the overall integrated architecture is also scalable (to fully leverage the underlying scalable hardware and software). As your organization changes and, therefore, places new requirements on your warehouse, the warehouse must be able to grow to allow the incorporation of the new requirements. Scalable technologies and scalable design techniques ensure that your system will be able to grow in multiple dimensions (more users, more data, more functionality, etc.) to accommodate the ever-changing and ever-increasing demands of your organization.

Incremental Development--Second, your approach to development must be incremental. Rather than starting by building an entire enterprise-wide data warehouse as a first deliverable, start with just one or two subject areas, implement them as a scalable data mart (refer to my column, "Scalable Data Marts," in the February 1997 edition of DM Review) and roll that out to your end users. Then, after taking some time to observe how the users are actually using the warehouse, add the next subject area or the next increment of functionality to the system. Remember, living organisms don't suddenly change. Rather, they incrementally change over a series of generations. The same is true for a data warehouse.

Iterative Development--Finally, you must treat data warehouse development as a non-stop process that continually iterates through the various development cycles. It is this iteration that keeps the data warehouse in line with the needs of the organization, because it prevents the warehouse from ever becoming static.

A Living, Breathing Data Warehouse

We must forever rid ourselves of the notion that building a data warehouse is a project. Instead, we must view data warehouse development as an ongoing process. This change in perception has a dramatic impact on how we approach data warehouse development, and this organic view of a warehouse is critical for ensuring that your warehouse will be able to handle the rapid changes of your organization's needs.
