Continue in 2 seconds

Assuming that the applications and databases are on separate machines, what is the current thinking on where the data warehouse database should reside?

  • Joe Oates, Chuck Kelley, Clay Rehm, Les Barbusinski
  • December 02 2002, 1:00am EST


Assuming that the applications and databases are on separate machines, what is the current thinking on where the data warehouse database should reside? Should it be on a separate machine from the transaction systems databases, on the same large machine but in a different domain, or in the same domain?


Les Barbusinski’s Answer: I’m not aware of any formal research on this topic, but I typically insist on having the data warehouse database on its own server. The only exception to this rule would be on mainframes where co- mingling transactional databases with data warehouse databases can work if each databases exists in its own MVS "region" with its own dedicated set of system resources.

In the UNIX and Windows world, however, co-mingling transactional system databases with data warehouse databases on the same server inevitably leads to resource contention and performance problems. The two types of databases are fundamentally different in the way they use resources (processors, channels and buses, controllers, memory, disk, logs, etc.), and neither UNIX nor Windows segregates its resources in such a way as to prevent contention.

Keep ‘em separate.

Clay Rehm’s Answer: The location of the data warehouse only matters if your users are experiencing performance problems. If you are experiencing performance problems, what kinds of problems are you facing? Is it with query response time or with data load time? Being on the same machine may improve the ETL process, but it most likely will slow down the query and usage. You may have to experiment with using the different approaches and go with what works best for your situation.

Joe Oates’ Answer: Generally, the data warehouse database should be on a separate machine than the one used for production system(s). There are many reasons for this including the following: some data warehouse queries can take a significant toll on the resources of a machine: most production machines are nearly saturated supporting the production systems; management and support of production systems is generally more complicated than a data warehouse, so production systems get first priority.

There have been a few implementations that tried to share a machine (which I recommended against) to run the data warehouse and applications that were not successful and the data warehouse had to be moved to a dedicated machine. That being said, I have had an insurance company client implement its very successful data warehouse on a large IBM 390 system that it shared with some fairly low priority applications. But, in general, it has been my experience that having a separate machine for the data warehouse is the way to go.

Chuck Kelley’s Answer: In 1993, Bill Inmon and I wrote an article in DM Review where we stated that transaction system and data warehouse databases (and applications) should be on different hardware systems. The reason is how do you tune the operating system (and database system) to do a) four reads, six writes (transactions) and b) sequential scan of a million rows and aggregate (data warehouse) at the same time? The processing for each style of computing is different and so you should use different hardware to run them.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access