Continue in 2 seconds

In a data warehousing project, which way is the best to store data in database?

  • July 10 2006, 1:00am EDT

Q: In a data warehousing project, which way is the best to store data in database? Is storing the extracted tables in one database and the summarized tables in another database better or worse than storing them in one database only?

Tom Haughey's Answer:

It is generally better to store both in one single database. One exception is noted later in this discussion.

Let's start with one simple point. Any time you take what could be one database and split it in two, you do not simplify the problem, you complicate it. Why? Because now you have to interface these two separate databases. There will be some data in common between the two and you will need to keep these in synch and replicate from one database to the other.

Let's follow this with another simple point. A DW will contain multiple levels of data, even within the central data warehouse (CDW), which is the main database within the DW environment. There will be the base level data and different levels of summarization. The base data is the data that is extracted and transformed for loading into the DW. In the DW, these are often called base tables. Base tables are usually quite normalized - not fully, but mostly normalized. Remember this is normalized within the reporting data, not back to the level of detail of operational data. On top of these usually are multiple and different levels of summarization or customization. The second level could be generalized aggregates, which satisfy many queries and feed many data extracts. This level could also contain data tailored for querying. For example, the base level might contain a recursive structure (in a bill of materials-like structure) to support ragged hierarchies. The second level might flatten out the recursion in a denormalized dimension to better suit querying. The third level could be more specific summarization and structures more designed for specific users.

Another reason to keep the data in the same database is that often while interrogating aggregates, it is necessary to drill down to detail, and vice versa. If this data is in a different database, there will be performance and other problems with this. Keeping them together will simplify drill down. In addition, the use of materialized views or materialized query tables will be complicated (and perhaps even impossible) if the base tables and materialized view tables are in different databases. Materialized views are aggregate tables maintained by the DBMS. This is a great advantage that will be compromised by separating the data.

We recommend a three-staged process to determine which aggregates to build and where to store them, as follows:

  1. Create generalized aggregates. These are aggregates that satisfy a broad range of query needs and can be used to feed extracts to data marts. Put these in the main database.
  2. Create specialized aggregates. These are aggregates designed to satisfy a particular group of knowledge workers. Put these in the main database.
  3. If the prior two moves do not give you adequate performance, then off-load the aggregates to a separate server. This is the exception I referred to earlier. In this case, aggregates are moved to a separate server and are essentially a summary data mart. This is required where the aggregates do not otherwise provide adequate performance.

Aggregates are built for two main reasons: to improve query performance and to provide a consistent set of totals for everyone to use. Aggregates should be put in place under the following circumstances:

  • Where there is a known performance problem.
  • Where base data is voluminous.
  • Where the aggregate would achieve a reduction in rows of from 10:1 (at least);
  • For queries that are run often;
  • For queries that are complex; and
  • For intermediate totals that are used often.

One more point. It is generally better to store each aggregate in a separate table, rather than storing different aggregates in the same table.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access