Continue in 2 seconds

What factors should an organization consider when determining how often to "refresh" (update) its data warehouse?

By
  • Sid Adelman, Chuck Kelley, Clay Rehm, Les Barbusinski
Published
  • May 07 2004, 1:00am EDT

Q:  

What factors should an organization consider when determining how often to "refresh" (update) its data warehouse?
Is it possible that a data warehouse has different refresh frequency within or among subjects? If yes, what's the criteria or pay-off?

A:  

Since the previous two questions are essentially the same, many experts had the same answer for both.

Sid Adelman's Answer to both of these questions (they are definitely related): With a little thought, there should be different refresh frequencies between subjects. The factors to consider for refresh frequency for each database should be based on:

  1. How current does the data have to be? What's the downside if it's a little stale?
  2. What's the additional cost of a more frequent refresh? This will be based on the size of the database, the refresh strategy, the level of ETL automation, and the volatility of the data.
  3. How frequently does the source data get changed or updated (this one is kind of obvious but is often overlooked).

Les Barbusinski's Answer: Put simply, the refresh frequency of any information in a data warehouse or data mart can be thought of as a formula: Max (X, Y) where "X" represents how frequently "fresh" source data is available, and "Y" represents how frequently the users want the data refreshed. For example, if a certain type of external data (say, SEC industry codes) is only issued monthly, then your refresh frequency for this table would be monthly ... even if the users wanted it weekly. On the other hand, if equity trade data is available in real time, but the users only want a snapshot of yesterday's activity, then the refresh frequency would be daily. Of course, sometimes it's prudent to refresh the data as soon as it becomes available ... in anticipation of a future user requirement (i.e., if you expect the users will want the data refreshed more frequently in the near future).

The only other factor that enters into this equation is cost. Management may, for example, decide that refreshing data nightly is perfectly acceptable given the high cost of the additional ETL, MOM and/or ERP "adapter" software licenses that would be needed to refresh the data in real time. Hope this helps.

Chuck Kelley's Answer: Absolutely, it depends (how's that for a consultant's answer!). If some of the data is available and required for the data warehouse only on a monthly basis, then you would have a monthly refresh. If other data is available and required for the data warehouse on a daily basis, then you would have a different refresh frequency. For example, ATM transactions may be feed on a daily basis, but Account balances may be feed on a monthly basis. If that is what the source systems have, and you need that data for a successful data warehouse/subject area, then by all means, do it.

The business requirements and source system data accuracy will determine how often you need to refresh (not update) the data warehouse. However, if the business requirements are out of whack with your operational systems, then it won't work. For instance, let's say that the business requirement is daily, but because of the way that the operational system works, your end of week processing must be done before the data values are considered accurate. The requirements are accuracy are out of date, so the best you can do would be weekly, until the source system changes.

Clay Rehm's Answer: A data warehouse is made up of many data elements - detailed tables, summary tables, fact tables, dimension tables, indexes, views, etc. Each element may be refreshed or updated at different times.

There are many factors, and one of them is that not all data will be refreshed at the same time. Some data (tables, columns, rows) will be updated real time, some daily, weekly, monthly, etc. It is important to identify specifically which data element needs to be updated and when.

How the data will be used, and how often it will be used can help determine the refresh frequency. This will take interviews and meetings with your clients, so you can understand how they will use the data. Additionally, researching the data sources and how frequently they are updated will provide assistance on how often that same data in the data warehouse should be refreshed.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access