Continue in 2 seconds

In building fact tables, what is the best practice for refreshing the data?

  • Chuck Kelley, Clay Rehm
  • July 10 2006, 1:00am EDT

Q: In building fact tables, what is the best practice for refreshing the data? A complete drop and recreation/reload of all data, appending the table with records for new data or is it based on the business requirement (i.e., retention of historic data)?

Chuck Kelley's Answer:

I believe it is based on the business requirements as you pointed out. This could be true for dimension tables as well.

Clay Rehm Answer:

This question for me has always been "it depends", since each Fact table is different. Since each fact table has different data sources and different business rules, one fact table strategy can be different than another. My approach has been to review the business rules, the future of the data, the current performance levels and cost, and user satisfaction of the data.

Tom Haughey's Answer:

Let me answer the question point blank first and they go back over some related concepts that will quality the answer.

Your question refers to fact tables, not dimensional tables, so I will stick to that. In general, the most effective method for building and extending fact tables is to append the new facts onto the existing fact data. This is called simple insert. This is particularly true if the fact tables are atomic or at the transactional level. The new transactional facts are just appended to the existing fact rows. Drop and recreate is not the most popular method. I have seen drop and recreate used mostly where the fact tables are aggregated, even if slightly aggregated. An example of an atomic fact record is the delivery line item transaction. An example of an aggregated fact record is Revenue and Volume by Customer by Product by Week.

Even the creation of history records does not require Drop and Recreate unless history is restated or revised each period. Restated history includes history where revision is required to existing history records. This could be needed because some dimensional data has changed, such as an organization change. For example, you are a consumer products company. Customers are assigned to routes and routes to territories. When territories get too big they are split and existing routes are assigned to the new territories. When this happens, the last 2 years of sales for that route are moved to the new territory. In a case like this it could be effective to recreate the history each load period.

There is one other situation that may require total fact table re-build, and that is if you can not determine what the change data is from period to period. However, total re-build always requires that you have the full history available so that you can re-build.

In general, there are four methods of refresh. What you can drop and recreate is not the most popular, nor is it necessarily the most effective. The four methods are:

  1. Simple insert: insert new record only; no change to existing records
  2. Complete refresh: replace entire file regardless of what changed
  3. Selective refresh: replace some rows
  4. Selective update: modify some existing rows

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access