JUN 1, 2003 1:00am ET

Data Warehousing Refresh Rates

The optimal refresh frequency for a data warehouse depends on the industry, the application, the business process, the time horizon of the business process and the underlying technical infrastructure. In particular, the business process is decisive: if I have a three-week demand-planning supply chain, the refresh rate will be different from the rate I need when I have a customer on the phone. Also, the "optimal" frequency is not necessarily the "standard" or "most common" frequency. (People who respond to surveys are reluctant to admit that what they are doing is less than optimal, even if it is.) For example, if yesterday's sales data is captured by the automated system at a point-of-sale terminal, then it is reasonable to request to see yesterday's sales data today. On the other hand, if the sale is not really booked until the invoice is paid, then the same request is less reasonable. In that situation, one would reasonably expect to see invoices that were paid yesterday reported as completed sales today.

Data Warehouse Refresh Rates    Currently    In 18 Months
Monthly                         41%          27%
Weekly                          26%          29%
Daily                           75%          72%
Many times a day                2%           14%
Near real time                  0%           10%
Source: The survey was conducted at the TDWI World Conference in New Orleans, Feb. 9-15, 2003. The Quarterly Technology Survey is administered by The Data Warehousing Institute and Giga Information Group.
© 2003 Giga Information Group, Inc., a wholly owned subsidiary of Forrester Research, Inc. and The Data Warehousing Institute. All rights reserved. Reproduction or redistribution in any form without the prior permission is expressly prohibited
Figure 1: Data Warehouse Refresh Rates
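
The shift shown in Figure 1 can be checked with a little arithmetic. The following is a minimal sketch that copies the survey percentages into a dictionary and computes the combined share of the two faster-than-daily categories, plus the percentage-point change for each refresh rate. The names `REFRESH_RATES`, `INTRADAY`, `intraday_share` and `point_changes` are made up for this illustration. Because multiple responses were allowed, adding the two categories together assumes they do not overlap, which the survey does not confirm.

```python
# Survey figures from Figure 1: (currently, in 18 months), percent of respondents.
# Multiple responses were allowed, so columns do not sum to 100.
REFRESH_RATES = {
    "Monthly": (41, 27),
    "Weekly": (26, 29),
    "Daily": (75, 72),
    "Many times a day": (2, 14),
    "Near real time": (0, 10),
}

# Grouping chosen for this sketch: anything faster than one refresh per day.
INTRADAY = ("Many times a day", "Near real time")


def intraday_share(rates):
    """Return (current, in_18_months) totals for the intraday categories.

    Assumes the two categories do not overlap; the survey does not say.
    """
    current = sum(rates[name][0] for name in INTRADAY)
    expected = sum(rates[name][1] for name in INTRADAY)
    return current, expected


def point_changes(rates):
    """Return the expected percentage-point change for each refresh rate."""
    return {name: later - now for name, (now, later) in rates.items()}


if __name__ == "__main__":
    now, later = intraday_share(REFRESH_RATES)
    print(f"Intraday refresh: {now}% now, {later}% expected in 18 months")
    for name, delta in point_changes(REFRESH_RATES).items():
        print(f"{name:<17}{delta:+d} points")
```

Read this way, daily refresh barely moves, monthly refresh loses the most ground, and nearly all of the expected growth is in the two intraday categories.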

The question "What is the optimal (standard) refresh rate for production data warehouses?" is one that requires a quantitative answer. We have teamed up with our colleagues at The Data Warehousing Institute (TDWI) to provide an answer, and that answer is "daily." Daily is reported as the most common refresh rate for data warehouses by participants at the February 2003 TDWI Conference in New Orleans. According to the survey, near real-time data warehousing is barely on the radar at all, with only two percent reporting multiple updates per day. As indicated, the vast majority of respondents update the data warehouse daily (75 percent), with many also performing monthly (41 percent) and weekly (26 percent) updates. (Note that multiple responses were allowed and some enterprises report using all three refresh rates.)

However, the number of survey respondents who expect to perform multiple daily updates to the data warehouse (or near real-time data warehousing) grows from not quite two percent today to more than 24 percent in 18 months. It is true that enterprises do not always perform as anticipated, but the forecast is still likely to be an accurate expression of a business requirement. Under any interpretation, that is significant expected growth, albeit from a modest base.

The potential for vendor hype is significant, and it is important for enterprises to appreciate the complexities and trade-offs in undertaking near real-time processing. The zero-latency data warehouse sometimes also requires the zero-latency business enterprise. For example, a product properly scheduled by the 128-way massively parallel processor may be on the loading dock on time, but the truck that will transport that macaroni and cheese product to the customer may be stuck in traffic. It is very important to let the need for reduced latency in the business process itself drive the acquisition and development of the technology.

If the customer is on the phone, for instance, a real-time recommendation makes sense. However, if a product supply chain is two weeks long, knowing what products are selling on a minute-by-minute basis is probably overkill. An overnight batch run will be less expensive and will still result in replenishment in ample time. Savvy IT organizations will get ready for real-time data warehousing (and related functions such as data quality), but will continue to trade off cost and complexity against reduced latency to find the optimal price/performance for their own enterprise's requirements.
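
The trade-off described above can be sketched as a simple selection rule: from the refresh options that are fast enough for the business process, pick the least expensive one. This is only an illustration under stated assumptions. The `RefreshOption` class, the `cheapest_adequate` function, and every latency and cost value below are hypothetical placeholders, not figures from the survey or from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshOption:
    name: str
    latency_hours: float  # assumed worst-case age of the data between refreshes
    relative_cost: int    # illustrative ranking only, not a measured cost


# Hypothetical options: latency and cost values are placeholders for the sketch.
OPTIONS = [
    RefreshOption("monthly", 24 * 30, 1),
    RefreshOption("weekly", 24 * 7, 2),
    RefreshOption("daily", 24, 3),
    RefreshOption("many times a day", 4, 5),
    RefreshOption("near real time", 0.05, 8),
]


def cheapest_adequate(options, max_data_age_hours):
    """Pick the lowest-cost option whose latency fits the business need."""
    adequate = [o for o in options if o.latency_hours <= max_data_age_hours]
    if not adequate:
        raise ValueError("no refresh option meets the required latency")
    return min(adequate, key=lambda o: o.relative_cost)


if __name__ == "__main__":
    # A two-week supply chain that tolerates data up to two days old (assumed).
    print(cheapest_adequate(OPTIONS, 48).name)
    # A customer on the phone: data must be at most a few minutes old (assumed).
    print(cheapest_adequate(OPTIONS, 0.1).name)
```

The point of the design is that the business process supplies the one input, the tolerable age of the data, and the technology choice follows from it rather than the other way around.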

Lou Agosta is an independent industry analyst in data warehousing. A former industry analyst at Giga Information Group, Agosta has published extensively on industry trends in data warehousing, data mining and data quality. He can be reached at LAgosta@acm.org.
