The optimal refresh frequency for a data warehouse depends on the industry, the application, the business process, the time horizon of the business process and the underlying technical infrastructure. In particular, the business process is decisive: if I have a three-week demand-planning supply chain, the refresh rate will be different than if I have a customer on the phone. Also, the "optimal" frequency is not necessarily the "standard" or "most common" frequency. (People who respond to surveys are reluctant to admit that what they are doing is less than optimal, even if it is.) For example, if yesterday's sales data is captured by the automated system at a point-of-sale terminal, then it is reasonable to request to see yesterday's sales data today. On the other hand, if the sale is not really booked until the invoice is paid, then the same request is less reasonable. In that situation, one would reasonably expect to see invoices that were paid yesterday reported as completed sales today.
Figure 1: Data Warehouse Refresh Rates
The question "What is the optimal (standard) refresh rate for production data warehouses?" is one that requires a quantitative answer. We have teamed up with our colleagues at The Data Warehousing Institute (TDWI) to provide an answer, and that answer is "daily." Daily was reported as the most common refresh rate for data warehouses by participants at the February 2003 TDWI Conference in New Orleans. According to the survey, near real-time data warehousing is barely on the radar, with only two percent reporting multiple updates per day. As indicated, the vast majority of respondents (75 percent) update the data warehouse daily, with many also performing monthly (41 percent) and weekly (26 percent) updates. (Note that multiple responses were allowed and some enterprises report using all three refresh rates.)
However, the share of survey respondents who expect to perform multiple daily updates to the data warehouse (that is, near real-time data warehousing) grows from not quite two percent today to more than 24 percent in 18 months. Enterprises do not always perform as anticipated, but this expectation is still likely to be an accurate expression of a business requirement. Under any interpretation, that is significant expected growth, albeit from a modest base.
The potential for vendor hype is significant, and it is important for enterprises to appreciate the complexities and trade-offs in undertaking near real-time processing. The zero-latency data warehouse sometimes also requires the zero-latency business enterprise. For example, the product properly scheduled by the 128-way massively parallel processor may be on the loading dock on time, but the truck that will transport the macaroni and cheese product to the customer may be stuck in traffic. It is very important to let the need for reduced latency in the business process itself drive the acquisition and development of the technology.
For example, if the customer is on the phone, a real-time recommendation makes sense. However, if a product supply chain is two weeks long, knowing what products are selling on a minute-by-minute basis is probably overkill. An overnight batch run will be less expensive and will still result in replenishment in ample time. Savvy IT organizations will get ready for real-time data warehousing (and related functions such as data quality), but will continue to trade off cost and complexity against reduced latency to find the optimal price/performance for their own enterprise's requirements.
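The trade-off described above can be sketched in a few lines of Python. This is only an illustration, not a method taken from the survey: the option names, the latency figures, the relative cost units and the `choose_refresh` helper are all hypothetical. The idea is simply to pick the least expensive refresh schedule whose data latency the business process can tolerate.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RefreshOption:
    name: str
    latency: timedelta     # worst-case age of the data when it is read
    relative_cost: float   # hypothetical cost units, not survey data


# Hypothetical options; real latencies and costs vary by enterprise.
OPTIONS = [
    RefreshOption("near real-time", timedelta(minutes=5), 10.0),
    RefreshOption("daily batch", timedelta(days=1), 2.0),
    RefreshOption("weekly batch", timedelta(weeks=1), 1.5),
    RefreshOption("monthly batch", timedelta(days=31), 1.0),
]


def choose_refresh(tolerable_latency, options=OPTIONS):
    """Return the cheapest option whose latency the process can tolerate."""
    feasible = [o for o in options if o.latency <= tolerable_latency]
    if not feasible:
        # Nothing is fast enough, so fall back to the lowest-latency option.
        return min(options, key=lambda o: o.latency)
    return min(feasible, key=lambda o: o.relative_cost)


# Customer on the phone: assume data must be at most ten minutes old.
print(choose_refresh(timedelta(minutes=10)).name)  # near real-time

# Two-week supply chain: assume planners accept day-old data.
print(choose_refresh(timedelta(days=1)).name)      # daily batch
```

The tolerable latency is the input that matters most here, and it comes from the business process rather than from the technology, which is the point of the argument above.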