Data warehousing without archiving is like the garage of a house that a family has lived in for several generations. It becomes the container for a lot of valuable stuff, but soon there is no room for the family car. If that stuff were pure junk, the family could simply throw it out. Yet everyone knows that as soon as anything is thrown out, someone will ask for it, so we keep it just in case. The same can be said for data warehouses.
Data warehouses often start big and get even bigger. In the October 2007 Data Warehousing Satisfaction Survey, 78 percent of the data warehouses surveyed were 1TB or larger. Data warehouses over 100TB in size are growing at a rate of 35 percent, albeit from a small base.1 Managing and controlling these exploding volumes of data, while meeting demands for reduced latency and doing more with less, is a challenge facing businesses across all industries.
Managing Your Data Growth
Best practices in data governance indicate that information has a lifecycle. It is born when a customer calls, orders a product and is assigned an identifier. It then goes through various transformations as it is put to financial, marketing, demand planning or predictive uses, in order to answer questions crucial to running the business and optimizing its decisions. Finally, information has an end game. The customer moves away or the product is discontinued. The data is no longer updated, goes unused by the business and eventually becomes irrelevant both to the enterprise and to the society in which the enterprise does business. As the volumes of data accumulate, the data warehouse becomes obese. Meanwhile, the data warehouse becomes entwined with mission-critical systems, affecting the performance of both transactional and decision support systems. The lifecycle of the data warehouse, and with it the requirement to archive, shifts into the foreground.
Understanding information lifecycle management (ILM) for data warehousing, and how it relates to data archiving, requires following the information supply chain. Every business exchanges products or services for payments. This basic set of transactions forms the lifeblood of the enterprise. Enterprises engage in data archiving as part of an approach to information lifecycle management, of which data warehousing is an essential part. Archiving is the best way both to improve the performance of the data warehouse (or transactional system) and to satisfy requirements for data retention and security. Industry analysts estimate that up to 85 percent of production data is inactive.2 While this figure surely differs from one system to another, a system that has been in production for several years is likely to contain a significant volume of data that is used infrequently or not at all. It is just common sense, at least to a database administrator. The more data that must be scanned, the longer the response time. The more data to be processed, the deeper the index hierarchy and the longer the response. And the longer the response time, the more likely the business will press to add processors, database licenses and staff. In short, unused data is costly. It does not just sit around quietly; it consumes significant resources.
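The scan and index effects described above can be made concrete with a little arithmetic. The Python sketch below assumes a B-tree index whose nodes each hold 500 keys (a hypothetical fanout; real values depend on key and page size) and a hypothetical 200-million-row table. It then applies the analysts' upper-bound figure of 85 percent inactive data. It is an illustration under those assumptions, not a model of any particular database.

```python
# Sketch: how row count drives B-tree depth and full-scan work.
# The fanout of 500 and the 200-million-row table are illustrative
# assumptions, not measurements from any particular database.

def btree_depth(rows: int, fanout: int = 500) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers `rows` keys."""
    if rows <= 0 or fanout < 2:
        raise ValueError("rows must be positive and fanout at least 2")
    depth, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        depth += 1
    return depth


def active_rows(rows: int, inactive_pct: int) -> int:
    """Rows left in the warehouse after archiving the inactive percentage."""
    return rows * (100 - inactive_pct) // 100


before = 200_000_000
after = active_rows(before, inactive_pct=85)  # 85% is the analysts' "up to" figure

print(after)                                    # 30000000
print(btree_depth(before), btree_depth(after))  # 4 3
# A full table scan touches every row, so its work shrinks in proportion:
print(f"scan work reduced to {after / before:.0%} of the original")  # 15%
```

Index depth grows only with the logarithm of the row count, so lookups degrade gradually, whereas the work of a full scan grows in direct proportion to the data. Archiving cold rows helps on both counts.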
Building the Business Case
The business case for data archiving includes an analysis of return on investment (ROI). If an enterprise has incurred a substantial financial penalty from losing a legal case because it failed to produce electronic documents (e-discovery), then that hard data can form the business case by itself. Such enterprises have learned the value of archiving the hard way. Most firms, however, have been more fortunate, or luckier. If yours is one of them, look first for savings in storage technology, administrative costs, backup profile and performance, and deferred system upgrades. Information that cannot be found carries a real cost: Boeing estimates that it costs the company approximately $67,000 for every document it cannot find.3 On the storage side, at $1 a gigabyte, reducing the amount of disk needed by a terabyte saves $1,000. It also means less disk to administer and less data to back up, which tightens the batch window, so the organization can do more with less. Cold data is inactive: unused and not touched by inquiries or updates. Because the data in any given system tends to be predominantly cold rather than warm or hot, the cost benefit of archiving grows proportionately. And since data warehousing storage costs rise disproportionately as the warehouse grows, eventually coming to dominate the effort of administering the system, the data warehouse deserves attention as a prime target for archiving. As data warehouses grow into the multiterabyte range, policy-driven data archiving is an especially effective way of reducing data warehouse obesity.
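Policy-driven archiving and the storage arithmetic above can be sketched together. The Python example below classifies a partition as cold when nothing has touched it within a policy window. It then prices the disk that archiving would free at the $1-per-gigabyte rate used in the text. The `Partition` type, the 365-day window and the sample partitions are hypothetical. A real policy would draw on the warehouse's own access statistics and retention rules.

```python
# Sketch of a policy-driven archiving pass: partitions idle longer than a
# threshold are classified as cold, and the storage they would free is priced.
# Partition names, sizes, dates and the 365-day threshold are hypothetical;
# the $1-per-gigabyte rate is the figure used in the text.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Partition:
    name: str
    size_gb: int
    last_accessed: date  # last time any inquiry or update touched the partition


def select_cold(partitions: list[Partition], today: date,
                max_idle_days: int = 365) -> list[Partition]:
    """Return partitions not accessed within the policy window."""
    cutoff = today - timedelta(days=max_idle_days)
    return [p for p in partitions if p.last_accessed < cutoff]


def storage_savings(partitions: list[Partition], cost_per_gb: float = 1.0) -> float:
    """Disk cost avoided by moving these partitions off primary storage."""
    return sum(p.size_gb for p in partitions) * cost_per_gb


# Hypothetical warehouse contents, evaluated as of December 1, 2007.
today = date(2007, 12, 1)
warehouse = [
    Partition("sales_2004", 600, date(2005, 1, 10)),
    Partition("sales_2005", 400, date(2006, 3, 1)),
    Partition("sales_2007", 300, date(2007, 10, 1)),
]

cold = select_cold(warehouse, today)
print([p.name for p in cold])  # ['sales_2004', 'sales_2005']
print(storage_savings(cold))   # 1000.0
```

Only the disk cost is counted here; the administration, backup and deferred-upgrade savings listed in the business case would come on top of it.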









