Data archiving is a quiet market with significant potential. Given the growth in data volumes and intolerance for response time delays, interest in the capabilities of archiving, retention and restore software is reaching take-off speed. That interest has produced some mixed messages. Indeed, confusion about the distinction between data archiving and data warehousing is so pervasive in the market, among vendors and end-user firms alike, that it requires clarification and even debunking. A data warehouse stores data for decision support and business intelligence (BI) and may itself require an archive process to remain healthy and make optimal use of the storage technology at its disposal. In the modern sense of the word, archiving is not a substitute for database backup, whether incremental or global, though backup may be involved at some point in the process of archiving. Nor is it a substitute for data warehousing. Finally, archiving is neutral with regard to any particular hardware - it does not mean tape archive, though automated tape libraries (silos) or write once, read many (WORM) optical jukeboxes may be used in modern, policy-based data archiving systems. Data archiving, in the modern sense of the word, performs the following functions:
- Offloading seldom or never-used production data on a record-by-record (or individual object) basis from either transactional or BI systems;
- Retaining the business context of the production data and the offloaded records;
- Finding critical offloaded records within a defined service level;
- Restoring offloaded records efficiently to their business context; and
- Operating in a policy-based and policy-driven manner, by representing classes of business transactions (objects) and time periods within a framework of information lifecycle management (ILM).
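The last item in the list, a policy expressed as classes of business objects plus time periods, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ArchivePolicy`, `Record`, `eligible_for_archive`, and the sample object classes and retention periods are hypothetical and do not come from any particular archiving product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy: one class of business object plus a time period."""
    object_class: str      # illustrative class name, e.g. "invoice"
    keep_online_days: int  # how long unused records stay in production


@dataclass(frozen=True)
class Record:
    record_id: int
    object_class: str
    last_used: date


def eligible_for_archive(records, policies, today):
    """Return records whose class has a policy and whose idle time exceeds it."""
    by_class = {p.object_class: p for p in policies}
    selected = []
    for rec in records:
        policy = by_class.get(rec.object_class)
        if policy is None:
            continue  # no policy for this class: leave the record in production
        cutoff = today - timedelta(days=policy.keep_online_days)
        if rec.last_used < cutoff:
            selected.append(rec)
    return selected


# Sample data, invented for illustration only.
policies = [ArchivePolicy("invoice", 365), ArchivePolicy("shipment", 90)]
records = [
    Record(1, "invoice", date(2020, 1, 15)),
    Record(2, "invoice", date(2023, 12, 1)),
    Record(3, "shipment", date(2023, 6, 1)),
    Record(4, "customer", date(2015, 1, 1)),
]
to_archive = eligible_for_archive(records, policies, today=date(2024, 1, 1))
print([r.record_id for r in to_archive])  # prints [1, 3]
```

Selection here works record by record and is driven entirely by the policy table, so changing a retention period changes what is offloaded without touching the selection logic. Record 4 stays in production because no policy covers its class.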
This is completely different from the function of a data warehouse. Data warehousing is designed to answer basic questions such as: What products or services are customers buying or using? An archive, by contrast, is a copy of production data in the same schema. This must be emphasized - the archive and the item being archived have the same database schema, so that when a record is restored, it is restored to a consistent data (and business) context. A star schema is not an archive of the transactions it aggregates and transforms - it is a fundamentally different representation of the data, built for a different business purpose (decision support versus transaction processing, for example). Simply throwing disk space at the problem is not viable in the long term. Nor is it practical to restore hundreds of gigabytes of data (or more) to get at a single customer record or a small set of archival records in response to an audit question, a legal action (e-discovery) or a selective recovery operation.
Thus, distinguish an archive from a data warehouse. Archiving removes individual records (or objects) from production according to a defined policy and moves them to alternative storage, based on business policies that envision restoring the data on a selective basis. Being policy-based and policy-driven is a key requirement for manageability and usability. As indicated, an archival system really comes into its own when data has to be restored and restored selectively - just a few records, not gigabytes - in response to a legal summons, court order, customer service issue, security investigation or detailed technical issue.
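The move-then-selectively-restore cycle described above can be illustrated with a minimal sketch using Python's standard `sqlite3` module. The table names `orders` and `orders_archive`, the columns, the sample rows and the helper functions are all assumptions made for illustration. A real archive would normally live on separate storage, not in the same database. The point shown is that both tables share one schema, so a single record can be restored to its original context.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Production and archive tables share one schema (hypothetical names),
# so a restored row returns to the same data context it left.
SCHEMA = "(order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, total REAL)"
conn.execute(f"CREATE TABLE orders {SCHEMA}")
conn.execute(f"CREATE TABLE orders_archive {SCHEMA}")

# Sample rows, invented for illustration only.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", "2015-03-01", 120.0),
        (2, "Acme", "2016-07-19", 80.5),
        (3, "Globex", "2024-02-10", 300.0),
    ],
)
conn.commit()


def archive_before(conn, cutoff_date):
    """Move rows older than cutoff_date (ISO date string) to the archive."""
    with conn:  # one transaction: the copy and the delete commit together
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_date,))
    return cur.rowcount


def restore_order(conn, order_id):
    """Restore a single archived row to production by its key."""
    with conn:
        conn.execute(
            "INSERT INTO orders SELECT * FROM orders_archive WHERE order_id = ?",
            (order_id,),
        )
        cur = conn.execute("DELETE FROM orders_archive WHERE order_id = ?", (order_id,))
    return cur.rowcount


archived = archive_before(conn, "2020-01-01")
restored = restore_order(conn, 2)
production_ids = [r[0] for r in conn.execute("SELECT order_id FROM orders ORDER BY order_id")]
archive_ids = [r[0] for r in conn.execute("SELECT order_id FROM orders_archive ORDER BY order_id")]
print(archived, restored, production_ids, archive_ids)  # prints 2 1 [2, 3] [1]
```

Restoring order 2 touches exactly one row. Nothing else in the archive has to be read back, which is the contrast with restoring a full backup to reach a single record.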
Data archiving raises the bar on the enterprise data management capabilities of both transactional and data warehousing systems. It is important to comprehend the process of ILM. Archiving unused (cold) data from the data warehouse reduces its size, fights data warehousing obesity, improves performance, controls administrative costs, and defers the need for processor and storage technology upgrades. Legacy data warehousing vendors have claimed that the data warehouse can function as the archive. Not so. You do not archive data to a data warehouse; you archive it to a data archive. This applies to transactional and BI systems as well as to any related system in the ILM process.
Data Warehousing and Data Archiving, Not Substitutable for One Another
The business case for data warehousing is different in significant details from that for data archiving. As indicated, you cannot substitute a data archiving system for a data warehouse, nor use a data warehouse as an archive (at least not without changing the meaning of archive). In short, as data warehouses become increasingly central to business operations, they require archiving. The data archive must capture the exact data model of the data structure(s) being archived. That requirement applies whether the system being archived is a transactional system or a data warehouse.








