Data archiving is a quiet market with significant potential. Given the growth in data volumes and intolerance for response time delays, interest in the capabilities of archiving, retention and restore software is reaching take-off speed. This has produced some mixed messages. Indeed, confusion about the distinction between data archiving and data warehousing is so pervasive in the market, among vendors and end-user firms alike, that it requires clarification and even debunking.
A data warehouse stores data for decision support and business intelligence (BI) and may itself require an archive process to remain healthy and make optimal use of the storage technology at its disposal. In the modern sense of the word, archiving is not a substitute for database backup, whether incremental or global, though backup may be involved at some point in the process of archiving. Nor is it a replacement for data warehousing. Finally, archiving is neutral with regard to any particular hardware: it does not mean tape archive, though automated tape libraries (silos) or write once, read many (WORM) optical jukeboxes may be used in modern, policy-based data archiving systems.
Data archiving, in the modern sense of the word, performs the following functions:
- Offloading seldom or never-used production data on a record-by-record (or individual object) basis from either transactional or BI systems;
- Retaining the business context of the production data and the offloaded records;
- Finding critical offloaded records within a defined service level;
- Restoring offloaded records efficiently to their business context; and
- Operating in a policy-based and policy-driven way, by representing classes of business transactions (objects) and time periods within a framework of information lifecycle management (ILM).
This is completely different from a data warehouse function. Data warehousing is designed to answer basic questions such as, "What product or service are customers buying or using?" An archive is a copy of production in the same schema. This must be emphasized: the archive and the item being archived have the same database schema, so that when a record is restored, it is restored to a consistent data (and business) context. A star schema is not an archive of the transactions it aggregates and transforms; it is a fundamentally different representation of the data, built for a different business purpose (decision support versus transaction processing, for example). Simply throwing disk space at the problem is not viable in the long term. It is not practical to restore hundreds of gigabytes of data (or more) to get at a single customer record or a small set of archival records in response to an audit question, a legal action (e-discovery) or a selective recovery operation.
Thus, distinguish an archive from a data warehouse. Archiving removes individual records (or objects) from production according to a defined policy and moves them to alternative storage, based on business policies that envision restoring the data on a selective basis. Policy-based and policy-driven operation is a key requirement for manageability and usability. As indicated, an archival system really comes into its own when data has to be restored, and restored selectively (just a few records, not gigabytes), in response to a legal summons, court order, customer service issue, security investigation or detailed technical issue.
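The same-schema offload and selective restore described above can be sketched in a few lines. This is a minimal illustration, not a description of any archiving product: SQLite stands in for both production and archive storage, and the `orders` and `orders_archive` tables, their columns, the sample rows and the helper function names are all hypothetical.

```python
import sqlite3

# Hypothetical schema, for illustration only. The archive table reuses it
# verbatim, so a restored record lands back in a consistent data context.
ORDER_COLUMNS = "(order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"


def make_demo_db():
    """Create an in-memory database with a production table and a same-schema archive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders " + ORDER_COLUMNS)
    conn.execute("CREATE TABLE orders_archive " + ORDER_COLUMNS)
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "acme", "2001-03-15", 100.0),
            (2, "acme", "2006-11-02", 250.0),
            (3, "globex", "2002-07-30", 75.5),
        ],
    )
    conn.commit()
    return conn


def archive_cold_orders(conn, cutoff_date):
    """Move orders dated before cutoff_date (ISO 8601 string) to the archive."""
    with conn:  # the copy and the delete commit together or roll back together
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        moved = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff_date,)
        ).rowcount
    return moved


def restore_order(conn, order_id):
    """Selectively restore a single archived record to production."""
    with conn:
        conn.execute(
            "INSERT INTO orders SELECT * FROM orders_archive WHERE order_id = ?",
            (order_id,),
        )
        restored = conn.execute(
            "DELETE FROM orders_archive WHERE order_id = ?", (order_id,)
        ).rowcount
    return restored


def order_ids(conn, archived=False):
    """Return the sorted order ids held in production or in the archive."""
    query = (
        "SELECT order_id FROM orders_archive ORDER BY order_id"
        if archived
        else "SELECT order_id FROM orders ORDER BY order_id"
    )
    return [row[0] for row in conn.execute(query)]


conn = make_demo_db()
archive_cold_orders(conn, "2004-01-01")  # orders 1 and 3 leave production
restore_order(conn, 3)                   # one record comes back, not the whole archive
print(order_ids(conn), order_ids(conn, archived=True))  # prints [2, 3] [1]
```

Because both tables share one schema, the restore is a plain row copy with no transformation step, which is exactly what a star schema cannot offer.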
Data archiving raises the bar on the enterprise data management capabilities of both transactional and data warehousing systems. It is important to comprehend the process of ILM. Archiving unused (cold) data off the data warehouse reduces its size, fights data warehousing obesity, improves performance, controls administrative costs, and defers the need for processor and storage technology upgrades. Legacy data warehousing vendors have claimed that the data warehouse can function as the archive. Not so. You do not archive data to a data warehouse; you archive it to a data archive. That holds for transactional and BI systems as well as any related system in the ILM process.
Data Warehousing and Data Archiving, Not Substitutable for One Another
The business case for data warehousing is different in significant details from that for data archiving. As indicated, you cannot substitute a data archiving system for a data warehouse nor use a data warehouse as an archive (at least not without changing the meaning of archive). In short, as data warehouses become increasingly central to business operations, they require archiving. The data archive must capture the exact data model of the data structure(s) being archived. That requirement applies to archiving transactional systems or data warehousing systems.
For example, the data warehouse is designed to support BI decisions and operations about which markets to enter or leave and which products or services to develop or sunset, and to provide a basis for predictive analytics and forecasting business results. Likewise, if an enterprise wants to perform cross-selling, up-selling or demand planning, then a data warehouse will be useful. If you have a customer on the phone who wants to know, "Where is my stuff?" then you likely do not need a data warehouse; you need the transactional system. However, both the transactional system and the data warehouse require an archive under most current business scenarios.
Data that is optimized for running the business on a day-to-day basis will be modeled differently from data that a business analyst aggregates to understand the revenue performance of a given brand, professional services team, or other enterprise grouping. Therefore, the data gets transformed. It gets transformed repeatedly from transactional systems to data warehousing systems and sometimes even further by tactical data marts or other downstream structures. These transformations are the backbone of the information supply chain and precisely what is being handled by ILM processes. If there was ever any doubt about the central role of data warehousing, such doubts are obsolete.
Look for Policy-based Archiving as State-of-the-Art
Different business processes imply different time horizons within which they occur. For example, a firm with three years of historical data can build a demand-planning data warehouse to forecast shipments and to reduce inventory. One consumer packaged goods enterprise with a billion dollars in inventory reduced carrying costs by 10 percent across the board, saving enough to pay for the project and then some. A fraud-detection data warehouse, which looks for outliers and anomalous patterns in the use of a payment card or other financial service, requires a work-in-progress data warehouse of relatively brief duration (months or weeks rather than years) in order to flag suspicious activity. Marketing applications, such as customer cross-selling, up-selling and market basket analysis, require time frames intermediate in scope between many years and a few months.
In short, policies about how much data is needed, and for how long, vary from one application to another in areas such as marketing, sales, inventory and supply chain management. In the case of finance, government and regulatory bodies mandate that most financial transactions be retained for seven years. Details differ from industry to industry and this article is not a substitute for legal advice, which should be sought separately. In the event of a criminal allegation, legal discovery may need to go back fifteen years, though that does not necessarily imply that the technology existed fifteen years ago.
These policies and practices should be expressible in plain English (or another natural language) before being translated into a rigorous and manageable administration language implemented at the interface between the archiving system and the data warehouse or transactional system.
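As a rough illustration of that translation step, a plain-English rule such as "keep financial transactions online for one year and retain them for seven" might be captured in a small policy table. This is a sketch only: the `RetentionPolicy` class, the object names, the `disposition` function and every number of days below are assumptions chosen for the example, not values taken from any product or regulation (seven years is approximated as 7 x 365 days).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """One policy per class of business object (all fields are hypothetical)."""
    business_object: str
    keep_online_days: int  # older records become eligible for archiving
    retain_days: int       # older records have passed their retention period


# Illustrative values only; real periods come from business and legal review.
POLICIES = {
    "financial_transaction": RetentionPolicy("financial_transaction", 365, 7 * 365),
    "fraud_detection_event": RetentionPolicy("fraud_detection_event", 90, 365),
    "demand_planning_shipment": RetentionPolicy("demand_planning_shipment", 3 * 365, 7 * 365),
}


def disposition(policy, record_date, today):
    """Classify a record by its age in days under the given policy."""
    age_days = (today - record_date).days
    if age_days > policy.retain_days:
        return "retention elapsed"
    if age_days > policy.keep_online_days:
        return "archive"
    return "keep online"


policy = POLICIES["financial_transaction"]
print(disposition(policy, date(2020, 1, 1), date(2022, 1, 1)))  # prints archive
```

Keeping the rules in data rather than in code means a change in retention policy becomes an edit to one table entry, which is the manageability argument the article makes for policy-driven archiving.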
Although this article has not emphasized the performance benefits of archiving, do not forget them. Archiving unused (cold) data off the data warehouse reduces its size, improves performance, controls administrative costs and defers the need for processor and storage technology upgrades.
Don't Be Fooled Again
Both data warehousing and data archiving must conform to best practices for implementing data integrity and aligning with the enterprise's approach to data governance and the flow of data through the enterprise information supply chain. Both data warehouses and the transactional systems that source them require an archive. While no client wants to hear that it may actually require two software products instead of one, in the case of data warehousing and archiving the basic business functions of each are significantly different and cannot be substituted for one another. Both are needed in most business cases.
As amazing as it may sound, I saw a marketing brochure from a legacy data warehousing vendor that touted its archiving solution based on backup functionality to an automated tape silo. What was missing was the policy-based functionality that enables individual records or objects to be archived and restored. Such functionality did not exist. Such an approach might have made sense ten years ago, when archive still meant backup, but today it is obsolete. Of course, modern data archiving systems continue to use tape silos as well as WORM optical jukeboxes. Do not be fooled by such pretenders or you might end up with a backup system when you really need the additional functionality of policy-based data archiving.