Database archiving is becoming an important new topic for data managers. The need for this function has surfaced at most IT organizations, and the problems it addresses are only getting bigger. These problems include challenges with data retention requirements, application renovations and e-discovery. Most IT data managers recognize the problems, but many do not see database archiving as a solution. This will change as the technology matures and spreads.
Database archiving is the practice of removing selected business records that are not expected to be referenced again from operational databases and storing them in a separate archive data store, where they can be retrieved if needed.
In essence, it partitions the application database into the operational database (current business records that are still of value to the business) and the archive data store (inactive business records that need to be retained but that have no expectation of being used again for business purposes).
You don't archive databases; you archive business records from databases. Database archiving is an electronic form of records retention.
For example, a banking application has an operational database containing data for transactions (such as deposits and withdrawals). As data for a single transaction ages, it reaches a point where all intended or expected business uses of the information have been accomplished. The business record includes all data relative to the transaction, including reference information pointed to from the transaction. For example, customer name and address may be copied from the customer master record in order to complete the business record that is moved to the archive. 
The point at which an instance of the business record type is ready to be moved is determined by a policy set by the archive designer. This policy may be simple (90 days after creation) or complex (one year after creation, unless the account is flagged as under review or has a negative balance).
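The two example policies can be sketched as simple predicates. This is a minimal illustration only: the `Transaction` fields and function names are hypothetical, and a real archive policy would be evaluated against the application's own schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Transaction:
    """Hypothetical view of a banking transaction for policy evaluation."""
    created: date
    under_review: bool = False      # assumed flag on the owning account
    account_balance: float = 0.0    # assumed balance of the owning account


def simple_policy(txn: Transaction, today: date) -> bool:
    """Eligible for archiving 90 days after creation."""
    return today - txn.created >= timedelta(days=90)


def complex_policy(txn: Transaction, today: date) -> bool:
    """Eligible one year after creation, unless the account is
    under review or has a negative balance."""
    old_enough = today - txn.created >= timedelta(days=365)
    return old_enough and not txn.under_review and txn.account_balance >= 0
```

Keeping each policy as a separate function of the record and the evaluation date makes it easy to test the rule before any data is actually moved.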

Figure 1 shows the phases of the life of a business record as it pertains to database archiving. A business record gets created and remains in the operational state as long as it can be updated or participate in the creation or updating of other data records. For example, the banking transaction would be operational from the time it is created until it updates the customer's master account record, creates a record for the financial system and possibly gets updated with a flag indicating it has been audited.
Data is in a reference state if it can no longer be changed and can no longer create or change other data records, but is still expected to be used for read-only purposes. These include report generation (detailed or summary), extract processing for business intelligence data stores and anticipated customer inquiries.
Data often reaches a state where all changes are final and all expected reference uses have been accomplished. This is the inactive state. The bank no longer needs this transaction. However, the bank may be required to keep the data available for many more years to satisfy government regulations, or the bank may choose to keep the data for a longer period for unanticipated uses. Generally, the time required by law for retention exceeds the time the bank would prefer to keep the data.
Data in the inactive state usually does not need access to the application programs. Accessing the data for any unplanned uses can be accomplished through simpler generic query and reporting tools. The data can safely be separated from the application and operational environments. 
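The three lifecycle states described above can be expressed as a small classification rule. The sketch below is illustrative only; the two boolean inputs are hypothetical stand-ins for whatever application-specific tests determine whether a record can still change and whether read-only uses are still expected.

```python
from dataclasses import dataclass
from enum import Enum


class LifecycleState(Enum):
    OPERATIONAL = "operational"
    REFERENCE = "reference"
    INACTIVE = "inactive"


@dataclass
class RecordStatus:
    """Hypothetical summary of what a business record is still needed for."""
    pending_updates: bool   # can still be updated or cause changes to other records
    expected_reads: bool    # still needed for reports, extracts or customer inquiries


def classify(status: RecordStatus) -> LifecycleState:
    """Map a record's remaining uses to its lifecycle state."""
    if status.pending_updates:
        return LifecycleState.OPERATIONAL
    if status.expected_reads:
        return LifecycleState.REFERENCE
    return LifecycleState.INACTIVE
```

Under this rule, only records classified as `INACTIVE` are candidates for moving to the archive data store.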
Some business record types become inactive and can be archived almost as soon as they are generated. Other types may never reach a point where they can exist independently from the operational environment. However, most applications have transaction data that can be safely moved to a database archive for 80 to 90 percent of its required retention period.
The data lifecycle and the ability to achieve application and system independence determine the suitability of the data for database archiving. If it qualifies, then as much as 90 percent of the operational data can be offloaded from the operational systems and retained in an archive data store that is cheaper and more efficient for managing inactive data.
Database archiving is a subset of the larger topic of data archiving. Data archiving includes separate technologies for file archiving, document archiving, email archiving and database archiving. The purpose of archiving and the generic model of the archiving process are much the same for all of them. However, the actual implementation of database archiving is far more complex than it is for the other forms of data archiving. It is also in the early stages of evolution, whereas other forms of data archiving have been around for a long time and their best practices and tools have matured.

Problems Addressed by Database Archiving


Adding database archiving to the data management practices of an application is a significant move. It adds work to design and implement, and it requires continuous administration over the life of the data. To justify this, the operational environment must have problems significant enough to make it worthwhile.
The driving factors that cause the problems are longer retention periods, application renovation and the increased focus on electronic data for e-discovery.
Longer required retention periods have resulted in overloaded databases. Regulations passed in recent years have created retention periods often measured in decades. For many applications, these new requirements mean that no data will be discarded from applicable databases for years. The databases will simply grow and grow. They grow because you cannot discard data, because your business expands and because you merge data into them from companies you acquire. Growth is multidimensional and exponential.
Databases are reaching unmanageable size already, and the impact on operational systems is challenging database administrators. The telltale signs of a database that could benefit from database archiving are continuous upgrading of operational systems, with attendant hardware and software costs, and lengthening time periods required to run backups, reorganizations, recovery and data extracts. Disaster recovery times are also lengthening, even though most administrators are not monitoring this effect. When they do, they quickly become disturbed at the time that will be required to execute a disaster recovery.
Databases unnecessarily bloated with inactive data are also harder to tune and keep tuned. The inactive data is splattered throughout the database and interferes with all attempts to achieve a high state of performance.
The impact is on cost and operational efficiency and can be huge.
Application renovation presents the next biggest opportunity. Most people do not associate database archiving with application renovation. However, it can play a significant role in reducing cost and in shortening implementation time of such projects.
Historically, when an application undergoes a significant overhaul, the old data is forced to fit into the data structures of the improved system. Forcing data created under one structure definition into another can be costly and can compromise the integrity of the old data. A better way to handle many of these cases is to retire the old data under the old definition and archive it with its old definition (metadata) when it all becomes inactive. The older data is not moved to the replacement system.
This approach is much cleaner and more accurate. Data does not have to be mangled in the process of force-fitting it to the newer system definition. The renovation project has complete design freedom in not having to accommodate the old data.
This approach makes it easier to replace legacy applications running on legacy systems, easier to merge databases acquired through acquisitions and mergers, and easier to accommodate major application overhauls.
E-discovery is creeping into the database world. The lawyers have discovered this source for investigation and are increasingly calling on IT shops to produce data and to maintain a legal hold on it for the life of the litigation. A good database archiving practice can help guard against e-discovery failure risk by preserving the integrity of data throughout its lifecycle, making it easier and more accurate to find data, and using the archive to hold the data delivered.

A Business Case


The essential elements of any IT business case are cost reduction, improved operational efficiency and risk reduction.
For cost purposes, Figure 2 shows the relative value of database archiving for large databases with data having a long inactive period. This is a very common case.

It is relatively easy to build this chart once the lifecycle chart is produced and costs collected for maintaining data in operational and archiving data stores. This chart is convincing in winning approval for archiving.
Figure 3 shows this same chart for databases that have been orphaned due to application retirement. This may be the result of an application renovation, application replacement or database consolidations. The cost of retaining a full operational environment for inactive data is not competitive with archiving the data and dumping the old environment.

What are People Doing?


If you visit most large IT shops with formidable database applications you can easily find awareness of the problems created by retaining inactive data for years in the operational databases. Most data managers also quickly recognize that removing inactive data to somewhere else is a good thing to do (a must-do move, in many cases).
This has led to a large number of homegrown solutions where the data managers believe they can solve the problems without a formal database archiving effort by using available tools.
Homegrown solutions take many forms: building parallel lookalike databases, saving image copies of operational databases, moving inactive data to database unload files and other more creative solutions. The common theme of all such solutions is that they solve the immediate problem (getting inactive data out of the operational databases) but leave many other problems unsolved. Often they introduce even more problems to the back end. Some common mistakes made in homegrown solutions are:

  • Not saving metadata, making the data impossible to understand in the future.
  • Not enhancing metadata.
  • Not separating data from the application environment.
  • Requiring that data be reloaded into the original database definitions to use.
  • Putting data in a form that is not directly searchable.
  • Providing less protection of data in the archive than it had before.
  • Not managing the media in the archive, exposing it to media rot.
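To illustrate the first few points, the sketch below stores a business record together with its metadata in a self-describing, searchable form that does not depend on the original database definitions. It is a minimal example under stated assumptions: the function name, the JSON layout and the field names are hypothetical and do not represent any particular archiving product.

```python
import json


def build_archive_unit(record: dict, column_metadata: dict) -> str:
    """Serialize a business record with its metadata as one JSON document.

    Refuses to archive any column that has no metadata, so the stored
    data can still be understood without the original application.
    """
    missing = set(record) - set(column_metadata)
    if missing:
        raise ValueError(f"no metadata for columns: {sorted(missing)}")
    unit = {"metadata": column_metadata, "record": record}
    return json.dumps(unit, sort_keys=True)


# Hypothetical banking transaction and its column descriptions.
unit = build_archive_unit(
    record={"txn_id": 1001, "amount": "250.00", "customer_name": "A. Smith"},
    column_metadata={
        "txn_id": "Unique transaction identifier",
        "amount": "Deposit amount in US dollars, two decimal places",
        "customer_name": "Name copied from the customer master record",
    },
)
```

Because each archived unit carries its own descriptions, it can be read and searched with generic tools rather than reloaded into the original database.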

Figure 4 shows the generic model of any solution that moves inactive data to another data store. All of the functions shown are needed to do it right. The homegrown solutions concentrate only on the extract components and either ignore or do a poor job of the other functions.

So why are so many shops implementing solutions that are seriously flawed and place the archived data (and the corporation) at risk?
There are multiple reasons. One is that data managers have neither education in nor experience with formal database archiving practices. They do not know how to approach the problem with a best practice.
A second reason is that the vendors providing consulting services or software tool solutions are too few and their technologies are too immature. The industry has only a handful of vendors in the space and their tools and methodologies are a long way from being classified as best practices. In their defense, they are all getting better and more time and experience will certainly yield tools and methodologies that will be compelling to use.
Another reason is that most data managers do not know how to generate the business case for their own IT management that makes investment in staff, education and tools for a database archiving practice compelling. 
Most business case attempts start with making it a compliance or governance issue. This will never generate the impetus to fund a full database archiving practice. Building the business case on cost reduction, operational efficiency improvement and risk reduction will make it a no-brainer.
Another misconception is that a formal database archiving approach is too expensive and that homegrown solutions are good enough. This is almost never true. In most cases the homegrown approach is more costly to implement, fails to deliver cost reductions or risk avoidance and introduces new headaches for IT. This is a case where doing it right is cheaper, faster and delivers more benefits.

What Should You Do?


It is worth every IT shop's time to investigate database archiving as a new data management function that can be a major factor in moving their data management practices to a new level.
IT data management professionals need education about database archiving technology. Each IT shop should have at least one expert on the topic: the database archivist. They need to understand the implementation options that are available and emerging. They need to survey their database applications, determine the applicability of database archiving to each and assess the business case value of implementing it for each. They then need to venture into their first implementation. If done right, the effort can be worth millions of dollars in cost savings, improved productivity of operational applications and reduced corporate risk.
Data managers are not the only ones who should be looking at this technology. Compliance officers, data governance committees, business unit legal staff and others should be aware of the power of database archiving for creating and maintaining a more professional data management environment that is more appropriate for the new world of regulations, litigation, corporate transparency and accountability.
