
Policy-Based File System Archiving: A Smarter Strategy for the Deluge of File System Data

Published September 01 2005, 1:00am EDT

Today, companies face a never-ending deluge of information and increasing competitive pressure. Consequently, they must strive to lower costs, improve asset utilization, drive productivity and optimize information movement. By implementing an information life cycle management (ILM) strategy, companies can get the maximum value from all of their information, including file system data, at the lowest total cost.

File system data includes many forms of digital information, such as electronic documents and images, that are retained and accessed through file systems. Organizations are seeing explosive growth in file system storage requirements as end users work to be more productive. They also face regulatory compliance and litigation discovery requirements for record retention and retrieval.

File system data comprises all data stored in files that are not managed by databases or by other applications such as e-mail. It resides in file systems on desktop and mobile computers and on corporate file servers, and it commonly includes Microsoft Word, Excel and PowerPoint files, as well as hundreds of other formats such as HTML, JPEG and PDF. These files are typically created and managed by end users and are commonly copied and moved within the IT network for sharing.

End-user productivity is a major business driver of file system data growth. Employees are accustomed to managing their day-to-day work with messages, reports, presentations and spreadsheets. To contain this growth, companies often restrict file storage on corporate servers by imposing storage quotas on users, which effectively pushes data files back onto employees' desktop machines. Unfortunately, files on personal and local file systems are less likely to be backed up or preserved, which puts both productivity and organizational memory at risk. Organizations have also attempted to solve the data growth problem by simply deleting older files to make room for new ones, which reduces productivity by forcing users to copy their files manually to alternative locations.

A second driver of file system data growth is regulatory compliance, which, combined with business productivity, leads end users and organizations to retain more file system data. Compliance requirements can also prevent near-term deletion of affected data files once they are created.

The third driver of runaway file system data is litigation discovery. Electronic records are increasingly targeted by court subpoenas during pre-trial discovery proceedings, and the cost of complying with a single discovery request can quickly run into many thousands of dollars if the records must be retrieved from old backup tapes and desktop file systems.

User Problems

In addition to the aforementioned drivers, there are other factors that affect the management of file system data. As file system data files are created and stored on file servers, file systems must be enlarged and more disk space must be provisioned and allocated. And, in many organizations, file system data files are kept on relatively high-cost disk storage. Consequently, companies are faced with higher acquisition costs for the additional physical storage, as well as increased management and overhead costs.

Then, there's backup. When kept on primary storage, historical files are generally included in full backups along with more recent files, even though they have not changed and have already been backed up many times. This increases overall backup time and cost, lengthens recovery times and raises the risk of recovery failures. Many organizations have already reached the point where their backup windows do not allow enough time to complete full backups.

Policy-Based File System Archiving

A practical solution to the file system data problem must accommodate data growth while optimizing storage costs and meeting retention and access requirements. It must also work seamlessly with existing file servers and their file systems to provide easy access to data. It must also preserve files by keeping them in a protected central repository that allows full access for authorized users, administrators and regulators.

This can be accomplished with a tiered storage architecture. Tiered storage places critical, timely files on high-performance RAID or NAS while access needs are high, then moves them seamlessly to secondary storage as access needs decrease. A tiered storage infrastructure is made up of multiple types of disk (Fibre Channel, SCSI, ATA), tape and optical storage. Policy-based file system archiving uses customer-driven business rules to match the right data to the right storage at the right time, and it is a core component of an ILM strategy.
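To make "customer-driven business rules" concrete, here is a minimal Python sketch of an age-based placement policy. The tier names, the 30- and 180-day thresholds, and the `FileInfo` and `assign_tier` names are hypothetical and serve only as illustration. Real policies may also consider file type, owner, size or retention class.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class FileInfo:
    path: str
    last_access: datetime
    size_bytes: int


# Hypothetical business rules, checked in order; the first match wins.
# A max_age of None is a catch-all for files older than every earlier rule.
Rule = Tuple[Optional[timedelta], str]
POLICY: List[Rule] = [
    (timedelta(days=30), "tier1-fc-raid"),  # recently used: high-performance disk
    (timedelta(days=180), "tier2-ata"),     # idle up to about six months: low-cost ATA disk
    (None, "tier3-tape"),                   # older still: tape
]


def assign_tier(f: FileInfo, now: datetime, policy: List[Rule] = POLICY) -> str:
    """Return the tier named by the first rule whose age limit the file satisfies."""
    age = now - f.last_access
    for max_age, tier in policy:
        if max_age is None or age <= max_age:
            return tier
    raise ValueError("policy has no catch-all rule")
```

The rules run from the most demanding service level to the least, and the list ends with a catch-all. Every file therefore maps to exactly one tier. A spreadsheet last opened 100 days ago, for example, would land on the ATA tier.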

With policy-based file system archiving, administrators define rules that identify files to be moved to a more appropriate level of storage. Candidates might include inactive data or data that must be archived to meet compliance or governance requirements. Policies can also move files between storage tiers over the lifetime of the data. Customers can therefore precisely target the files better suited to alternate storage tiers and manage data placement across tiers to match service levels. As a result, organizations can effectively manage data movement, access requirements, storage costs and capacity needs.

File system archiving can also move data in a way that is transparent to end users. Files appear in the same file system location as before migration and are accessed in the same manner. Upon migration, each file on primary storage is replaced with a small stub that points to the entire file at its new location. File systems are thereby relieved of the growing burden of inactive files consuming high-performance disk resources. This reduces storage costs and speeds backup and recovery while maintaining end-user productivity.
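The migrate-and-stub mechanism can be sketched in Python. This is a minimal illustration and not how any particular product works. It assumes that "inactive" means no access or modification within a configurable number of days. It also uses a hypothetical stub format, a marker followed by the archive path, stored under the original file name. Real archiving products rely on file-system-specific stubs and filter drivers so that unmodified applications follow the stub automatically. Here the hypothetical `read_file` helper plays that role. Modification time is also checked because volumes mounted with `noatime` never update access times.

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical stub format for this sketch only; commercial archiving products
# use their own on-disk stub formats and file-system filter drivers.
STUB_MARKER = b"ARCHIVED->"


def is_stub(path: Path) -> bool:
    """True if the file begins with this sketch's stub marker."""
    with path.open("rb") as fh:
        return fh.read(len(STUB_MARKER)) == STUB_MARKER


def archive_inactive(primary: Path, archive: Path, max_idle_days: float,
                     now: Optional[float] = None) -> List[Path]:
    """Move files idle longer than max_idle_days to archive, leaving stubs behind."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    migrated = []
    # Materialize the listing first so moves and new stubs do not disturb the walk.
    for path in sorted(p for p in primary.rglob("*") if p.is_file()):
        st = path.stat()  # stat before reading, so the stub check cannot touch atime
        last_activity = max(st.st_atime, st.st_mtime)
        if last_activity >= cutoff or is_stub(path):
            continue  # still active, or already archived
        target = archive / path.relative_to(primary)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(os.fspath(path), os.fspath(target))
        # Replace the original with a small stub pointing at the archived copy.
        path.write_bytes(STUB_MARKER + os.fsencode(target.resolve()))
        migrated.append(target)
    return migrated


def read_file(path: Path) -> bytes:
    """Read a file, following a stub to its archived copy if necessary."""
    data = path.read_bytes()
    if data.startswith(STUB_MARKER):
        return Path(os.fsdecode(data[len(STUB_MARKER):])).read_bytes()
    return data
```

Because the stub keeps the original name and location, a directory listing of primary storage looks the same after migration. Only the bytes consumed on the expensive tier shrink.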

Tiered Storage Savings

To realize the full benefits of a file system archiving solution, IT organizations need to understand three sources of savings. These are the lower physical cost of alternative storage tiers, shorter backup and recovery times, and reduced storage management overhead. Primary disk storage on high-end RAID systems can cost $50 per gigabyte or more. In contrast, lower-performance ATA disk storage costs as little as $10 per gigabyte. Tape remains the least expensive option, at as little as $1 to $4 per gigabyte. Enterprises can therefore reduce storage costs by using policies to move selected files automatically from high-cost disk to low-cost disk. For example, 5 terabytes of file system data placed on high-end storage at $50 per gigabyte consumes $250,000 of prime storage real estate. Suppose 80 percent of that data (4 terabytes) is inactive. Moving it to low-end disk at $10 per gigabyte cuts the cost of those 4 terabytes from $200,000 to $40,000, saving the IT organization $160,000 in storage investment.
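The storage arithmetic can be checked with a short Python sketch. It assumes decimal units (1 TB = 1,000 GB), which is what makes 5 TB at $50 per gigabyte come to $250,000. It also uses the per-gigabyte prices quoted above. The names `Tier`, `storage_cost` and `tiering_savings` are illustrative only.

```python
from dataclasses import dataclass

GB_PER_TB = 1_000  # decimal units, matching the article's arithmetic


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb: float  # US dollars per gigabyte (price points from the text)


HIGH_END_RAID = Tier("high-end RAID", 50)
ATA_DISK = Tier("ATA disk", 10)


def storage_cost(gb: float, tier: Tier) -> float:
    """Acquisition cost of holding `gb` gigabytes on `tier`."""
    return gb * tier.cost_per_gb


def tiering_savings(total_gb: float, inactive_gb: float,
                    primary: Tier, secondary: Tier) -> float:
    """Savings from keeping only active data on `primary` and moving the rest to `secondary`."""
    before = storage_cost(total_gb, primary)
    after = (storage_cost(total_gb - inactive_gb, primary)
             + storage_cost(inactive_gb, secondary))
    return before - after
```

Calling `tiering_savings(5 * GB_PER_TB, 4 * GB_PER_TB, HIGH_END_RAID, ATA_DISK)` returns 160000. The active terabyte still costs $50,000 on RAID, and the inactive 4 TB drops from $200,000 to $40,000.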

Policy-based file archiving can also save considerable backup and recovery time. Using the same example, assume that it normally takes 10 hours to back up the 5 terabytes of data on high-end storage. If 4 terabytes of inactive data are moved to secondary storage, the primary backup volume shrinks to approximately 1 terabyte (active data plus stub files). Because backup time scales roughly with data volume, this reduces the backup time from 10 hours to about two hours. Similarly, suppose it took 14 hours to restore the 5 terabyte file system. Moving the inactive data to secondary storage would reduce the restore time to about 2.8 hours. This is because only active files and stub files need to be rewritten to primary storage to present a complete view of the entire 5 terabyte file system.
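The backup and restore figures rest on an assumption that the example leaves implicit. Job time is taken to scale linearly with the volume of data on primary storage, so throughput stays constant. Below is a minimal Python sketch under that assumption. The function name is illustrative. The optional stub overhead defaults to zero, matching the "approximately 1 terabyte" approximation.

```python
def scaled_hours(full_hours: float, full_tb: float, active_tb: float,
                 stub_tb: float = 0.0) -> float:
    """Estimate job time for the data left on primary storage.

    Assumes constant throughput, so time scales linearly with volume.
    stub_tb is the (small) space taken by stub files; zero by default.
    """
    if full_tb <= 0:
        raise ValueError("full_tb must be positive")
    return full_hours * (active_tb + stub_tb) / full_tb
```

With the article's numbers, `scaled_hours(10, 5, 1)` gives 2.0 hours for backup and `scaled_hours(14, 5, 1)` gives 2.8 hours for restore. In practice, throughput for many small stub files can be lower than for large files, so real savings may be smaller.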

Tape costs are driven directly by the number and size of the files in each backup data set. Consider, for example, a full backup of 5 terabytes of file system data that consumes $10,000 worth of tape media (at $2 per gigabyte). With routine weekly full backups and a standard six-month retention schedule, 26 full backups are held at any one time, so the annual tape cost would be $260,000. Suppose the 80 percent of file system data that is inactive is migrated. Each full backup of the remaining active data (1 terabyte) would then need only $2,000 of tape, or $52,000 per year. That is a saving of $208,000 per year, subject to tape recycle and reuse policies.
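The tape figures assume that a six-month retention schedule keeps 26 weekly full backups in rotation. They also assume tapes are reused once their backups expire. Under those assumptions, a short Python sketch reproduces the media costs. The function name is illustrative. It gives $260,000 for 5 TB backups and $52,000 for 1 TB backups, a difference of $208,000.

```python
def tape_media_cost(backup_gb: float, cost_per_gb: float,
                    retention_weeks: int, interval_weeks: int = 1) -> float:
    """Tape media held by a rotation where each full backup is kept retention_weeks.

    Assumes expired tapes are recycled, so the number of backup sets in
    circulation equals retention_weeks // interval_weeks.
    """
    sets_in_rotation = retention_weeks // interval_weeks
    return backup_gb * cost_per_gb * sets_in_rotation


SIX_MONTHS_IN_WEEKS = 26
```

For example, `tape_media_cost(5_000, 2, SIX_MONTHS_IN_WEEKS) - tape_media_cost(1_000, 2, SIX_MONTHS_IN_WEEKS)` evaluates to 208000.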

Smarter Strategy

A policy-based file system archiving solution within a tiered storage infrastructure offers a smarter strategy for managing growing volumes of file system data. It enables organizations to fulfill ever-expanding access requirements, meet legal and regulatory data-retention requirements, and realize substantial savings in storage total cost of ownership (TCO).


All Information Management content is archived after seven days.
