Rapid storage growth across the enterprise, both inside and outside the data center, and new business requirements are straining traditional data protection approaches. This article examines these challenges and investigates how a new disk-based backup technology called data deduplication can be an effective tool for solving pressing backup and recovery challenges.


Coping with a Sea of Data


Enterprise backup policies haven’t evolved all that much in recent years. Backup data is still, for the most part, written to magnetic tape each night, duplicated and then sent off-site to meet disaster recovery needs. Of course, disk already plays a role in enterprise data protection – either by providing a temporary staging area for data before it goes to tape, or by supporting snapshots related to data protection – but most data still goes to tape.


As more companies move to 24x7 business cycles, and as the amount of data to protect grows rapidly, the notion of a long period of downtime for applications and corporate systems appears almost quaint.


Some statistics from research firm TheInfoPro indicate the severity of the situation. TheInfoPro found that, among Fortune 1000 companies, average storage capacity grew from 362 terabytes in early 2005 to 1,013 terabytes in Spring 2007 (roughly 67 percent compound annual growth).[1] New disk-based backup technologies can help companies address the backup challenges that come with rapid storage growth and increased recovery expectations, but companies should also expect to use a combination of disk-based technologies.
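The growth rate quoted above can be checked with a few lines of arithmetic. The sketch below assumes the two measurements are about two years apart; the function name is illustrative and does not come from the TheInfoPro study.

```python
def compound_annual_growth(start, end, years):
    """Return the compound annual growth rate as a fraction (0.67 means 67%)."""
    return (end / start) ** (1.0 / years) - 1.0


# Capacity figures (in terabytes) are the ones quoted in the article.
# The two-year span is an assumption ("early 2005" to "Spring 2007").
rate = compound_annual_growth(362, 1013, 2)
print(f"{rate:.0%}")  # prints 67%
```

At that rate, capacity nearly triples every two years, which is why backup windows and tape handling come under pressure so quickly.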


“Backup has emerged as the leading area of focused improvement for Fortune 1000 storage organizations in 2007,” wrote TheInfoPro Managing Director Robert Stevenson in a December 2006 report. “Front-end storage growth, regulatory compliance and increasing data retention times have created a need for backup innovation to maintain the highest levels of data protection.”

Enterprises assessing their data protection infrastructure would do well to begin by asking the following questions:


  • Do you know how much money you spend to protect different types of data and whether or not your most important data has the highest level of protection?
  • How quickly can you recover different applications as well as data in the event of human error, system failure or disaster?
  • Is data at your remote offices protected consistently?
  • Do you frequently test data recovery as well as your state of disaster recovery readiness?
  • Can you quickly retrieve data such as files or email from online and offline sources in audit situations or recovery emergencies?
  • Can you effectively demonstrate and report on protected data across business units, locations and applications?

While few organizations can answer all of these questions affirmatively, too many negative or ambiguous responses present a strong argument for revising your data protection infrastructure and policies and considering data deduplication technology.


Flexibility from the Remote Office to the Data Center


Data deduplication technology is being deployed in both the remote office and the data center to help companies centralize their backup data, reduce storage costs and improve both local and disaster recovery efficiency. Data deduplication eliminates redundant backup data at a block level across multiple backup sets and locations, making file names, attributes and physical locations irrelevant. It can function at the start of the backup process, before data ever leaves the server, to reduce the network bandwidth required, or it can be deployed behind a backup application as a disk-based storage target. Both approaches can offer storage efficiencies that range from 10 to 50 times and even greater bandwidth savings.
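To make the block-level idea concrete, here is a minimal sketch of a store that identifies blocks by their content hash. It assumes fixed-size 4 KB blocks and SHA-256 fingerprints; commercial products may chunk and fingerprint data differently, and the class and variable names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class BlockStore:
    """Toy content-addressed store: each unique block is kept exactly once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes

    def put(self, data):
        """Store a backup stream and return its recipe (ordered list of digests)."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            recipe.append(digest)
        return recipe

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


def make_block(tag):
    """Build a deterministic block whose content depends only on `tag`."""
    return bytes([tag]) * BLOCK_SIZE


# Two backup sets from different (hypothetical) locations; only one block differs.
head_office = b"".join(make_block(i) for i in range(10))
branch_office = b"".join(make_block(200 if i == 3 else i) for i in range(10))

store = BlockStore()
store.put(head_office)
store.put(branch_office)

logical_bytes = len(head_office) + len(branch_office)
print(logical_bytes // BLOCK_SIZE, "blocks protected,",
      store.stored_bytes() // BLOCK_SIZE, "blocks stored")
# prints: 20 blocks protected, 11 blocks stored
```

Because blocks are matched purely by content, it makes no difference which file, server or office a block came from.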


Where and how to deploy data deduplication technology will depend on the user’s environment and recovery needs. With client-side (or front-end) deduplication, you place a client (i.e., a small piece of software) on a server in place of a traditional backup agent. This client eliminates duplicate backup data before sending it across the network. This bandwidth-efficient approach is ideal for systems in remote offices, virtual environments (e.g., VMware ESX servers), or smaller offices with limited bandwidth. And unlike a traditional backup approach, where backup streams flood the network and limit the scheduling of backup jobs, client-side deduplication enables you to run many jobs simultaneously because a given backup job needs anywhere from 10 to 500 times less bandwidth. Companies can leverage client-side deduplication for bandwidth-constrained backup systems, such as those found in remote offices and virtual environments, to increase the reliability of recovery, centralize backup data and reduce storage costs.
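The bandwidth saving comes from the client asking what the backup target already holds before it sends anything. The following sketch illustrates that exchange under simplifying assumptions: fixed 4 KB blocks, SHA-256 fingerprints, and an in-memory stand-in for the server. The names `BackupServer` and `client_backup` are hypothetical, and the small cost of sending the fingerprints themselves is ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class BackupServer:
    """In-memory stand-in for the central backup target."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def missing(self, digests):
        """Return the digests the server does not hold yet."""
        return {d for d in digests if d not in self.blocks}

    def upload(self, digest, block):
        self.blocks[digest] = block


def client_backup(data, server):
    """Back up `data`; return how many payload bytes crossed the network."""
    blocks = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        blocks[hashlib.sha256(block).hexdigest()] = block

    sent = 0
    for digest in server.missing(blocks):  # only unknown blocks are sent
        server.upload(digest, blocks[digest])
        sent += len(blocks[digest])
    return sent


def make_block(n):
    """Deterministic 4 KB block that is distinct for each integer n."""
    return n.to_bytes(4, "big") * (BLOCK_SIZE // 4)


server = BackupServer()
night_one = b"".join(make_block(n) for n in range(100))
night_two = b"".join(make_block(1000 if n == 42 else n) for n in range(100))

first_sent = client_backup(night_one, server)
second_sent = client_backup(night_two, server)
print(first_sent, second_sent)  # prints 409600 4096
```

In this toy run the second night's job sends one hundredth of the data, which is why many such jobs can share a constrained link at the same time.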


Storage-based deduplication is gaining popularity in the data center as a way to help manage storage growth, recover data from disk, and expand disk-based disaster recovery to lower tiers of data. Unlike client-side deduplication, storage-based deduplication normally does not require any changes to your backup software. It can be deployed as a hardware storage appliance or as a software solution that uses your own combination of servers and storage. Both approaches reduce the size of a backup after it has streamed across a network and through a backup server. Most customers have fewer concerns about backup bandwidth in the data center. The storage efficiencies are the same as with client-side deduplication and typically reduce aggregate storage required for a backup (over a given retention time) by 10 to 20 times, when compared to tape. For data with short retention periods (e.g., one to four weeks), using deduplication can be especially appealing because it can improve the reliability of recovery and help companies control tape-related costs in the face of growing data volumes.
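Why the reduction depends on retention time can be seen with a rough back-of-the-envelope model. The sketch below is deliberately simplistic: it assumes a full backup is kept for every day of the retention period on the traditional side, a constant daily change rate, and no compression. The 10 TB data set and 2 percent change rate are hypothetical figures, not measurements.

```python
def backup_storage_tb(primary_tb, retention_days, daily_change):
    """Return (traditional_tb, deduplicated_tb) for a simplified model.

    traditional:  one full copy kept per day of retention
    deduplicated: one baseline copy plus only the changed blocks of each later day
    """
    traditional = primary_tb * retention_days
    deduplicated = primary_tb + primary_tb * daily_change * (retention_days - 1)
    return traditional, deduplicated


# Hypothetical inputs: 10 TB protected, 2% of blocks change per day.
for days in (7, 30):
    traditional, deduplicated = backup_storage_tb(10, days, 0.02)
    ratio = traditional / deduplicated
    print(f"{days} days retained: {traditional:.0f} TB vs "
          f"{deduplicated:.1f} TB (about {ratio:.0f}:1)")
```

Under these assumptions the ratio is roughly 6:1 for one week of retention and roughly 19:1 for 30 days. Real results vary with backup schedules, change rates and data types.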


With deduplication, the traditional concept of a full backup or a full copy disappears. Instead, every new backup (or copy) relies on the previous copy to ensure that only the unique blocks are stored. Of course, a full backup can be recovered at any time – even if you haven’t written a typical full backup to disk or tape. Hardware redundancy (e.g., RAID 5) protects most systems, but customers need to remember to make backups of their dedupe storage system, and not just replicate a copy, to add an important layer of protection.
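A small sketch shows both halves of this point: a complete image can be rebuilt from block references alone, and a single shared block can underpin many backups. The catalog layout, function names and the tiny 4-byte block size are assumptions chosen to keep the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example data stays readable

store = {}    # digest -> block bytes (the deduplicated block pool)
catalog = {}  # backup name -> ordered list of digests (the "recipe")


def backup(name, data):
    """Record a backup; return how many new blocks had to be stored."""
    recipe, new_blocks = [], 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block
            new_blocks += 1
        recipe.append(digest)
    catalog[name] = recipe
    return new_blocks


def restore(name):
    """Rebuild the complete image for a backup from its recipe."""
    return b"".join(store[digest] for digest in catalog[name])


def backups_depending_on(digest):
    """List the backups that could not be restored if this block were lost."""
    return sorted(name for name, recipe in catalog.items() if digest in recipe)


monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBXXXXDDDD"  # one block changed since Monday

print(backup("mon", monday))       # prints 4
print(backup("tue", tuesday))      # prints 1
print(restore("tue") == tuesday)   # prints True

shared = hashlib.sha256(b"AAAA").hexdigest()
print(backups_depending_on(shared))  # prints ['mon', 'tue']
```

Tuesday's backup added only one block, yet it restores as a full image. The last line shows the flip side: corruption of one shared block would affect every backup that references it, which is why the dedupe store itself needs protection.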


How Can You Get Started with Deduplication?


Should client-side deduplication or storage-based deduplication replace all your current backup methods and media? Most likely not, because most environments require different degrees of protection and not all types of data are good candidates for deduplication. In other words, recovery point objectives (RPOs) and recovery time objectives (RTOs) should be matched to data protection methods. More recovery points might be better served by snapshots or continuous data protection (CDP). Faster recovery might be better served with snapshot methods or a SAN-based backup to high-speed disk. And recovery requirements for data often change as the data ages. For example, if a dedupe storage system cannot recover large amounts of data (or millions of small files) quickly, a user might instead choose to store backup data on high-speed disk for the first week and then move this data to a dedupe storage target for the remainder of the retention time. Finally, not all data types dedupe well, in particular compressed file formats used for music, photos, medicine or research. The underlying principle when considering dedupe technology is to evaluate the benefits and limitations of each approach alongside your RTOs and RPOs.
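The tiering example in the paragraph above amounts to a simple age-based placement rule. The sketch below is one way to express it; the function name, tier labels and the four-week retention default are illustrative assumptions rather than settings from any particular product.

```python
def storage_tier(age_days, fast_disk_days=7, retention_days=28):
    """Pick where a backup of a given age should live.

    fast_disk_days: how long backups stay on high-speed disk (first week here)
    retention_days: total retention time (four weeks is an assumed default)
    """
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days < fast_disk_days:
        return "high-speed disk"        # fastest restores for recent data
    if age_days < retention_days:
        return "dedupe storage target"  # cheaper capacity for older data
    return "expired"


for age in (0, 6, 7, 27, 28):
    print(age, storage_tier(age))
```

Adjusting the two thresholds per application is one way to line placement up with each application's RTO and RPO.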


Customers should deploy data deduplication based on specific needs within their environment. Controlling backup storage costs is an obvious imperative for many companies, but eliminating distributed tape-based backups may also present a cost-saving opportunity. Here are a few points to remember as you consider where to use this technology.


  • Assess where you can use this technology (e.g., client-side and/or storage-based) across all offices, data centers and systems – including virtual environments.
  • Evaluate how deduplication technology fits within your existing data protection environment (backup applications, storage, and servers).
  • Examine the trade-offs between storage appliances and software-based solutions for dedupe storage.
  • Determine whether you will need to export data out of your deduplication storage system to tape.
  • Remember to factor in the protection of your dedupe storage system (from data corruption) as part of your overall architecture.

According to some experts, data deduplication can reduce total backup storage usage by factors of 10:1 or more (depending on the nature of the data) when compared to traditional backup methods to tape. The bandwidth reductions delivered by client-side data deduplication technology are even more significant. When deployed as part of the overall backup strategy and used alongside other backup methods and media, deduplication can bring both operational and economic benefits to companies.



  1.  Robert Stevenson. "2006 Storage Wave Study." TheInfoPro: 2006.
