Data centers are coming under increasing scrutiny because of the high economic and environmental costs of running them. Over the past five years alone, data centers have doubled the amount of energy they consume, and a recent EPA report projects that this consumption will grow to 2.5 percent of the nation’s total energy usage by 2012.

The environmental impact of maintaining a data center is stunning. A single rack of storage enclosures drawing 10 kW generates as much carbon dioxide in a year as six 1999 Chevy Tahoe SUVs (about 60 tons). Most storage systems, built from anywhere between dozens and hundreds of disks that stay spinning whether or not data is being retrieved, amount to an SUV perpetually idling in the garage on the off chance the owner might want to take a drive.

To alleviate this looming ecological crisis, companies across myriad industries are exploring approaches and technologies that will let their data centers operate more efficiently. Companies seeking to mitigate the environmental impact of their data centers and reduce the carbon footprint of their information now have a growing number of options. These include:

MAID

Unlike a traditional disk array, in which every drive under management is kept spinning, a massive array of idle disks (MAID) powers down drives that are not being used. This not only reduces power consumption but also prolongs drive life and lowers cooling requirements. MAID offers economic benefits as well: a MAID system can contain hundreds or even thousands of individual drives, offering mass storage at a cost per terabyte roughly equivalent to that of tape.
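
To make the spin-down policy concrete, here is a minimal Python sketch of a MAID-style controller that powers drives down after an idle timeout and back up on access; the class names, the 10-minute timeout and the placeholder I/O are illustrative assumptions, not any vendor's implementation.

    import time

    IDLE_TIMEOUT_S = 600  # assumed policy: spin down after 10 minutes of inactivity

    class Disk:
        """One drive in the array, tracked by its last access time."""
        def __init__(self, disk_id):
            self.disk_id = disk_id
            self.spinning = True
            self.last_access = time.monotonic()

    class MaidController:
        """Spins idle drives down and spins them back up on demand."""
        def __init__(self, disk_count):
            self.disks = [Disk(i) for i in range(disk_count)]

        def read(self, disk_id, block):
            disk = self.disks[disk_id]
            if not disk.spinning:
                disk.spinning = True        # spin-up penalty is paid here (seconds, not milliseconds)
            disk.last_access = time.monotonic()
            return f"data from disk {disk_id}, block {block}"  # placeholder for the real I/O

        def enforce_idle_policy(self):
            """Run periodically: power down any drive idle longer than the timeout."""
            now = time.monotonic()
            for disk in self.disks:
                if disk.spinning and now - disk.last_access > IDLE_TIMEOUT_S:
                    disk.spinning = False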

However, there are pitfalls with MAID, especially when it is integrated with the most common file system platforms. File systems tend to spread data and associated metadata across as many disks as possible in a given storage environment, which wipes out the power savings even under fairly small input/output loads. Any power-managed storage architecture must therefore be fully aware of the on-disk data layout required by the controlling file system. Other limitations include lower throughput than conventional disk arrays, longer latency, less redundancy, and greater physical bulk and mass.

Thin Provisioning

Thin provisioning is a technique that uses overallocation to reduce wasted capacity on storage servers. It applies mainly to large-scale, centralized disk storage systems and allows space to be allocated to servers easily, as needed.

Overallocation is a mechanism that allows server applications to allocate more storage capacity than has been physically reserved on the storage array itself. A logical unit number (LUN) is created that is larger than the amount of physical storage that is actually available. This allows flexibility in the size of application storage volumes, without having to accurately predict how much a volume will change. Thin provisioning relies heavily on automation to map LUNs and then create or resize volumes, reducing the management overhead typically needed for provisioning tasks.
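
The allocate-on-write idea behind overallocation can be sketched in a few lines of Python; the 1 GB extent size, the ThinLun class and the shared pool here are hypothetical simplifications, not a real array's interface.

    class ThinLun:
        """A thinly provisioned LUN: advertises a large logical size but maps
        physical extents from a shared pool only when a block is first written."""

        def __init__(self, logical_size_gb, pool):
            self.logical_size_gb = logical_size_gb   # capacity reported to the host
            self.pool = pool                         # shared pool of free 1 GB extents
            self.extent_map = {}                     # logical extent -> physical extent

        def write(self, logical_extent):
            if logical_extent not in self.extent_map:
                if not self.pool:
                    raise RuntimeError("physical pool exhausted")  # the out-of-space risk discussed below
                self.extent_map[logical_extent] = self.pool.pop()
            # ... the data itself would now be written to the mapped physical extent

        def consumed_gb(self):
            return len(self.extent_map)              # physical space actually in use

    # A 2 TB LUN backed by only 500 GB of real capacity drawn from the common pool.
    shared_pool = list(range(500))
    lun = ThinLun(logical_size_gb=2000, pool=shared_pool)
    lun.write(logical_extent=0)
    print(lun.logical_size_gb, "GB advertised,", lun.consumed_gb(), "GB consumed")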

As actual space fills up, more physical storage can be added to meet additional space needs. This saves money since customers buy more storage only as needed, and ensures maximum storage utilization because very little of the LUN's disk space is left empty.

However, there are some downsides to thin provisioning. The complexity of keeping tabs on actual storage consumption may outweigh the benefits of this technique altogether. If storage is not added to the thinly provisioned LUN in time to meet storage needs, applications can crash. Also, while thinly provisioned volumes can grow fairly easily, they remain extremely difficult to shrink without using the newest operating systems. Because thinly provisioned LUNs come from a pool of common storage, it is possible for multiple LUNs to overlap the same disks, potentially creating performance problems as applications compete for drive access.
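
Keeping tabs on actual consumption usually comes down to a utilization check along the lines of the sketch below; the 80 percent threshold is an arbitrary example, and real arrays surface this through their own monitoring and alerting features.

    ALERT_THRESHOLD = 0.80   # assumed policy: warn when the shared pool is 80 percent consumed

    def check_pool(pool_capacity_gb, consumed_gb):
        """Warn before thinly provisioned LUNs outgrow the physical pool behind them."""
        utilization = consumed_gb / pool_capacity_gb
        if utilization >= ALERT_THRESHOLD:
            print(f"WARNING: pool is {utilization:.0%} full; add physical storage "
                  f"before applications hit an out-of-space failure")
        return utilization

    check_pool(pool_capacity_gb=500, consumed_gb=420)   # prints a warning at 84 percent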

Storage Virtualization

Storage virtualization combines multiple drives into one centrally manageable resource: multiple independent storage devices appear to the host as a single monolithic storage device. The virtualization layer presents the user with a logical space for data storage and handles the mapping of that space onto the actual physical locations.
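
The logical-to-physical mapping at the heart of virtualization can be illustrated with a small Python sketch; the round-robin striping policy and the device names are assumptions chosen only to show the lookup, and production virtualizers keep this table in persistent metadata.

    class VirtualVolume:
        """Presents one logical block address space and maps every logical block
        to a (device, physical block) pair on the underlying storage."""

        def __init__(self, backing_devices, blocks_per_device):
            self.backing = backing_devices
            self.blocks_per_device = blocks_per_device
            self.size = len(backing_devices) * blocks_per_device

        def map_block(self, logical_block):
            # Simple round-robin striping; losing this mapping means losing the data,
            # which is the metadata risk described later in this section.
            device = self.backing[logical_block % len(self.backing)]
            physical_block = logical_block // len(self.backing)
            return device, physical_block

    volume = VirtualVolume(backing_devices=["array-A", "array-B", "array-C"],
                           blocks_per_device=1000)
    print(volume.map_block(7))   # ('array-B', 2)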

Virtualization saves both time and money in addition to improving utilization. By pooling available storage capacity, system administrators no longer have to hunt for disks with free space to allocate to a particular host or server. Virtualization also helps the storage administrator perform backup, archiving and recovery more easily, and in less time, by masking the underlying complexity of the storage system. A new logical disk can simply be allocated from the available pool, or an existing disk can be expanded. Perhaps the greatest appeal, however, is that virtualization allows data to be migrated between drives while maintaining concurrent I/O access, minimizing downtime.

Despite the benefits that storage virtualization can provide, it has its shortcomings. Virtualization can be a complex process with several associated risks. Problem determination and fault isolation, for example, become more challenging because of the abstraction layer: once that layer is in place, only the virtualizer knows where the data actually resides on the physical media. Even more problematic, if the virtualization metadata is lost, so is all the actual data, because reconstructing the logical drives without the mapping information is virtually impossible. And because mapping logical to physical addresses requires processing power and lookup tables, every implementation adds some small amount of latency.

Data Compression

Data compression is the process of encoding information through specific encoding schemes, using fewer bits than an unencoded representation would require. Compression transforms a string of characters into a new string that contains the same information but whose length is as small as possible. Wire-speed compression is high-performance, on-the-fly compression at the file system level, which can compress user files by as much as 50 percent.
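
As a rough illustration, the sketch below measures lossless compression ratios with Python's standard zlib module; the sample data and the compression level are arbitrary choices, and real wire-speed compression runs inside the file system or storage controller rather than in application code.

    import os
    import zlib

    def compression_ratio(data: bytes, level: int = 6) -> float:
        """Return compressed size as a fraction of the original size."""
        return len(zlib.compress(data, level)) / len(data)

    # Repetitive data (logs, database pages) compresses well; random data barely shrinks.
    log_data = b"2009-01-01 12:00:00 storage array status: nominal\n" * 1000
    random_block = os.urandom(32_000)

    print(f"log data:     {compression_ratio(log_data):.0%} of original size")
    print(f"random block: {compression_ratio(random_block):.0%} of original size")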

Compression greatly improves I/O and replication performance and helps reduce the consumption of expensive resources such as hard disk space and transmission bandwidth. But compressed data must be decompressed before it can be used, and this extra processing can require additional hardware, processing time, or both. The design of a data compression scheme therefore involves trade-offs among several factors: the degree of compression achieved, the amount of distortion introduced (in lossy schemes; storage compression is typically lossless), and the resources required to compress and decompress the data.

Deduplication

Data deduplication is a method of reducing storage needs by eliminating redundant data. Only one unique instance of the data is actually retained on storage media, while subsequent instances are just referenced back to the one saved copy.

Data deduplication lowers storage space requirements, which saves money on disk expenditures. More efficient use of disk space also allows for longer disk-based retention periods, which improves recovery time and reduces the need for tape backups. Deduplication also reduces the amount of data that must be sent across a WAN for remote backups, replication and disaster recovery.

A potential problem with deduplication is hash collisions. Deduplication software identifies each block of data by a hash code and uses those codes to compare stored data with incoming data. When a block receives a hash value, that value is compared against the index of existing hash values. If the hash is already in the index, the block is considered a duplicate and is not stored again; if the hash is new, it is added to the index and the block is stored. In rare cases, the hash algorithm can produce the same hash value for two different blocks of data. When such a collision occurs, the system declines to store the new block because its hash already exists in the index, and the new, different data is lost. Hash algorithms must therefore be robust enough to make collisions vanishingly unlikely, and the index must be organized to deliver high-performance deduplication regardless of how much data is stored.
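
The hash-index lookup described above boils down to a content-addressed store; the sketch below uses SHA-256 as an assumed hash function (a strong cryptographic hash is one common way to get the robustness just described, and some systems also verify the raw bytes before discarding a "duplicate").

    import hashlib

    class DedupStore:
        """Content-addressed block store: each unique block is kept exactly once."""

        def __init__(self):
            self.index = {}        # hash digest -> stored block
            self.references = []   # one digest per write, pointing at the saved copy

        def write(self, block: bytes) -> str:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.index:
                self.index[digest] = block   # new data: store it and index its hash
            # duplicate: only the reference is recorded, not a second copy of the data
            self.references.append(digest)
            return digest

    store = DedupStore()
    for block in (b"backup page 1", b"backup page 2", b"backup page 1"):
        store.write(block)
    print(len(store.references), "writes,", len(store.index), "unique blocks stored")  # 3 writes, 2 unique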

The Future of Data Storage

There is no question that as the demand for storage continues to rise, along with energy costs and environmental awareness, companies must find a way to be more energy efficient, cut costs and increase data storage capacity. While each of the methods noted above can help achieve these goals, none by itself is the answer.

The key to a greener future for data centers may be found in storage solutions that take a more comprehensive approach to efficiency. These solutions integrate complementary technologies such as deduplication, MAID and compression into a single holistic package that is efficient, cost-effective and scalable.

By integrating the best of the technologies discussed, these next-generation solutions enable companies to consolidate and virtualize their storage, reduce their power consumption, lessen waste, increase storage density and lower costs. In short, they make it possible to achieve the true promise of a greener data center.
