The Cost of Storage

April 1, 2004, 1:00am EST
There is much confusion concerning the cost of storage. Add to the mix vendors whose agendas lead them to confuse the issues deliberately, and the result is quite a conundrum. What is the cost of storage? The problem is that there is not one cost of storage; there are many.

The first and simplest cost of storage is the cost per megabyte (or per whatever unit of measure seems to make sense). Vendors love to compare storage media on a per-byte basis. Unfortunately, comparing the cost of storage at the bit level is almost meaningless because storage does not exist by itself. In order to be useful, storage requires an entire infrastructure of controllers, communication lines, caching, software and several other incidental attachments, including the processor required to use and drive the storage units. In the long run, the infrastructure surrounding the storage is more expensive and more important than the storage itself.

What about the day-to-day operational costs of using the storage? A research firm recently declared that for every dollar spent on acquiring storage, seven dollars are spent annually on managing and operating that storage. This makes the acquisition cost of storage almost trivial. The real cost of storage is the cost of the operations it requires on an ongoing basis.
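To see how quickly the acquisition price fades into the background, the seven-to-one figure can be turned into a small calculation. The following is only a sketch: the function names and the assumption that operating costs stay flat from year to year are illustrative, not part of the research firm's finding.

```python
def total_cost_of_ownership(acquisition_cost, years, annual_ops_ratio=7.0):
    """Return acquisition cost plus operating cost over `years`.

    annual_ops_ratio is the yearly operations spend per dollar of
    acquisition; 7.0 mirrors the ratio quoted in the text. Holding it
    constant across years is a simplifying assumption.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    operations = acquisition_cost * annual_ops_ratio * years
    return acquisition_cost + operations


def acquisition_share(acquisition_cost, years, annual_ops_ratio=7.0):
    """Return the fraction of total cost that is the purchase price."""
    if acquisition_cost <= 0:
        raise ValueError("acquisition_cost must be positive")
    total = total_cost_of_ownership(acquisition_cost, years, annual_ops_ratio)
    return acquisition_cost / total


if __name__ == "__main__":
    # Hypothetical purchase price, chosen only to make the ratio visible.
    price = 100_000
    for years in (1, 3, 5):
        share = acquisition_share(price, years)
        print(f"{years} year(s): acquisition is {share:.1%} of total cost")
```

Under these assumptions the purchase price is one-eighth of the total after a single year and shrinks further with every year the storage stays in service.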

There is another very important cost of storage as well -- the cost of having to replace an entire technology. What happens if you buy from a storage vendor that goes out of business or merges with another company? Either event upends your support, undoes your migration paths and short-circuits your relationships. You suddenly find yourself doing business with a company you did not choose. Now what is the cost of storage?

Each of these scenarios is very real; there is nothing hypothetical about any of them. Furthermore, each scenario dwarfs the one before it. The unit cost of storage is trivial compared to the cost of acquiring the infrastructure. The cost of acquiring the infrastructure is trivial compared to the ongoing cost of operation. The ongoing cost of operation is trivial compared to the cost of having to replace an entire technology.

It is for these reasons that the cost of storage is a bewildering topic. All of these scenarios are legitimate, and all are related to each other. Together they show just how confusing the world of storage costs can be. The truth is that by choosing among these scenarios, you can make an economic case for just about any storage vendor with any kind of storage technology.

Further complicating matters is the fact that different kinds of processing have very different patterns of access to storage. Online processing requires that storage be accessed randomly and quickly. For years, organizations have built online systems, and with those online systems have come mountains of disk storage. In fact, the hardware vendors have become so accustomed to selling disk storage that they assume this is what everyone wants. Many organizations have never built systems on anything but disk storage.

However, in today's world, there has been a surge of systems that are not online but require massive amounts of storage. In particular, as data warehouses grow large, they require massive amounts of storage where the probability of access is moderate to low. In a very large data warehouse, data divides itself into two classes -- actively used data and inactively used data. Placing inactive data on high-performance disk storage is a waste of resources; a better alternative is near-line storage, which is slower than disk but costs less per unit stored.
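The saving from moving inactive data off high-performance disk is a matter of simple arithmetic. The sketch below compares an all-disk warehouse with one split between disk and near-line storage. The function names, the per-gigabyte prices and the active fraction are all hypothetical values chosen for illustration; real figures vary by vendor and by year.

```python
def all_disk_cost(total_gb, disk_cost_per_gb):
    """Cost of keeping every gigabyte on high-performance disk."""
    return total_gb * disk_cost_per_gb


def tiered_storage_cost(total_gb, active_fraction,
                        disk_cost_per_gb, nearline_cost_per_gb):
    """Cost when only actively used data stays on disk.

    active_fraction is the share of the warehouse that is actively
    used (between 0 and 1); the remainder goes to near-line storage.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    active_gb = total_gb * active_fraction
    inactive_gb = total_gb - active_gb
    return active_gb * disk_cost_per_gb + inactive_gb * nearline_cost_per_gb


if __name__ == "__main__":
    # Hypothetical inputs: a 1,000 GB warehouse, one quarter of it active,
    # disk at 10 currency units per GB and near-line at 2 per GB.
    disk_only = all_disk_cost(1000, 10.0)
    tiered = tiered_storage_cost(1000, 0.25, 10.0, 2.0)
    print(f"all disk: {disk_only:.0f}, tiered: {tiered:.0f}")
```

The lower the fraction of actively used data, the closer the tiered cost falls toward the near-line price, which is why the split matters most in very large warehouses.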

In archival processing, there is an even lower probability of access than that for inactive portions of the data warehouse. Given the characteristics of archival processing, disk storage is hardly the optimal form of storage. The type of processing has a great effect on the storage medium that is optimal and most cost effective.

There are other confusing factors as well. Consider the statement, "Why should I worry about storage costs when I build a data warehouse? After all, storage costs are only a fraction of the total cost of the data warehouse." This statement is absolutely true for early-stage data warehouses, where the dominant costs are start-up costs: ETL (extract, transform and load) processing, data modeling, initial loading and population of data, methodology, hardware break-in and database management system (DBMS) selection. It is also true that early-stage data warehouses hold relatively small volumes of data. However, data warehouses grow. As a data warehouse matures, the start-up costs remain constant while the cost of storage skyrockets. Many organizations are shocked to find that as the data warehouse grows, storage becomes the most pressing and urgent cost.
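The shift from start-up costs to storage costs can be illustrated with a toy model. This is a sketch under loudly stated assumptions: the start-up cost is a fixed amount, the storage cost grows by a constant percentage each year, and every number used is hypothetical.

```python
def storage_share_by_year(startup_cost, first_year_storage_cost,
                          annual_growth, years):
    """Return storage's share of (start-up + storage) cost for each year.

    startup_cost stays constant; the storage cost is multiplied by
    (1 + annual_growth) every year. Both behaviors are modeling
    assumptions, not measurements.
    """
    shares = []
    storage_cost = first_year_storage_cost
    for _ in range(years):
        shares.append(storage_cost / (startup_cost + storage_cost))
        storage_cost *= 1.0 + annual_growth
    return shares


if __name__ == "__main__":
    # Hypothetical: start-up costs of 900, storage starting at 100
    # and doubling every year.
    for year, share in enumerate(storage_share_by_year(900.0, 100.0, 1.0, 6), 1):
        print(f"year {year}: storage is {share:.0%} of the combined cost")
```

With these made-up inputs, storage begins as a tenth of the combined figure and becomes the larger component within a few years, which matches the pattern described above.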

Given all of these factors, it is no wonder that determining the relative cost figures for storage -- especially in the world of data warehousing -- is a complex subject.

