Data Infrastructure Hygiene and the Imperative of Organized Growth

Information Management Magazine, July 2006

Charles Garry

Corporations are not all that dissimilar from individuals. During a lifetime, we tend to accumulate things. At one time, some of these things were undoubtedly useful, but after a while, we forget why we obtained them in the first place. The result is clutter and disorganization, not to mention the difficulty of finding something amid all the "stuff."

I have a neighbor who has yet to park a single car in his two-car garage in the 10 years I have lived next door. He plans on building a shed, but I'd bet money he would only fill that up as well. Corporations are no different. They accumulate data, and lots of it. By some accounts, storage capacity requirements within the data center are growing between 45 and 125 percent compounded annually. There are a number of reasons for this huge growth. Among them are:

  • Increased dependence on email as the major form of interpersonal business communication.
  • An increased capability to perform in-depth, near-real-time business analytics, which has driven demand for even more detailed data.
  • The use of new forms of data, such as video, photographs and voice, incorporated with more traditional, character-based data.
  • More business being conducted online, which means more opportunities to acquire customers and gather data, and a greater need to store and analyze that data.
  • IT organizations maintain an average of seven full copies of each production environment for quality assurance (QA), development, testing, training, etc.
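To put those growth figures in perspective, here is a rough back-of-the-envelope sketch in Python. The 10 TB starting point and the five-year horizon are assumptions chosen purely for illustration, and the function names are hypothetical. Only the 45 to 125 percent growth range and the seven-copies figure come from the numbers cited above.

```python
def projected_capacity(initial_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding annual_growth (0.45 means 45 percent) for the given years."""
    return initial_tb * (1 + annual_growth) ** years


def total_footprint(production_tb: float, extra_copies: int = 7) -> float:
    """Production data plus full copies kept for QA, development, testing, training, etc."""
    return production_tb * (1 + extra_copies)


if __name__ == "__main__":
    start_tb = 10.0  # assumed starting capacity, for illustration only
    for growth in (0.45, 1.25):
        future_tb = projected_capacity(start_tb, growth, years=5)
        print(f"{growth:.0%} annual growth: {start_tb:.0f} TB becomes {future_tb:.1f} TB in 5 years")
    print(f"{start_tb:.0f} TB of production data with 7 full copies: {total_footprint(start_tb):.0f} TB")
```

Under these assumptions, 10 TB grows to roughly 64 TB in five years at the low end of the range and to roughly 577 TB at the high end. The seven additional copies turn 10 TB of production data into 80 TB under management.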

People and corporations typically deal with the inevitable growth of things to manage in one of two ways. Either you keep everything (like my neighbor) or you start throwing things out once you can't remember why you have them. I fall into the latter category, which is why my wife got upset at Christmas when she couldn't find her food processor. Oops! In this age of regulatory compliance (especially if your company is based in the U.S., the most litigious society in the world), you tend to become part of the former group by necessity.

Neither approach works all that well, however, so we need a third option. If we accept that data growth is inevitable, then companies can decide either to let the growth control them or to organize that growth so that they remain in control. The consequences of choosing uncontrolled growth are universally bad. If becoming adaptive or agile is at the heart of virtually every organization's IT strategy, then organized growth is the only answer.

Growth versus Cost

To address a problem, we must first establish that a problem exists. For a problem to exist and a solution to be found, there must be a correlation of factors. In other words, we must find two (or more) measurable factors that move together and are, therefore, highly correlated. For example, we all agree that when the outside temperature is greater than 90 degrees Fahrenheit, humans will tend to sweat and become uncomfortable. If we can lower the temperature (e.g., with air conditioning), we can improve the comfort level.

This leads us to the question: is there a correlation between the growth in data managed and the cost of actually maintaining that data? The answer is highly dependent on whom you ask. Certainly most PC users would likely see little correlation between maintenance cost and the amount of data under management. They have (for the most part) experienced the positive impact of larger storage capacity, coupled with improved processing speeds for a lower cost. Indeed, the average PC user now likely spends less time managing their storage than ever before. Hence, there is little correlation between the growth in the amount of data the average PC manages and the cost to maintain it.

To some degree, this mantra that storage is cheap has entered the conventional wisdom of most IT organizations. However, there is now evidence that a growing number of organizations have reached the point of diminishing returns, in which even sharply declining storage costs no longer translate into lower overall management costs. After all, while storage cost continues to decline 33 percent annually, the actual performance (as measured in disk revolutions per minute) has remained relatively unchanged for some time. We have overcome some of these factors through the use of larger caches and improvements in data partitioning and parallel processing (and yes, even more storage), but for larger IT shops with massive amounts of data to manage, we are now facing the unpleasant reality of unorganized growth.
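The arithmetic behind this point can be sketched in a few lines of Python. The sketch assumes that the 33 percent annual decline applies to the price per unit of capacity and that an organization simply buys enough capacity to keep pace with growth; the function name is hypothetical. It covers raw storage spend only and says nothing about the people, backup windows or performance tuning that make up overall management cost.

```python
def annual_spend_factor(capacity_growth: float, price_decline: float) -> float:
    """Year-over-year multiplier on raw storage spend.

    capacity_growth: fractional growth in capacity (0.45 means 45 percent).
    price_decline: fractional drop in price per unit of capacity (0.33 means 33 percent).
    """
    return (1 + capacity_growth) * (1 - price_decline)


if __name__ == "__main__":
    for growth in (0.45, 1.25):
        factor = annual_spend_factor(growth, price_decline=0.33)
        print(f"{growth:.0%} capacity growth: spend changes by a factor of {factor:.4f} per year")
```

At 45 percent growth, the factor is about 0.97, so falling prices just about absorb the growth. At 125 percent growth, the factor is about 1.51, meaning raw storage spend rises by roughly half every year even as unit prices fall.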

The Birth of ILM

Storage vendors knew this day would come. They have seized this opportunity to expand their footprints into areas in which they did not traditionally play, such as application, content and document management. Information lifecycle management (ILM) was marketed as a solution to any number of problems, including resource optimization, compliance, data protection and application performance. The argument was that organizations should accept the fact that not all data currently under management is equal, and that the relative value of the data should match the relative cost of its underlying storage infrastructure. To accomplish this, experts urged organizations to create a process by which data could be assessed and classified, and then automate the movement of data to appropriate storage platforms.
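As a minimal sketch of what "assess, classify and move" might look like, the Python below uses days since last access as a stand-in for the relative value of a data set. The tier names, the 30-day and 365-day thresholds and every identifier here are hypothetical, chosen only for illustration; a real classification scheme would also have to weigh business value, retention rules and compliance requirements.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical rules: (maximum days since last access, storage tier).
TIER_RULES = [
    (30, "tier 1: high-performance disk"),
    (365, "tier 2: lower-cost disk"),
]
FALLBACK_TIER = "tier 3: archive storage"


@dataclass
class DataSet:
    name: str
    last_accessed: date


def classify(data_set: DataSet, today: date) -> str:
    """Pick a tier using age since last access as a proxy for relative value."""
    age_days = (today - data_set.last_accessed).days
    for max_age_days, tier in TIER_RULES:
        if age_days <= max_age_days:
            return tier
    return FALLBACK_TIER


def plan_moves(data_sets, current_tiers, today):
    """Return {name: target tier} for every data set that is on the wrong tier."""
    moves = {}
    for data_set in data_sets:
        target = classify(data_set, today)
        if current_tiers.get(data_set.name) != target:
            moves[data_set.name] = target
    return moves
```

The classification step is kept separate from the planning step so that the rules can change without touching the logic that decides what to move. Automating the move itself is left out, since that depends entirely on the storage platforms involved.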

I am oversimplifying the breadth of what ILM attempts to address, but suffice it to say, many IT organizations have viewed ILM with a skeptical eye, seeing it as just another overhyped marketing ploy that would ultimately prove to be a $100 solution to a 10-cent problem.

At the very core of ILM is the notion that data is moved around the infrastructure as its relative value as "information" to an application or business user changes (i.e., as it diminishes). This process is often referred to as "archiving," which, judging from my many discussions with end users, is a most unfortunate term. I say this because archiving means different things to different people, even within the same organization. A storage manager might equate archiving with a backup to tape. An end user might think archiving means the data is no longer available or at least very difficult to access. This confusion, I believe, has had a negative impact on some very positive aspects of ILM.
