
Surviving the Perfect Storm in Data Management

  • January 01 2001, 1:00am EST

The emergence of new data-intensive applications is resulting in the accumulation of huge amounts of digital information. Indeed, the collection of data that analysts so calmly referred to as a "sea of data" just ten years ago has now swollen to tsunami force. As a result, one might say that today's data management professionals are facing a "perfect storm," and calmer waters are not yet in sight.

The Wave

Most analytical reports today cite enormous growth factors when projecting enterprise data growth over the next five years. For example:

  • Red Herring's March 2000 summary, "The Age of Petabytes," forecasted data growth rates of 75 to 150 percent per year.
  • A META Group analyst who spoke at a May 2000 eCRM conference projected a hundredfold increase in data within five years, that is, through 2004. Enterprises that are having difficulty coping with three terabytes (TB) of data today need to quickly find solutions for dealing with 300 TB tomorrow. Since the May 2000 conference, other META representatives have confirmed this growth factor as well as the urgent need for in-depth strategic data management planning.
  • A recent Deutsche Banc Alex.Brown data study found that e-business data will grow from 30 percent of the total data in the first year of activity (1999, in most cases) to 75 percent of the total data in the fourth year.¹ This growth represents a commanding data swell of 400 percent per year.
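Because these projections mix annual rates (Red Herring) with a five-year multiple (META), it helps to convert between the two. The following is a minimal sketch, assuming simple annual compounding over five years; the function names are illustrative only.

```python
def five_year_multiple(annual_rate: float, years: int = 5) -> float:
    """Total growth multiple after compounding an annual growth rate."""
    return (1 + annual_rate) ** years


def implied_annual_rate(multiple: float, years: int = 5) -> float:
    """Annual growth rate that compounds to the given total multiple."""
    return multiple ** (1 / years) - 1


# The low and high ends of the Red Herring range.
for rate in (0.75, 1.50):
    print(f"{rate:.0%} per year -> {five_year_multiple(rate):.1f}x over five years")

# META's hundredfold projection, expressed as an annual rate.
print(f"100x over five years -> {implied_annual_rate(100):.0%} per year")
```

Under this arithmetic, the top of the Red Herring range (150 percent per year) compounds to roughly 98x, close to META's hundredfold, while the bottom of the range (75 percent) yields about 16x.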

Figure 1 integrates the data in these three studies to arrive at a seemingly dependable consensus about the rate of data growth, using the Red Herring data as the basis of the graph. The triangulated outlook applies to Global 2000 enterprises and assumes an average starting point of three terabytes of total in-house data in 1999. The striped sections of the color-coded vertical bars estimate the percentage of growth stimulated and consumed by e-business activities.

Figure 1: Data Growth Projections

Understanding and acceptance of these predictions come only after you consider the scope of new business initiatives and the technological capabilities that both enable and support them. New e-business applications such as Web-front management (clickstream analysis), one-to-one customer relationship management (CRM), personalization and encounter management, supply chain management, call event detail analysis and digital certification significantly add to an enterprise's existing IT-supported agenda. In addition, new nonscalar data types (objects), including images (drawings, X-rays, etc.), streaming audio and video, dramatically expand the data inventory.

Surging TCO

It's easy to suppose that more disk storage and technological advances will somehow ease the cost of managing this tidal wave of new data. However, as Figure 2 indicates, even with the expected continued decline in RAID price per terabyte, the total expenditure necessary to accommodate the projected data growth will escalate more than tenfold in the next five years.

To be consistent with the data in Figure 1, Figure 2 uses the following numbers:

  • A 1999 total storage starting point of three terabytes of serviced data, which grows at a rate of 150 percent per year.
  • $300K/TB as the starting cost of disk storage.
  • A decline in storage costs of 30 percent per year, which is the figure projected by most analysts, including IDC, Gartner and META.
  • An expenditure calculation that uses a total cost of ownership (TCO) composite, which takes into account the hardware price/TB plus overhead factors for storage and data management tools and services.

The message expressed in Figure 2 is startling. Over the next five years, given a sixfold decrease in price/TB and a hundredfold increase in data, you can expect a thirteenfold increase in total data management costs.

Figure 2: Storage Pricing and Expenditures
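The volume and price assumptions behind Figure 2 can be projected year by year. The sketch below uses only the stated parameters (3 TB in 1999, 150 percent annual growth, $300K/TB, 30 percent annual price decline) and multiplies volume by price to get a hardware-only value. The article's TCO composite adds overhead factors for tools and services that it does not quantify, so this sketch is not expected to reproduce Figure 2's thirteenfold result; it only illustrates how data growth outruns the price decline.

```python
START_TB = 3.0            # 1999 in-house data, per the stated assumptions
GROWTH = 1.50             # 150 percent data growth per year
START_PRICE = 300_000.0   # dollars per TB of disk storage in 1999
DECLINE = 0.30            # 30 percent price decline per year


def projection(years: int = 5):
    """Yield (year, terabytes, price_per_tb, hardware_value) for 1999..1999+years."""
    for n in range(years + 1):
        tb = START_TB * (1 + GROWTH) ** n
        price = START_PRICE * (1 - DECLINE) ** n
        yield 1999 + n, tb, price, tb * price


rows = list(projection())
for year, tb, price, value in rows:
    print(f"{year}: {tb:7.1f} TB at ${price:9,.0f}/TB = ${value:12,.0f}")

first, last = rows[0], rows[-1]
print(f"data: {last[1] / first[1]:.1f}x, price/TB: 1/{first[2] / last[2]:.1f}, "
      f"hardware value: {last[3] / first[3]:.1f}x")
```

The last line shows the roughly hundredfold data increase and the roughly sixfold price decline that the text cites; their product is why spending climbs even as each terabyte gets cheaper.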

Finding New Harbors

Most IT professionals would agree that these soaring expenditures are unacceptable. Although the promise of Web-enabled business applications offers greater profitability, conscientious CIOs must look for, consider and embrace more cost-effective data management strategies and alternatives.

One of the first things to consider (or reconsider) is data management treatment: how do you use and effectively manage different types of data? Figure 3 classifies data management treatment into a framework with four distinct categories of information processing, each of which requires different software and technology.

Figure 3: Data Management Framework

First, there's active versus supportive data. You must keep active data fresh and current because it's used by operational procedures. In contrast, you may derive supportive data and refresh it periodically (for example, nightly or weekly). Informational and analytical applications use supportive data.

Next, there's high-concurrency versus high-volume storage. High-concurrency storage typically consists of rotating disk that services a large number of simultaneous requests for small amounts of data, measured in transactions per second (TPS). High-volume storage typically deploys multiple media (disk, optical and tape) and services a small number of simultaneous requests for large volumes of data, measured in gigabytes per minute (GPM).
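The two distinctions combine into the four quadrants of the framework. The sketch below is illustrative only: the `DataSet` attributes and the 100 MB request-size threshold used as a proxy for "large volumes" are hypothetical choices, not part of the framework as described.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Currency(Enum):
    ACTIVE = "active"          # kept fresh and current by operational procedures
    SUPPORTIVE = "supportive"  # derived and refreshed periodically (nightly, weekly)


class StorageProfile(Enum):
    HIGH_CONCURRENCY = "high-concurrency"  # many small requests, measured in TPS
    HIGH_VOLUME = "high-volume"            # few large requests, measured in GPM


@dataclass
class DataSet:
    name: str
    updated_by_operations: bool  # hypothetical: operational processes write it
    typical_request_mb: float    # hypothetical: size of a typical request


def classify(ds: DataSet, large_request_mb: float = 100.0) -> tuple[Currency, StorageProfile]:
    """Place a data set in one of the framework's four quadrants."""
    currency = Currency.ACTIVE if ds.updated_by_operations else Currency.SUPPORTIVE
    profile = (StorageProfile.HIGH_VOLUME if ds.typical_request_mb >= large_request_mb
               else StorageProfile.HIGH_CONCURRENCY)
    return currency, profile


print(classify(DataSet("account master", True, 0.01)))
print(classify(DataSet("clickstream history", False, 5_000)))
```

Keeping the two axes as separate enums mirrors the framework: currency decides how often data must be refreshed, while the storage profile decides which media and metric (TPS or GPM) apply.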

In most cases, the majority of enterprise data does not need to be maintained in an active profile on high-concurrency storage media. As Figure 4 indicates, depending on the applications, as little as 15 percent of the total data resource may need to reside there.

Figure 4: Data Allocation Framework

Aging details such as clickstream logs, order entry line items, call event detail records, inactive account descriptions, audit summary backups and many other file segments that do not require subsecond retrieval rates can be allocated to more cost-effective, high-volume storage.
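The allocation rule can be expressed as a simple filter on response-time requirements. In this sketch the data set names echo the text, but every size is invented for illustration, chosen so that the inventory totals 3 TB and the high-concurrency share lands at the 15 percent shown in Figure 4.

```python
# Hypothetical inventory: (data set, size in TB, needs subsecond retrieval?).
# Sizes are invented for illustration only.
INVENTORY = [
    ("active customer accounts", 0.30, True),
    ("open orders", 0.15, True),
    ("clickstream logs", 1.20, False),
    ("order entry line items", 0.50, False),
    ("call event detail records", 0.60, False),
    ("inactive account descriptions", 0.15, False),
    ("audit summary backups", 0.10, False),
]


def allocate(inventory):
    """Split data sets between high-concurrency and high-volume tiers."""
    tiers = {"high-concurrency": [], "high-volume": []}
    for name, size_tb, subsecond in inventory:
        tier = "high-concurrency" if subsecond else "high-volume"
        tiers[tier].append((name, size_tb))
    return tiers


tiers = allocate(INVENTORY)
total = sum(size for _, size, _ in INVENTORY)
hc = sum(size for _, size in tiers["high-concurrency"])
print(f"high-concurrency share: {hc / total:.0%} of {total:.1f} TB")
```

Real allocation would weigh more than a single yes/no flag, but the point carries over: only the data that genuinely needs subsecond access pays high-concurrency prices.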

Charting the Course

Analytics have always been key to successful business endeavors. However, in the e-business arena, the time element for many applications is so critical that enterprises can no longer afford the luxury of offline decision making. Many of the manual processes that previously depended on decision support systems (DSSs) must now be automated.

META Group's infrastructure recommendations at the May 2000 eCRM conference further endorse the active versus supportive, high-concurrency versus high-volume storage framework. For example, META suggests that e-CRM systems should be bolstered with two levels of analytics: real time (active) and batch (supportive). META refers to this split as micro- and macro-analytics, respectively (refer to Figure 5).

Figure 5: Two-Tier Analytics

Within the batch portion of the storage framework, long-running macro data mining applications cultivate attributes and triggers that are posted to an account master record in an online operational data store (ODS). In turn, the real-time micro-analytics run on top of the ODS like a state machine, reacting in accordance with the derived triggers and attributes. For example, a reaction to a stored trigger might be a notification message about new product offerings that parallel a customer's previous buying trends. This delegation of processing has two benefits. Shifting the macro-analytics to the supportive/high-volume data management environment improves real-time performance. Furthermore, moving the high-volume detail data to a more economical storage platform yields tremendous cost savings.
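The division of labor between the two tiers can be sketched in a few lines. Everything named here is hypothetical: an in-memory dictionary stands in for the ODS, and `macro_analytics`, `micro_analytics`, `preferred_category` and `notify_new_offering` are illustrative names. The batch tier does the heavy mining over detail history; the real-time tier reads only the small derived record.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional

# Hypothetical operational data store: account id -> account master record.
ods: dict[str, dict] = {}


def macro_analytics(purchase_history: dict[str, list[str]]) -> None:
    """Batch tier: mine detail history and post attributes and triggers to the ODS."""
    for account, categories in purchase_history.items():
        if not categories:
            continue
        favorite, _ = Counter(categories).most_common(1)[0]
        record = ods.setdefault(account, {})
        record["preferred_category"] = favorite    # derived attribute
        record["trigger"] = "notify_new_offering"  # derived trigger


def micro_analytics(account: str, event: dict) -> Optional[str]:
    """Real-time tier: react to an event using only the stored ODS state."""
    record = ods.get(account, {})
    if (record.get("trigger") == "notify_new_offering"
            and event.get("type") == "new_product"
            and event.get("category") == record.get("preferred_category")):
        return f"Notify {account}: new {event['category']} offering {event['name']}"
    return None


macro_analytics({"A100": ["garden", "books", "garden"]})
print(micro_analytics("A100", {"type": "new_product", "category": "garden", "name": "hose reel"}))
```

Because the real-time handler never touches the purchase history, the detail data can live on cheaper high-volume storage without slowing the response path.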

Calmer Waters

Figure 6 depicts a vastly improved cost projection based on using the proposed high-volume/high-concurrency data allocation framework and the two-tiered analytical model. These projections include the integration of an alternative data management solution. This type of solution satisfies the high-volume requirements of the proposed data management framework in Figure 3 by providing data recovery and deep volumes of atomic detail data for supportive analytics, all in one cohesive strategy.

This alternative management solution accommodates high-volume needs with RAID media and cost-effective robotic storage (automated seek-and-play devices such as optical disk jukeboxes and high-speed automated cartridge tape libraries). It also provides sophisticated storage management and relational database management software. This software automates critical system management tasks such as data migration, backup and recovery, and it provides row/record-level selectivity regardless of database size or where the data resides within the storage hierarchy.
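The text does not describe how the storage management software decides when to migrate data. A common approach in hierarchical storage management is to move data down the hierarchy based on time since last access; the sketch below assumes that policy, and its tier names and 90-day/365-day thresholds are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical tiers and idle-time thresholds, ordered from fastest to cheapest.
POLICY = [
    ("raid", timedelta(days=0)),
    ("optical_jukebox", timedelta(days=90)),
    ("tape_library", timedelta(days=365)),
]


@dataclass
class Segment:
    name: str
    last_access: date
    tier: str = "raid"


def target_tier(seg: Segment, today: date) -> str:
    """Pick the cheapest tier whose idle-time threshold the segment has passed."""
    idle = today - seg.last_access
    chosen = POLICY[0][0]
    for tier, threshold in POLICY:
        if idle >= threshold:
            chosen = tier
    return chosen


def migrate(segments, today: date):
    """Move each segment to its target tier (down when idle, back to RAID when
    recently accessed); return (name, from_tier, to_tier) for every move."""
    moves = []
    for seg in segments:
        new_tier = target_tier(seg, today)
        if new_tier != seg.tier:
            moves.append((seg.name, seg.tier, new_tier))
            seg.tier = new_tier
    return moves


segments = [
    Segment("orders_2000q4", date(2000, 12, 20)),
    Segment("clicks_2000q1", date(2000, 3, 1)),
    Segment("audit_1998", date(1999, 6, 1)),
]
for move in migrate(segments, date(2001, 1, 1)):
    print(move)
```

A production policy would also weigh file size, recall cost and business rules, but age-based migration captures the basic idea of letting aging detail drift to cheaper media automatically.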

In Figure 6, the broken line represents the TCO mapping initially described in Figure 2. Contrast this line with the actual cost of storing high-volume data on a more appropriate platform. As you can see, by taking the same volume of data but delegating it to the proposed subsystem alternatives, you can achieve tremendous cost savings.

Figure 6: An Economic Alternative

The example in Figure 6 keeps 30 percent of the data on more costly high-response RAID technology and migrates 70 percent of the data to a high-volume alternative technology.

Although this scenario is somewhat conservative, it represents savings of more than $160 million over the charted five-year period. Every additional percentage point of data shifted from high-concurrency storage to alternative storage by 2004 adds to those savings.
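The effect of the 30/70 split can be expressed relative to an all-RAID baseline. The article does not give the alternative platform's cost per terabyte, so the `cost_ratio` parameter below is hypothetical, and the sketch cannot reproduce the $160 million figure, which depends on the article's TCO data. It does show why each extra percentage point shifted off RAID adds savings: the relationship is linear.

```python
def blended_cost_fraction(share_on_raid: float, cost_ratio: float) -> float:
    """Storage cost as a fraction of keeping all data on RAID.

    share_on_raid: fraction of data kept on high-concurrency RAID (0..1)
    cost_ratio: alternative cost per TB divided by RAID cost per TB (hypothetical)
    """
    return share_on_raid + (1 - share_on_raid) * cost_ratio


def savings_per_point(cost_ratio: float) -> float:
    """Saving, as a fraction of the all-RAID baseline, per one percent of data moved off RAID."""
    return 0.01 * (1 - cost_ratio)


ratio = 0.2  # hypothetical: alternative storage at one fifth of RAID cost per TB
print(f"30/70 split costs {blended_cost_fraction(0.30, ratio):.0%} of the all-RAID baseline")
print(f"each extra point shifted saves {savings_per_point(ratio):.1%} of the baseline")
```

Whatever the actual ratio, as long as the alternative platform is cheaper per terabyte, every point of data moved off RAID saves a fixed share of the baseline.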

Surviving the Storm

Much like nature's "perfect storm," several forces in the IT world are converging to create data management problems of a magnitude that transcends previous experience. Both the supply and demand sides of the business information equation are escalating together at a whirlwind pace. New-generation e-applications are producing torrents of data, with volumes predicted to grow a hundredfold over five years. At the same time, enterprise managers continue to thirst for insights that can only be gained from analyzing these massive amounts of accurate and timely detail data. Add to this turbulence an overwhelming projected cost to contain and manage these torrents of data, and it becomes apparent that new alternatives need to be considered.

1 Dolan, Timothy J., C.F.A. "eCRM: The Difference Between Winners and Losers in the e-Business World of the 21st Century." Deutsche Banc Alex.Brown. North American Equity Research/US Enterprise Software. September 15, 1999.
