SEP 1, 2006 1:00am ET

Driving Business Value with ILM-Enabled Database Archives

The explosive growth in the size and quantity of databases is a well-known phenomenon in virtually every vertical industry, making integration of database archives into enterprise-wide information lifecycle management (ILM) strategies a daunting challenge for many administrators. The increasing use of enterprise software, the Internet and Web-based commerce, together with the proliferation of digital media, means that database growth is a fact of life regardless of industry type or use case. The phenomenon applies to structured and unstructured data in both transaction processing and data warehouse environments, and it affects the executives, managers and business analysts for whom timely access to data at an acceptable cost is increasingly difficult to guarantee.

Because of this explosive growth of data, enterprises are facing high primary storage costs and an increasingly paradoxical dilemma: while business, regulatory and compliance requirements demand more complex and increasingly rapid analysis of this growing data hoard, the access problems and costs of storage and retrieval have made it necessary to offload more of the data burden to an archive, particularly in data warehousing environments. Analysis and reporting then become functions of how fast and how accurately archival data can be retrieved and subjected to analysis. Unfortunately, the state of the art in archival storage and retrieval mandates that both accuracy and speed be sacrificed as the use of archival database alternatives grows.

Many organizations are reviewing tiered storage strategies to migrate less frequently accessed data to the lowest-cost storage devices using policy-based automated storage migration, commonly referred to as ILM. Advances in database technologies have allowed for critical database information to remain on fast-access primary storage while less frequently accessed database information is migrated to an archive on near-line storage systems. However, what is often not thought through is the impact of searching these near-line stores when data is required, either in response to an unexpected question or as part of a less frequent but nevertheless critical business cycle. One of the challenges ILM presents is providing convenient access to information in the database archive after the information has been compressed and archived.
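As a rough illustration of the policy-based migration described above, the following Python sketch assigns a storage tier to each database partition based on how recently it was accessed. The `Partition` class, the tier names and the 90-day threshold are assumptions made for this example only; a real ILM policy would likely weigh many more factors.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical policy value: partitions untouched for longer than this
# many days are candidates for the near-line archive.
PRIMARY_MAX_IDLE_DAYS = 90


@dataclass
class Partition:
    """A unit of database content that can be migrated (illustrative)."""
    name: str
    last_accessed: date


def assign_tier(partition: Partition, today: date) -> str:
    """Return the storage tier that the policy selects for a partition."""
    idle_days = (today - partition.last_accessed).days
    if idle_days <= PRIMARY_MAX_IDLE_DAYS:
        return "primary"
    return "near-line archive"


def plan_migration(partitions, today):
    """Map each partition name to the tier the policy assigns it."""
    return {p.name: assign_tier(p, today) for p in partitions}


if __name__ == "__main__":
    today = date(2006, 9, 1)
    partitions = [
        Partition("orders_2006_q3", date(2006, 8, 28)),
        Partition("orders_2004_q1", date(2005, 1, 15)),
    ]
    for name, tier in plan_migration(partitions, today).items():
        print(name, "->", tier)
```

A policy this simple also shows the weakness the paragraph raises: it decides where data lives, but says nothing about how that data will be searched once it has moved.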

IT staffs and BI users are just now beginning to recognize the challenges posed by this ILM database dilemma. The growth of data warehouses has reached a critical juncture: for many users, multi-terabyte data warehouses are creating a barrier to effective analysis and business intelligence, as throughput issues, data access, and hardware and administrative costs begin to challenge users and their IT managers.

Accessing the Archived Data Warehouse Across Storage Tiers

The data at the heart of these myriad business uses includes not only structured transaction data from ERP and back office systems, but also unstructured data from a host of sources that were largely nonexistent even a decade ago. Email and Internet transaction logs, voicemail databases, contracts, medical records, point-of-sale systems data and other data sources have been added to the ocean of data that companies must now swim through in the course of their day-to-day operations.

These transaction systems leave a data trail that is piling up at an astonishing rate: it's not uncommon for active transaction systems to contain many terabytes of data. The data warehouses that are fed by these voracious transaction systems are becoming larger than anyone had ever imagined.

Deriving Business Value from the Data Archive

Data archives have been a traditional solution for addressing usability and cost, particularly when it comes to off-loading historical or infrequently used data. Indeed, the archive's main contribution has been to improve the usability of the remaining online data. Even so, archiving has traditionally been a less-than-perfect solution to the problems of too much data and not enough throughput, because most archiving solutions rely on tape-based systems that are both costly and not user-friendly. The result is that while archiving solves the problem of throughput and cost for the online portion of the data, it fails to provide a solution for archived data that is cost-effective and supports relatively rapid data access.

Thus, from a business standpoint, archiving is a problematic solution for most users. Archives cannot support timely data analysis, despite the fact that for many business uses - particularly those relating to regulations, compliance and legal action - timeliness is a major criterion for action. The current state of the art in archiving is thus too cumbersome and costly to keep pace with the growth of transaction databases and data warehouses and the analytical needs incumbent upon them. For most companies and most use cases, archiving represents an imperfect solution.

A New Approach to Tiered Data Archiving

Clearly, a new archiving solution is required to enable databases to operate at maximum efficiency. One potential solution provides four key features:

  • Data compression,
  • Online query access,
  • Maintenance exposure, and
  • Enterprise scalability.

Use of column-based data compression technology allows for storage of relational data in what is essentially a pre-indexed format, alleviating the requirement for storing or building indexes at restore time. This design significantly reduces the overall storage needed for the database. Column-based storage also significantly improves data compression: being made up of a single data type, each column of data can be compressed much more efficiently than rows of data, which by definition include many different data types. This technology can further reduce the data footprint by selecting the most suitable compression strategy for each data type.
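The idea of choosing a compression strategy per column can be sketched in a few lines of Python. This is a minimal illustration, not the design of any particular product: the function names are hypothetical, and the two encodings shown (delta encoding for integer columns, dictionary encoding for everything else) are common textbook choices picked for this example. It assumes every row has the same keys and that the values within a non-integer column can be sorted against one another.

```python
import json
import zlib


def rows_to_columns(rows):
    """Pivot a list of dict rows into a dict of column lists."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def encode_column(values):
    """Pick an encoding based on the column's data type (illustrative)."""
    if values and all(isinstance(v, int) for v in values):
        # Delta encoding: keep the first value, then successive differences.
        deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
        return {"kind": "delta", "data": deltas}
    # Dictionary encoding: store each distinct value once, plus positions.
    dictionary = sorted(set(values))
    position = {v: i for i, v in enumerate(dictionary)}
    return {"kind": "dict", "dictionary": dictionary,
            "data": [position[v] for v in values]}


def decode_column(encoded):
    """Reverse encode_column."""
    if encoded["kind"] == "delta":
        values, running = [], 0
        for delta in encoded["data"]:
            running += delta
            values.append(running)
        return values
    return [encoded["dictionary"][i] for i in encoded["data"]]


def pack_archive(rows):
    """Encode each column separately, then compress the whole archive."""
    encoded = {name: encode_column(vals)
               for name, vals in rows_to_columns(rows).items()}
    return zlib.compress(json.dumps(encoded).encode("utf-8"))


def unpack_archive(blob):
    """Decompress an archive and return its columns as plain lists."""
    encoded = json.loads(zlib.decompress(blob).decode("utf-8"))
    return {name: decode_column(col) for name, col in encoded.items()}


if __name__ == "__main__":
    rows = [{"order_id": 1000 + i, "region": ["east", "west"][i % 2]}
            for i in range(1000)]
    blob = pack_archive(rows)
    print("archive size in bytes:", len(blob))
    assert unpack_archive(blob) == rows_to_columns(rows)
```

The sorted dictionary built for each non-integer column also hints at why such a layout can be described as pre-indexed: the set of distinct values is available without scanning the rows. How much space is actually saved depends entirely on the data.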

Column-based storage also allows more rapid processing of archival queries: reporting tools can either directly query the repository using the subset of the ANSI SQL language currently supported, or the necessary data can be rapidly restored to an operational data store and queried using the full complement of SQL commands. This accessibility contrasts with the majority of archiving systems, which limit access to summary data unless a full database restoration process has been undertaken.
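The two access paths described above can be sketched as follows. This is an illustration under stated assumptions, not a real archive API: the `archive` dictionary stands in for a column-oriented repository, `query_archive` mimics a restricted direct query (one equality filter, reading only the columns it needs), and an in-memory SQLite database plays the role of the operational data store. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical column-oriented archive: one list per column.
archive = {
    "order_id": [1, 2, 3, 4],
    "region": ["east", "west", "east", "north"],
    "amount": [120.0, 75.5, 310.0, 42.0],
}


def query_archive(columns, select, where_column, where_value):
    """Direct path: a single equality filter, touching only needed columns."""
    matches = [i for i, value in enumerate(columns[where_column])
               if value == where_value]
    return [tuple(columns[name][i] for name in select) for i in matches]


def restore_to_ods(columns, table_name="orders"):
    """Restore path: load the columns into SQLite so full SQL is available."""
    names = list(columns)
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table_name} ({', '.join(names)})")
    placeholders = ", ".join("?" for _ in names)
    # zip(*...) turns the column lists back into row tuples for insertion.
    conn.executemany(
        f"INSERT INTO {table_name} VALUES ({placeholders})",
        zip(*(columns[name] for name in names)),
    )
    return conn


if __name__ == "__main__":
    # Direct query against the archive.
    print(query_archive(archive, ["order_id", "amount"], "region", "east"))

    # Full SQL after restoring to the operational data store.
    conn = restore_to_ods(archive)
    sql = ("SELECT region, SUM(amount) FROM orders "
           "GROUP BY region ORDER BY region")
    print(conn.execute(sql).fetchall())
```

The direct path never reads the columns a query does not mention, which is one reason a column layout can answer simple archival queries without a full restore.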
