Four Features Needed for Extreme Archiving

By
  • Jeroen van Rotterdam
Published
  • May 31 2016, 6:30am EDT

(Part two of a two-part series)

In my last article, I shared four reasons why organizations, particularly in regulated industries, need to consider extreme archiving, illustrated by four specific challenges: regulations, scale, costs and untapped data value. Today, I want to share what is needed to address these extreme archiving requirements.

Horizontal Scale

Why is a scalable solution so important?

First, the volume of structured and unstructured data is growing exponentially, and it’s not only because we are all digital hoarders. Regulations force us to collect and maintain larger volumes of data. Large financial institutions, for instance, need to manage up to 100 billion emails; tens of billions of transactions; all social media and communications content created by their employees and customers; and, of course, documents. All of this has to be managed in a cost-effective way.

Second, an enterprise archive solution for both structured and unstructured data should not only eliminate costly archive silos, it should also allow for the discovery and analytics of data across all applications. Done correctly, a solution scales at the point of ingestion, the point of management and the point of access to the data, enabling new use cases such as a 360-degree view of customer data and insights.

Third, scalability ensures that an archive can meet the organization’s needs as they evolve over time. Initially the archive may be needed to decommission legacy applications or to implement compliance regulations. In that phase the ingestion throughput could be continuously or temporarily high, while the search and retrieval throughput stays low. However, when organizations start using the archive to expose data to their customers, the access pattern changes to high ingestion throughput combined with high-frequency retrieval, because of the service levels needed for real-time retrieval. An archive solution needs to scale horizontally to handle this change in access patterns.

Finally, as data ages, historical data is typically accessed less frequently. In this case, the archive needs elasticity, so that inactive data partitions are managed in a cost-effective way.
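As a rough illustration of that elasticity, the sketch below assigns a partition to a storage tier based on how long it has gone without being accessed. The tier names, the day thresholds and the function name are assumptions made for this example only; they do not describe any particular archive product.

```python
from datetime import date

# Hypothetical thresholds: maximum idle days allowed for each tier.
# Anything idle for longer than the last threshold is treated as "cold".
TIERS = [(90, "hot"), (730, "warm")]


def storage_tier(last_access: date, today: date) -> str:
    """Pick a storage tier from the number of days since the last access."""
    idle_days = (today - last_access).days
    for max_idle_days, tier in TIERS:
        if idle_days <= max_idle_days:
            return tier
    return "cold"


if __name__ == "__main__":
    today = date(2016, 5, 31)
    for last_access in (date(2016, 5, 1), date(2015, 5, 31), date(2010, 1, 1)):
        print(last_access, storage_tier(last_access, today))
```

A real archive would likely weigh more than idle time, for example retrieval service levels or retention state, but the principle of moving inactive partitions to cheaper storage is the same.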

Understanding Data

Archives that handle both structured and unstructured data need to understand the data in order to make smart decisions based on the information. What do we mean by “understanding data”? Good analytics tools, for instance, can automatically classify unstructured content in real time at the ingestion point, decide which metadata needs to be collected for the record, or implement smarter partition strategies. Analyzing the data in real time at the ingestion point eliminates the need for costly post-processing, such as record enrichment or indexing of the data. Updating archive records after the fact is problematic because you are no longer storing a consistent point-in-time snapshot of your formal record and, when data is stored on WORM devices, metadata enrichment can lead to record fragmentation in your archive.
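To make the idea concrete, here is a minimal sketch of classification at the ingestion point. It assumes a simple keyword lookup in place of a real analytics engine, and the rule set, the ArchiveRecord type and the partition naming scheme are all hypothetical. The point it illustrates is that category and partition are decided once, at ingest, and the record is then written as an immutable object that needs no later enrichment.

```python
from dataclasses import dataclass

# Hypothetical keyword rules; a production system would use a trained
# classifier or an analytics service rather than substring matching.
RULES = {
    "invoice": ("invoice", "amount due"),
    "contract": ("agreement", "hereby"),
}


@dataclass(frozen=True)  # frozen: the record cannot be modified after creation
class ArchiveRecord:
    content: str
    category: str
    partition: str


def classify(text: str) -> str:
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unclassified"


def ingest(text: str, year: int) -> ArchiveRecord:
    """Classify the content and choose its partition in a single pass."""
    category = classify(text)
    return ArchiveRecord(content=text, category=category,
                         partition=f"{category}-{year}")


if __name__ == "__main__":
    print(ingest("Invoice 42: amount due 100 EUR", 2016))
```

Because the dataclass is frozen, any later attempt to assign to a field raises dataclasses.FrozenInstanceError, which loosely mirrors the write-once behavior of WORM storage.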

Configurable Discovery Interface

The ability to search and discover data for eDiscovery and compliance requirements continues to be a strong driver for a single, scalable archive. Customers are now storing hundreds of different types of information from (sometimes) thousands of different source applications. Surprisingly, it’s not only the search and discovery of the data that needs to be simplified. Configuring a discovery interface at this scale is typically time-consuming. Look for solutions that provide intuitive productivity tools.

In-Place Compliance Capabilities

The importance of in-place compliance capabilities cannot be overstated. Handling compliant data at scale is complex and requires the ability to manage retention and encryption policies as well as legal holds. Global organizations face different retention requirements in different countries, and it is not uncommon for large-scale archives to manage between 5,000 and 10,000 retention policies.

Archives need to be designed to handle changing regulations. Too often we see solutions that need to re-ingest data, or that rely on exports for legal holds and retention management, leading to unmanageable and costly archiving environments.
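One way to picture in-place compliance is to keep policies and legal holds separate from the records and to evaluate them at the moment a disposal decision is made. The sketch below does that under loud assumptions: the policy table, the hold set and the function names are invented for illustration, and the retention periods are placeholders rather than actual legal requirements.

```python
from datetime import date

# Hypothetical policy table: (country, record type) -> retention in years.
POLICIES = {("US", "email"): 7, ("DE", "email"): 10}

# Hypothetical set of record ids currently under legal hold.
LEGAL_HOLDS = set()


def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def can_dispose(record_id: str, country: str, record_type: str,
                created: date, today: date) -> bool:
    """A record may be disposed of only if it is not on hold and has expired."""
    if record_id in LEGAL_HOLDS:
        return False
    retention_years = POLICIES[(country, record_type)]
    return today >= add_years(created, retention_years)


if __name__ == "__main__":
    today = date(2016, 5, 31)
    print(can_dispose("r1", "US", "email", date(2009, 1, 1), today))
    LEGAL_HOLDS.add("r1")  # placing a hold changes the answer immediately
    print(can_dispose("r1", "US", "email", date(2009, 1, 1), today))
```

Because the policy is looked up at decision time, changing an entry in the table or placing a hold takes effect without touching, exporting or re-ingesting the stored records.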

The Bottom Line

Extreme requirements are driving the need for extreme archiving. Today’s needs, especially in regulated industries, cannot be addressed with siloed archiving solutions, and the features outlined in this article point to some of the key differences. Ultimately, the right solution will pay for itself, with an extremely short time-to-value and a significant impact on the bottom line, enabling organizations to reinvest IT dollars into innovation and digital transformation.

(About the author: Jeroen van Rotterdam is chief technology officer, vice president and distinguished engineer in the Enterprise Content Division at EMC)
