Busting the ECM Myth

InfoManagement Direct, October 8, 2009

Dmitri Tcherevik

The myth of an “über repository” is finally busted. For a long time, we hoped that there could be one repository that could hold all of our unstructured data, also known as content. Not anymore. 

This realization comes as our view of what content is, and what can be done with it, gradually changes. For a long time, content was defined by what one cannot do with it. We knew that content could not be stored in a relational database. So content had to be stored elsewhere. But where? 

The answer appeared simple. If you worked in an enterprise environment and you had content, you stored it in an enterprise content management system. This is where you could consolidate content, and once content was consolidated, it could be controlled. Controlling enterprise content is very important, chiefly for compliance reasons. You do not want your content to be lying around willy-nilly; this can get you into all sorts of trouble.  

This consolidated ECM vision promoted the über repository notion. In those early days of ECM, I was often asked if all my content was stored in one repository. If the answer was “no,” I was given a look that clearly conveyed I was in deep trouble and needed urgent help. Many other people got the same look. We received this help, and there is now an entire separate ECM industry dedicated to managing stuff that cannot be stored in the database. 

We are now gradually learning that über repositories do not really work all that well. Yes, content must be controlled, and we definitely need ECM. Control, however, does not necessarily imply consolidation. There are several reasons why consolidating content into one repository may be foolish, impractical, or even impossible. 

First, there is just too much of it. The amount of content in the world grows exponentially. In the next two years, we will generate as much content as humanity has created in its entire history.  

Second, we now know that there are many different types of content. There are documents that you share internally, content that you put on your Web site, high-resolution audio and video content that you stream to TVs or play on iPods, and user-generated content in the form of short videos, photos, “tweets,” and comments. 

Third, we now know that the way people and applications use content can also be dramatically different. In some cases, content is generated at a torrid rate, stored and almost never read. In other cases, content is created once, stored and then sent via millions of concurrent streams to people all over the world. There are an infinite number of variations between these two extremes. 

For every different combination of content volume, type and usage pattern, one needs a different content repository. HDFS, for instance, is great at efficiently storing and retrieving very large files. Documentum and SharePoint are good at managing documents in a collaborative environment. Cassandra is good for managing user-generated content where a sustainable rate of updates is much more important than data consistency. When dynamic delivery of content to a Web browser or mobile device is desired, a content management system such as FatWire or Drupal can be used.

The past few years show that specialized repositories have prospered, and we have more of them appearing every day, while über repositories that try to be all things to all people have not done well and have withered. 

We accept the diverse multirepository world as a fact. This still leaves us with the problem of managing and controlling enterprise content that is distributed across disparate repositories. Several approaches to solving this problem exist.

One school of thought suggests that even if you cannot consolidate content, you still must be able to consolidate access to this content. First, you deploy a single content integration hub. Then you use adaptors to connect the various content repositories to this hub. Once the repositories are connected, you should be able to access and control content stored in all of them via a single interface exposed by the hub. This is the approach promoted by standards such as JSR-170 and JSR-283. 
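The hub-and-adaptor arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `RepositoryAdapter`, `InMemoryAdapter` and `ContentHub`, and the "repository-name/path" address format, are hypothetical and are not taken from any standard or product.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class RepositoryAdapter(ABC):
    """Hypothetical adaptor contract that every connected repository implements."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryAdapter(RepositoryAdapter):
    """Stand-in for a real repository, backed by a dictionary."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._items[path]

    def write(self, path: str, data: bytes) -> None:
        self._items[path] = data


class ContentHub:
    """Single interface that routes each request to the adaptor mounted under a name."""

    def __init__(self) -> None:
        self._adapters: Dict[str, RepositoryAdapter] = {}

    def mount(self, name: str, adapter: RepositoryAdapter) -> None:
        self._adapters[name] = adapter

    def _resolve(self, address: str) -> Tuple[RepositoryAdapter, str]:
        # Assumed address format: "repository-name/path/inside/repository".
        name, _, path = address.partition("/")
        return self._adapters[name], path

    def read(self, address: str) -> bytes:
        adapter, path = self._resolve(address)
        return adapter.read(path)

    def write(self, address: str, data: bytes) -> None:
        adapter, path = self._resolve(address)
        adapter.write(path, data)


hub = ContentHub()
hub.mount("docs", InMemoryAdapter())
hub.mount("media", InMemoryAdapter())
hub.write("docs/policies/retention.txt", b"Keep records for seven years.")
hub.write("media/intro.mp4", b"\x00\x01")
print(hub.read("docs/policies/retention.txt"))
```

Note that every read and write passes through the one `ContentHub` object, whatever repository it ends up in.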

In practice, access consolidation turns out to be as difficult to implement as content consolidation. The integration hub becomes a single bottleneck, and it does not support all access patterns equally well. The broadly accepted conclusion now is that distributed content requires distributed access. 

Two architectural patterns have become particularly popular for building distributed applications: service-oriented architecture and representational state transfer. 

SOA says that the world is populated by services. Each service implements a well-defined interface, also known as the service contract. Applications can discover services. They can also invoke services by sending messages to their so-called endpoints. One can replace service A that has certain characteristics with service B that has a totally different set of characteristics. All that matters is that A and B implement the same service contract. In this sense, applications and services are said to be loosely coupled. 
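A minimal Python sketch of this loose coupling follows. The contract `ContentStore` and the two services `MemoryStore` and `HexTextStore` are hypothetical names invented for illustration. Service discovery and message-based endpoints are left out; the sketch only shows that the application depends on the contract and not on the implementation behind it.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ContentStore(ABC):
    """Hypothetical service contract: the only thing the application depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryStore(ContentStore):
    """Service A: keeps raw bytes in a dictionary."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class HexTextStore(ContentStore):
    """Service B: different internals (hex-encoded text) behind the same contract."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, data: bytes) -> None:
        self._rows[key] = data.hex()

    def get(self, key: str) -> bytes:
        return bytes.fromhex(self._rows[key])


def archive_and_fetch(store: ContentStore, key: str, data: bytes) -> bytes:
    """Application code written against the contract, unaware of the implementation."""
    store.put(key, data)
    return store.get(key)


results = [
    archive_and_fetch(service, "memo-1", b"quarterly report")
    for service in (MemoryStore(), HexTextStore())
]
print(results)
```

Swapping `MemoryStore` for `HexTextStore` requires no change to `archive_and_fetch`, which is the sense in which the two sides are loosely coupled.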

From the REST point of view, the world consists of resources. Each resource is identified by a unique uniform resource identifier (URI). Resources can be subject to a few well-defined operations, such as GET, PUT, POST, DELETE, and possibly others. These operations are interpreted by servers. Applications refer to resources via URIs and may not be aware of their physical location. HTTP is often the protocol used to submit requests to servers and receive responses.
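The resource view can be illustrated with a toy, in-process server written in Python. Everything here is an assumption made for the sketch: the class `ResourceServer`, the host `content.example.com`, and the choice of status codes (which follow common HTTP usage). No network traffic is involved, and only GET, PUT and DELETE are handled.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


class ResourceServer:
    """Toy server that interprets a few uniform operations on resources."""

    def __init__(self) -> None:
        self._resources: Dict[str, bytes] = {}

    def handle(
        self, method: str, uri: str, body: Optional[bytes] = None
    ) -> Tuple[int, bytes]:
        # The caller names the resource by URI; where it is stored is hidden.
        path = urlsplit(uri).path
        if method == "GET":
            if path in self._resources:
                return 200, self._resources[path]
            return 404, b""
        if method == "PUT":
            created = path not in self._resources
            self._resources[path] = body or b""
            return (201 if created else 204), b""
        if method == "DELETE":
            if self._resources.pop(path, None) is None:
                return 404, b""
            return 204, b""
        return 405, b""  # operation not supported by this sketch


server = ResourceServer()
uri = "http://content.example.com/docs/42"  # hypothetical resource URI
responses = [
    server.handle("PUT", uri, b"draft"),   # create the resource
    server.handle("GET", uri),             # read it back
    server.handle("PUT", uri, b"final"),   # replace its state
    server.handle("GET", uri),
    server.handle("DELETE", uri),          # remove it
    server.handle("GET", uri),             # now missing
]
for status, payload in responses:
    print(status, payload)
```

The application only ever passes a method and a URI; the dictionary inside `ResourceServer` could be replaced by any storage without the caller noticing.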
