The New Content Management Paradigm: Leaving Space at the Table for Both Structured and Unstructured Content
Information Management Magazine, July 2007
As organizations grapple with exponential growth in unstructured content, interest in structured content and its management is increasing. This article examines the differences between structured and unstructured content and makes the case for separate but cooperative content management systems. It also identifies a sequence for adopting a structured content environment that should counter any irrational exuberance for the nascent paradigm shift.
Structured Content Examined
Structured content, as a phrase, most often means text and graphics wrapped in Extensible Markup Language (XML). Strictly speaking, it doesn't have to be XML - any markup language could be the wrapper. But today, XML is the lingua franca of structured content interchange between systems, so the common definition is sufficient. Within the family of structured content are two subspecies: 1) XML content conforming to a structural specification and 2) any other XML content. The distinction matters: only XML content that conforms to the rules in a document type definition (DTD) or an XML schema is truly structured content. Anything else is more or less text and objects (perhaps pages and pages of them) surrounded by one or more XML tags.
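The difference between the two subspecies can be sketched in a few lines of Python. This is a minimal illustration, not a real validator: the ALLOWED_CHILDREN table is a hypothetical stand-in for a DTD or schema, the element names are invented for the example, and the check ignores the ordering and cardinality rules that an actual DTD or XML schema would also enforce.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD or schema:
# each element name maps to the child element names it may contain.
ALLOWED_CHILDREN = {
    "topic": {"title", "body"},
    "title": set(),
    "body": {"p"},
    "p": set(),
}


def is_well_formed(xml_text):
    """Return True if the text parses as XML at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def conforms(xml_text, root_name="topic"):
    """Return True if the XML parses and every element obeys ALLOWED_CHILDREN."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    if root.tag != root_name:
        return False
    stack = [root]
    while stack:
        element = stack.pop()
        allowed = ALLOWED_CHILDREN.get(element.tag)
        if allowed is None:
            return False
        for child in element:
            if child.tag not in allowed:
                return False
            stack.append(child)
    return True


# Subspecies 1: XML that follows the (assumed) structural rules.
rule_driven = "<topic><title>Install</title><body><p>Run setup.</p></body></topic>"
# Subspecies 2: text and objects surrounded by one or more XML tags.
loosely_tagged = "<doc>Pages and pages of text <b>and objects</b>.</doc>"
```

Both sample strings are well-formed XML, but only the first passes the rule check. That gap is the one the two subspecies describe.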
The Darwin Information Typing Architecture (DITA) and DocBook are two examples of standards-based specifications for creating rule-driven structured content. DITA's foundation is topic-style authoring, akin to an application's help file where you select a subject or keyword and look at a step-by-step list of instructions. DocBook's foundation is book-based. It has rules for chapters, sections and paragraphs. When writing structured content that conforms to DITA or DocBook, for example, there are restrictions on what kinds of "elements" can be composed or incorporated at certain points within the text. The rules define the relationships between elements and indicate how elements can be reused in other places. In this way, DITA and DocBook - and other structured content standards - promote content reuse by allowing pages of text to be broken into self-contained sentences, paragraphs or chapters that can be used in any number of finished documents.
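The reuse idea can be sketched as a small Python model. It makes no attempt to reproduce DITA or DocBook syntax. The component IDs, the sample text and the two document names are all hypothetical. The point is only that finished documents hold references to self-contained units instead of copies of them.

```python
# Hypothetical component library: each self-contained unit has its own ID.
components = {
    "intro-safety": "Disconnect power before servicing the unit.",
    "step-open-panel": "Remove the four screws and lift the rear panel.",
    "step-replace-filter": "Slide out the old filter and insert the new one.",
}

# Finished documents are ordered lists of component IDs, not copies of text.
documents = {
    "service-manual": ["intro-safety", "step-open-panel", "step-replace-filter"],
    "quick-start": ["intro-safety", "step-replace-filter"],
}


def assemble(doc_id):
    """Build the finished text of a document from its referenced components."""
    return "\n".join(components[cid] for cid in documents[doc_id])


def where_used(component_id):
    """List every finished document that reuses the given component."""
    return sorted(d for d, ids in documents.items() if component_id in ids)
```

Editing the text stored under "intro-safety" once changes the assembled output of both documents, which is the practical payoff of breaking content into reusable units.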
Unstructured Content Examined
The reuse concept isn't practical with unstructured content, however. Unstructured content usually refers to whatever anyone writes, draws or otherwise composes using their own ideas about the rules for comprehensible writing, artistry or composition. This implies that email, text messages, instant messages, Web pages, phone conversations, graphics, word-processing documents and virtually anything that can be composed or assembled according to one's own "rules" is unstructured content. A Microsoft Word 2003 document of text and graphics is a good example of unstructured content. Aside from an author's own opinions and the limitations of the Word program itself, there are no other rules per se about what can be written in Word. To reuse unstructured content, the targeted information must be plucked from within the text and recomposed to fit the context of the destination document. Reusing content from many unstructured documents is very time-consuming.
Choosing the Right Content Management System Solution
As more content-authoring teams consider enterprise content management (ECM) solutions to manage a widening expanse of content, the type of content they produce - structured or unstructured - becomes more central to selecting a content management system (CMS).
Consider the production of structured content. Structured content authoring forms the basis for high levels of content reuse. It takes less time to locate and reuse high-quality content than it does to create new high-quality content. Because time affects both cost and schedule, extensive reuse of high-quality content raises overall content output while lowering overall cost. In other words, structured authoring drives down the overall cost of producing high-quality documentation, but only when the organization reuses content extensively. In that case, you would want to select a CMS that promotes high levels of reuse through both its user interface and its underlying architecture. This most often means that to manage structured, XML-based content, you need a native-XML CMS.
Of course, someone will ask, "How granular does my object model need to be?" There must be some operations that favor reuse of entire documents and other operations that favor reuse of paragraphs and sentences. Answering this question indicates when a native-XML CMS solution returns the highest value.
For example, in older knowledge management paradigms that focus on publishing, posting or otherwise delivering static content, the granularity requirements are low. A whole document or set of pages from a document is sufficient to support the knowledge management system. Therefore, a document-based CMS is a good fit for these types of solutions. Conversely, progressive organizations managing highly dynamic content - for instance, a company that allows its customer support team to submit customer feedback directly into the product documentation - need a highly granular content model to track content changes at the paragraph or sentence level. Marketing communications departments, for example, would need sentence-level granularity because they constantly snip sentences from different sources to build data sheets, product brochures and similar nonnarrative collateral. They would need to know when those sentences change so that they could update their collateral accordingly. Companies that operate this way need an XML-based CMS solution because the architecture enables consistent performance at any level of content granularity. Overall, a decision to adopt structured authoring of dynamic product content calls for a native-XML CMS that manages content at the paragraph and sentence level.
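Sentence-level change tracking of the kind a marketing communications team would need can be sketched as follows. This is an illustration under stated assumptions, not a description of how any particular CMS works: the sentence splitter is deliberately naive, the function names are hypothetical, and the collateral mapping is invented sample data.

```python
import hashlib
import re


def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def fingerprint(sentence):
    """Stable identifier for the exact wording of one sentence."""
    return hashlib.sha256(sentence.encode("utf-8")).hexdigest()


def changed_sentences(old_text, new_text):
    """Return sentences from old_text that no longer appear verbatim in new_text."""
    new_prints = {fingerprint(s) for s in split_sentences(new_text)}
    return [s for s in split_sentences(old_text) if fingerprint(s) not in new_prints]


def stale_collateral(collateral, old_text, new_text):
    """Return names of collateral pieces that quote a sentence that has changed."""
    changed = set(changed_sentences(old_text, new_text))
    return sorted(name for name, quoted in collateral.items() if changed & set(quoted))


# Hypothetical source text before and after an edit.
old_source = "The router supports 8 ports. It ships with a two-year warranty."
new_source = "The router supports 16 ports. It ships with a two-year warranty."

# Hypothetical collateral, each piece listing the sentences it snipped.
collateral = {
    "data-sheet": ["The router supports 8 ports."],
    "brochure": ["It ships with a two-year warranty."],
}
```

With this sample data, only the data sheet is flagged, because only the sentence it quotes was edited. A document-level system would report that the whole source file changed and leave the team to work out which collateral is affected.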
Authors or systems producing unstructured content - or content wrapped in XML without any governing DTD or schema - are aligned best with a document-based CMS. These CMS solutions offer basic content services: check-in, check-out, file-level versioning, highly effective access control, workflow routing and so on. Also, a document-based CMS is an ideal platform for plugging in a records management option, adding a form-based routing capability (such as moving an insurance claim form through an approval process) or connecting to a traditional data warehousing system. Vendors of these kinds of ECM systems have delivered enormous value to their customers by exploiting unstructured document techniques and technologies.
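The basic content services listed above (check-out, check-in and file-level versioning) can be sketched as a small in-memory class. The DocumentStore name and its methods are hypothetical and exist only for illustration. Access control, workflow routing and persistence are left out.

```python
class DocumentStore:
    """Hypothetical file-level store: versions whole documents, one editor at a time."""

    def __init__(self):
        self._versions = {}  # document name -> list of full document contents
        self._locks = {}     # document name -> user holding the check-out

    def check_out(self, name, user):
        """Lock the document for one user and return its latest content."""
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is checked out by {holder}")
        self._locks[name] = user
        history = self._versions.get(name, [])
        return history[-1] if history else ""

    def check_in(self, name, user, content):
        """Store the whole document as a new version and release the lock."""
        if self._locks.get(name) != user:
            raise PermissionError(f"{user} has not checked out {name}")
        self._versions.setdefault(name, []).append(content)
        del self._locks[name]
        return len(self._versions[name])  # new version number, starting at 1

    def version(self, name, number):
        """Return the full content of a stored version (1-based)."""
        return self._versions[name][number - 1]
```

Every version here is a complete copy of the file, so the store can say that a document changed but not which paragraph or sentence changed. That limitation is acceptable for the static, document-oriented uses this kind of CMS serves best.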