Unstructured information is - well, information that is not structured. While this is a somewhat glib definition, it is entirely accurate. But for those who prefer a more academic version, here is the one we came up with for our practice. Unstructured information (UI) is any document, file, image, report, form, etc. that has no defined, standard structure that would enable convenient storage in unit record or similar automated processing devices; it cannot be defined in terms of rows and columns or records; and the data cannot be examined with standard unit record access.
UI is important because it comprises 75 to 90 percent of an organization's useful information. Few, if any, decisions or actions are enacted solely on the presentation of structured data. A plant operator always combines the structured facts with intuition, experience and the laminated "cheat sheet" tacked to the wall. The manager applies experience, context and market politics as well as the metrics presented on a report.
The profound challenge of UI is that it is so darn easy to use, but very hard to manage. Business people blend various types of content minute by minute. Of course, there has never been a distinct line between structured and unstructured information as far as a businessperson is concerned; only the technologist has issues. As stated in a previous article, the business users will report usage of UI at 100 percent. The IT folks will report in at around 15 to 35 percent for utilizing UI. Let's face it - the business folks are right. So how do you recognize opportunities for UI and adjust your DW/BI environment and the meta data/information management strategies?
Understanding the usage of UI requires understanding the span of UI. It isn't just document management or a portal to access content or content management. It is a multidimensional business component. Figure 1 illustrates that not only is UI intermingled with structured data, it also goes from spanning individual minds to corporate-wide communities. The more abstract the information, the more complex the UI solution. Most organizations are in the blue boxes with a desire to move out from the center to each corner. Many organizations welcome an assessment to inventory content and cultures, determining where they fall on this chart.
Figure 1: The Context Span of Unstructured Information
The yellow, or lighter, bubbles are UI. Note that as UI becomes more personal, i.e., personal notes or experience, context spans from individual to community. Even if you could somehow suck the information out of a brain, there would still be community issues of access, usage, relevance and security.
There are many examples of successful blending of unstructured information into business processes. Value is best understood by looking at some scenarios. These are real-world scenarios that have been generalized to preserve competitive advantage.
A chemical company had been mystified for years by the fact that certain markets would suffer unexplained sales declines. A portal project enabled blending of some data warehouse reports with typical portal content (news, weather, internal information). An observant analyst thought she noticed something and had her portal adjusted to display any news release related to the company, the industry or downstream partners alongside sales figures. It took a few weeks to find out that sales declines closely followed negative news releases affecting the downstream partners. The company was able to determine that news events could be tracked, and the production pipeline slowed down to accommodate the demand drop. The company began a proactive program of informing downstream partners of the news events that would affect the pipeline.
An insurance company uses contract operators to process complex claims. The operators utilize a Web-based claims application, but they also can enter a portal that allows instant messaging with medical review people, shows x-rays and pictures to claim adjusters and picks up e-mail from the related parties. (This is already good, but it gets better.) One operator consistently processes claims more quickly and more accurately than his peers. The company, via click analysis, discovers that the operator actually spends more time reviewing the scanned paperwork than the structured claims records, so errors are discovered more quickly. The operator also keeps a cheat sheet of which adjusters are more accurate. The insurance company revises the claims workflow to present the scanned paper and assigns operators to specific groups of adjusters to speed up processing.
Most everyone is familiar with the DW "bow tie." (See Figure 2.) If UI is to be part of the picture, the bow tie requires modification. The various types of UI will usually be stored in a separate physical instance. Adoption of UI facilities into the structured RDBMS has been slow. IBM, Oracle and Teradata all support BLOBs, etc., but usage is infrequent. Most likely a content management or document management solution will be the entry point. Therefore, while the meta data layer will need to manage the UI, the physical handling may be separate as shown in Figure 2.
The movement of UI takes on a few new characteristics as well. Most of us are familiar with the acquire, extract and transform steps for structured data. UI requires the following:
- UI acquisition - In addition to "reading," there may be format shifts.
- Parsing/segmentation - Most of the time, not all of the content is needed. Technologies are applied to parse out or subset the meaningful portions.
- Ascribing - UI is meaningless unless the context and content are registered and assigned. The ascribing step places entries in the meta data facilities to track the content.
- Load or storage - Place the UI, in context, into the DBMS or content manager for publication and access.
- Publish/archive - Push (or pull) the content into the users' hands.
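The five steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not a description of any particular product: the `UIItem` class, the function names, the keyword-based parsing rule and the in-memory `catalog` and `store` dictionaries are all hypothetical stand-ins for a real parser, meta data facility and content manager.

```python
from dataclasses import dataclass, field


@dataclass
class UIItem:
    """One piece of unstructured information moving through the steps."""
    source: str                                   # hypothetical identifier
    text: str                                     # content after any format shift
    segments: list = field(default_factory=list)  # the meaningful portions
    context: dict = field(default_factory=dict)   # meta data assigned by ascribing


def acquire(source, raw_bytes):
    # UI acquisition: "read" the content and shift its format (bytes to text).
    return UIItem(source=source, text=raw_bytes.decode("utf-8"))


def parse(item, keywords):
    # Parsing/segmentation: keep only the paragraphs that mention a keyword.
    paragraphs = [p.strip() for p in item.text.split("\n\n") if p.strip()]
    item.segments = [p for p in paragraphs
                     if any(k.lower() in p.lower() for k in keywords)]
    return item


def ascribe(item, catalog, business_process):
    # Ascribing: register context and content in the meta data catalog.
    item.context = {"source": item.source,
                    "business_process": business_process,
                    "segment_count": len(item.segments)}
    catalog[item.source] = item.context
    return item


def load(item, store):
    # Load or storage: place the UI, in context, into the content store.
    store[item.source] = {"segments": item.segments, "context": item.context}


def publish(store, business_process):
    # Publish: pull the identifiers of stored items tied to a business process.
    return [key for key, record in store.items()
            if record["context"]["business_process"] == business_process]


# Usage with made-up content (assumed sample data, not from a real feed).
catalog, store = {}, {}
raw = (b"Plant outage reported at a partner site.\n\n"
       b"Weather was mild.\n\n"
       b"Partner outage may cut demand.")
item = ascribe(parse(acquire("news/0001", raw), ["outage"]),
               catalog, "demand planning")
load(item, store)
print(publish(store, "demand planning"))  # ['news/0001']
```

Note that the ascribing step writes to the catalog before anything is loaded, so content never reaches the store without its context attached.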
Figure 2: Traditional DW "Bow Tie" and the UI Layer
Meta Data and Information Management
As previously mentioned, there are some extra meta data steps, some fine points to managing UI. Most are related to maintaining context and tracking content.
Various types of UI include images (still and moving), voice and sound, documents (word processing), Web content, electronic documents (scanned) and electronic correspondence (e-mail). Meta data for these data types must, of course, fit into the roles of meta data for structured information, i.e., there are definitional and semantic functions, navigation and usage, and administrative aspects.
UI meta data will have to track context to support navigation and provide uniform semantics. This means definition of what a piece of UI represents, and the role and business processes it is related to.
End users will need to know what is contained in various UI instances; therefore, an index and glossary must exist in the meta layer.
UI access will be managed through meta data, not only with the typical integrity and security features, but also by recording who can publish, repurpose or syndicate UI.
The meta data tools must enable maintenance of the UI indices, context and definitions. This is in addition to meta data common to UI and structured data - location, access, business rules, etc. Finally, the meta data will have to correlate and connect structured data and UI, so that the two are not mismatched and the context is not corrupted.
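To make these requirements concrete, here is a minimal sketch of what a single UI meta data entry might hold. The `UIMetaRecord` class, its field names, the sample identifiers and the `content-manager://` location are all assumptions made for illustration; a real meta data repository or content management tool will have its own model.

```python
from dataclasses import dataclass, field


@dataclass
class UIMetaRecord:
    """Hypothetical meta data entry for one UI instance."""
    ui_id: str
    definition: str              # what this piece of UI represents
    business_processes: list     # the processes it is related to
    location: str                # where the content physically lives
    index_terms: set = field(default_factory=set)   # stored in lower case
    publishers: set = field(default_factory=set)    # who may publish
    syndicators: set = field(default_factory=set)   # who may syndicate
    linked_structured_keys: set = field(default_factory=set)  # DW record keys

    def can_publish(self, user):
        # Access rule: only registered publishers may publish this UI.
        return user in self.publishers


def find_by_term(records, term):
    # Index lookup so end users can learn what a UI instance contains.
    term = term.lower()
    return [r.ui_id for r in records if term in r.index_terms]


def ui_for_structured_key(records, key):
    # Correlate structured data with UI through a shared record key.
    return [r.ui_id for r in records if key in r.linked_structured_keys]


# Usage with a made-up scanned claim form.
records = [
    UIMetaRecord(
        ui_id="scan-001",
        definition="Scanned claim form",
        business_processes=["claims processing"],
        location="content-manager://claims/scan-001",
        index_terms={"claim", "x-ray"},
        publishers={"operator_a"},
        linked_structured_keys={"CLAIM-1001"},
    ),
]
print(find_by_term(records, "X-Ray"))                # ['scan-001']
print(ui_for_structured_key(records, "CLAIM-1001"))  # ['scan-001']
print(records[0].can_publish("operator_b"))          # False
```

Keeping the structured record keys on the UI entry is one simple way to hold the two kinds of data together; a separate cross-reference table would serve equally well.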
Managing UI becomes a valuable approach if an organization is embarking on refitting a DW or has business issues that require consideration of documents to resolve. However, there are new requirements and considerations for meta data and data management.
Lack of wholesale adoption of RDBMS facilities for UI will cause many organizations to "make do" with homegrown meta data for UI or existing content management tools, but UI represents a valid data source for extending the value of a DW.