Structured and Unstructured: The What, Why and How of Convergence

May 20 2008, 6:06pm EDT

Since the introduction of database management systems (DBMSs) in the 1960s, the main strength and focus of the DBMS has been storing structured data. Early nonrelational databases stored structured data in record structures, and relational databases stored it in traditional tables. Unstructured data rarely even made it into the computer system and was relegated to file cabinets.

Early Hints

The first hints of how structured and unstructured data could come together within relational databases were (1) the use of text fields and text searching and (2) the storage of binary large objects (BLOBs) within the database.

Text fields were almost always part of database design. Whether for comments, memos or descriptions, even the earliest database designs included text, usually as a text field of a fixed length. Early relational databases could handle some text, but they were a much better fit for numeric data and calculations on it. Starting in the 1980s, relational databases added more specific functions for handling text, including indexing and searching. BLOBs, by contrast, were never really part of early database designs; the idea of storing large images, photographs or other nondatabase objects in a database was foreign.

The early impact of these two options was that a single DBMS could support an application delivering the numbers, text, supporting images and other binary objects. At the same time, organizations increasingly used document management systems to store and index documents. In most IT departments today, structured database applications such as business intelligence (BI) are supported by one group and document management by another, and the two rarely collaborate on IT solutions. Interestingly, though, behind the scenes a document management solution may actually combine some of the capabilities of the DBMS with extended functions specific to document management and control.
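To make the "one DBMS for numbers, text and binary objects" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a relational DBMS. The table name, columns and sample rows are hypothetical, the "photo" bytes are placeholders rather than real images, and the text search is a plain LIKE substring match rather than a true text index.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory database holding numbers, text and binary data side by side."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE product (
               id INTEGER PRIMARY KEY,
               unit_cost REAL,          -- structured, numeric
               description TEXT,        -- free-form text field
               photo BLOB               -- binary large object
           )"""
    )
    # Hypothetical sample rows; the bytes stand in for real image files.
    rows = [
        (1, 12.50, "Steel bracket, zinc plated", b"\x89PNG...steel"),
        (2, 7.25, "Plastic bracket, recalled for cracking", b"\x89PNG...plastic"),
    ]
    conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?)", rows)
    return conn


def search_descriptions(conn: sqlite3.Connection, term: str):
    """Substring search over the text column; returns (id, unit_cost, photo) tuples."""
    cur = conn.execute(
        "SELECT id, unit_cost, photo FROM product WHERE description LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_store()
    for pid, cost, photo in search_descriptions(conn, "recalled"):
        print(pid, cost, len(photo))
```

A production system would more likely rely on the DBMS's own text-indexing features (SQLite, for example, offers the FTS5 extension where it is compiled in), but the point here is only that one query returns the number, the matching text and the binary object together.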

The What, Why, How Cycle

A key area of convergence between the structured and unstructured spaces is decision support. Using information systems to support decision-making in an organization follows a natural path, from "what" to "why" to "how," and that path is best served by combining structured and unstructured data.

In the what stage, a decision-maker is looking for the indicators from an information system that will aid in understanding the current state. At this stage, a decision-maker needs to interrogate a series of indicators or metrics to determine the overall health of the organization or function being monitored. This is normally done using some form of a reporting or BI system that draws from a relational database. In answering the what questions, a decision-maker is normally dealing with lagging indicators or results of other indicators.

By the end of the what stage, a decision-maker has homed in on the problem area but must then answer the next obvious question: why?

In the why stage, a decision-maker has determined that there is a problem, such as rising costs or falling revenue, and wants to find out why it happened. He or she will look for leading indicators that provide further clues to the anomaly or change. To do so, the decision-maker uses a BI or analytic tool to drill down and across the data for the best indicator of why there is a problem. At this point, he or she is still working with structured data from the relational database.

Often, though, getting the complete answer also requires reaching out to unstructured sources such as content and document management systems (see the sidebar for an example). Combining the best leading indicators with the supporting unstructured data and content gives the decision-maker confidence that he or she has found the root cause.

Thus, the why stage is where structured and unstructured data become intermingled in the decision-making process. The challenge for IT is to provide a seamless experience, so that users can move easily from structured to unstructured data and back.
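One way to picture that seamless move is a shared key that links a structured metric to tagged unstructured content. The following is a minimal, hypothetical sketch, not a description of any particular BI or document management product: the revenue figures, document titles, the use of "region" as the linking tag, and the 10% threshold are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    region: str  # hypothetical tag shared with the structured data
    body: str


# Hypothetical structured data: quarterly revenue by region (a lagging indicator).
revenue = {
    "East": {"Q1": 120.0, "Q2": 118.0},
    "West": {"Q1": 100.0, "Q2": 82.0},
}

# Hypothetical unstructured content from a document management system.
documents = [
    Document("West sales call notes", "West", "Key distributor switched to a competitor."),
    Document("East marketing plan", "East", "Spring campaign on schedule."),
]


def flag_drops(data, prev, curr, threshold):
    """'What' stage: regions whose revenue fell by more than `threshold` (a fraction)."""
    return [r for r, q in data.items() if (q[prev] - q[curr]) / q[prev] > threshold]


def supporting_documents(regions, docs):
    """'Why' stage: pull the unstructured content tagged with the same region key."""
    return {r: [d.title for d in docs if d.region == r] for r in regions}


if __name__ == "__main__":
    flagged = flag_drops(revenue, "Q1", "Q2", threshold=0.10)
    print(supporting_documents(flagged, documents))
```

The design choice worth noting is the shared tag: as long as both the structured and unstructured stores carry a common key, the drill-down can hand off to the document side without the user re-entering the context.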

Knowing what is going wrong and why, the decision-maker now needs to determine how to fix the problem. He or she crosses back into the structured data area and uses a planning and modeling environment. Drawing on what was learned in the why stage, he or she can build what-if models around the key lagging indicators, incorporating any insight gained from the unstructured data.

In planning systems, a decision-maker can model the lineage from leading indicators to lagging indicators and run various scenarios showing how an upstream change in the leading indicators would change the downstream lagging financial metrics.
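A what-if model of this kind can be as simple as a formula linking leading indicators to a lagging metric. The sketch below is hypothetical: it assumes revenue is driven by qualified leads, a conversion rate and an average order value, which is one common way to build such a driver model but not one the article prescribes, and all the numbers are invented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Drivers:
    """Hypothetical leading indicators for a sales pipeline."""
    qualified_leads: int
    conversion_rate: float   # fraction of leads that become orders
    avg_order_value: float   # currency units per order


def projected_revenue(d: Drivers) -> float:
    """Lagging metric modeled as the product of the leading indicators."""
    return d.qualified_leads * d.conversion_rate * d.avg_order_value


def what_if(base: Drivers, **changes) -> tuple[float, float]:
    """Return (baseline, scenario) revenue after changing one or more drivers."""
    scenario = replace(base, **changes)  # copy of base with the changed drivers
    return projected_revenue(base), projected_revenue(scenario)


if __name__ == "__main__":
    base = Drivers(qualified_leads=1000, conversion_rate=0.05, avg_order_value=2000.0)
    before, after = what_if(base, conversion_rate=0.06)
    print(f"baseline {before:,.0f}, scenario {after:,.0f}, change {after - before:+,.0f}")
```

Keeping the drivers immutable and deriving each scenario with `dataclasses.replace` leaves the baseline untouched, so several scenarios can be compared against the same starting point.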

The how stage mostly resides in the structured data area, but again, a decision-maker may need to refer to the unstructured data and content to hone the model in the planning system.

The Unifying Business Process

The cycle from what to why to how is part of a normal business decision-making process. IT professionals must let this business process guide them to the best way to meld the structured and unstructured data domains. This is often difficult for IT professionals who have taken one side or the other in the structured-versus-unstructured divide. More often than not, an IT organization will have entirely separate groups responsible for areas such as document management, content management and BI. A lack of communication between IT and the business about the true business problem can make matters even more challenging. The best practice is to let the business problem dictate how structured and unstructured data are combined, through a formal gathering of requirements and use cases.

Ultimately, the convergence of the structured and unstructured domains will continue. As organizations invest more in decision support systems and vendors invest more R&D in this converged space, businesses should see a higher return on investment. Gradually, business users will be shielded from the data gymnastics their IT departments must perform to manage this converged zone. This will lead to higher system adoption rates and a greater sense of ownership on the business side.