MAR 11, 2011 12:54pm ET

Make Sense of Enterprise Data and Cure the Information Disconnect

Enterprises have massive amounts of structured and unstructured data, and to be truly useful, this information has to deliver a 360-degree view of the business so that decision-makers can understand its implications across all of the business disciplines. Unfortunately, it is almost impossible to make sense of all the different ongoing discussions captured in enterprise data today, and the amount and types of data that enterprises will be expected to manage in the next few years will be orders of magnitude greater than what exists today. Coordinating this exponentially increasing amount of data, eliminating data silos through standardization and consolidation, and connecting the data across multiple disciplines together give organizations a secret weapon for maintaining their competitive advantage.

In other words, information overload is a business problem, not just a technical problem. Many organizations overlook this, mainly because it's up to CIOs and CTOs to find a solution. But poorly managed data has a direct impact on productivity and the bottom line, and smart companies (as well as forward-thinking government agencies and nonprofit entities) know that they need to be ahead of the curve if they want to benefit from the information avalanche rather than be buried under an infinite pile of ones and zeroes.

The Problem We Face

It's no secret that today's organizations in every sector – from government agencies to manufacturers to law firms to insurance companies – are faced with the challenge of managing far more data than they did even a decade ago. Back in the 1970s, we used to talk about megabytes of information – then it became gigabytes and terabytes, and now we're dealing with exabytes. Information storage has gotten so inexpensive (Moore's Law strikes again!) that the per-byte cost to save and access a particular piece of data is pretty close to free – it's like dealing with pennies or subatomic particles because there's always a smaller and smaller unit in play.

Having information is good, but as more and more data becomes available, we have no idea how to harvest it. And sometimes more information actually just confuses us. Companies that used to plan by analyzing one year of information now have the ability to scrutinize 50 years of data. That may sound like a good thing until you have to deal with the crush of millions of data fields. We're no longer managing information; we have become collectors. What we need is to synthesize the information into something useful that improves our business.

Structured and Unstructured Data

To complicate matters, it's not just the amount of data that's increased over the years – it's the kind of information that systems are expected to store. Having a spreadsheet filled with tables of numbers is one thing, but having photographs, scanned documents, audio and video and other information in play creates an entirely new kind of animal. In order to deal with this, data experts divided the world into structured and unstructured data. Much has been written on these two categories of data, but from my perspective, the standard delineation isn't quite right because it doesn't adequately deal with the shades of gray that I believe are important to understand.

I define structured data as information that fits nicely in a relational database. This is pretty self-explanatory: financial data, sales tables and profit-and-loss information fit neatly into databases and are easy to access and manage. Everything that doesn't fit into tabular form is usually lumped into the unstructured category, but in my experience, very little data is truly unstructured. The problem is that most databases don't know how to deal with it.

What we're really dealing with is what I think of as formal and informal worlds of information drawn from many different perspectives and viewpoints. Unstructured data (which, as I said before, makes up a small portion of the information that organizations deal with) truly has no shape or form that is externally discernible. Most data is what I call semistructured, meaning that it actually has attributes that can be managed.

A perfect example of this is email, which is often categorized as unstructured because it's not as neat and tidy as some might like. In fact, emails have plenty of attributes that advanced data management systems can deal with just as easily as they handle charts of numbers. For starters, a good data management system can parse emails based on their language, what system they come from, when they were sent, who sent them and who received them. By the way, this also works for memos, faxes and other documents.
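The attributes listed above can be pulled out of a message with very little machinery. The following is a minimal sketch using Python's standard `email` package, not a description of any particular data management product. The sample message, the `extract_attributes` helper and the choice of the `Content-Language` and `X-Mailer` headers as stand-ins for "language" and "originating system" are all assumptions made for illustration; real messages may omit either header.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message; addresses and headers are made up.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>, carol@example.com
Date: Fri, 11 Mar 2011 12:54:00 -0500
Subject: Q1 claims backlog
Content-Language: en
X-Mailer: ExampleMail 1.0

The backlog is down this quarter.
"""


def extract_attributes(raw: str) -> dict:
    """Return the manageable attributes of one raw email message."""
    msg = message_from_string(raw)
    return {
        # Who sent it (address only, display name dropped).
        "sender": getaddresses([msg.get("From", "")])[0][1],
        # Who received it (every address in the To header).
        "recipients": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        # When it was sent, as a timezone-aware datetime.
        "sent_at": parsedate_to_datetime(msg["Date"]),
        # Language and originating system, if the sender declared them.
        "language": msg.get("Content-Language"),
        "source_system": msg.get("X-Mailer"),
        "subject": msg.get("Subject"),
    }


if __name__ == "__main__":
    for name, value in extract_attributes(RAW_EMAIL).items():
        print(f"{name}: {value}")
```

Once each message is reduced to a record like this, it can be filtered, sorted and joined just like the rows of a sales table, which is the sense in which email is semistructured rather than unstructured.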

So what about pictures and videos, which are often dropped into the unstructured data bucket? Just because there is no single bit in a relational database that includes a photo (photos are stored as binary large objects, or BLOBs, but these are really black boxes within the database), and there is no field called "mpeg," doesn't mean that these items can't be categorized and stored for analysis. In my mind, when data is classified as unstructured, what it tells me is that someone said, "We have no understanding of what this is."
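One way to picture this is to store the opaque BLOB next to a few descriptive attributes, so the item can be found and analyzed even though the database cannot look inside it. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `media` table, its columns, the sample rows and the `build_catalog` and `find_media` helpers are hypothetical and chosen only to illustrate the idea; the byte strings are placeholders, not real image or video data.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog of media items with searchable attributes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE media (
            id          INTEGER PRIMARY KEY,
            kind        TEXT NOT NULL,   -- e.g. 'photo' or 'video'
            captured_on TEXT,            -- ISO date string
            subject     TEXT,            -- human-assigned description
            content     BLOB NOT NULL    -- the opaque binary itself
        )
        """
    )
    # Placeholder bytes stand in for real files.
    rows = [
        ("photo", "2011-03-01", "claim 1001 vehicle damage", b"placeholder-1"),
        ("video", "2011-03-02", "claim 1001 adjuster walkthrough", b"placeholder-2"),
        ("photo", "2011-03-05", "claim 1002 roof damage", b"placeholder-3"),
    ]
    conn.executemany(
        "INSERT INTO media (kind, captured_on, subject, content) VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def find_media(conn: sqlite3.Connection, kind: str, subject_term: str) -> list:
    """Find items by their attributes without ever reading the BLOB."""
    cursor = conn.execute(
        "SELECT id, captured_on, subject FROM media "
        "WHERE kind = ? AND subject LIKE ? ORDER BY captured_on",
        (kind, f"%{subject_term}%"),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    catalog = build_catalog()
    for row in find_media(catalog, "photo", "damage"):
        print(row)
```

The BLOB column stays a black box, but the attributes around it give the photo or video enough shape to be categorized, queried and tied to other records.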

How We've Tried To Fix The Problem

Traditional relational databases are wonderful things, but they only go so far, and there have been several alternative approaches that have found success in the marketplace. The original content management systems that started popping up in the late 1980s focused on eliminating paper. They allowed organizations to scan documents and images and store them away for recall, which reduced the need for file cabinets. At the time, this was a leap forward because for the first time organizations were able to use technology to manage what had been a physical process.

One of the first industries to embrace this approach was insurance, which is a document-intensive industry. Insurance companies used the information for call center and customer self-service, claims handling and payer operations. Insurer USAA started with a focus on customer actions driven by paper mail – in fact, the daily mail was scanned and queued for action.
