In other words, the problem of information overload is a business problem, not just a technical problem. This is something that many organizations overlook, mainly because they assume it's up to CIOs and CTOs to find a solution. But poorly managed data has a direct impact on productivity and the bottom line, and smart companies (as well as forward-thinking government agencies and nonprofit entities) know that they need to be ahead of the curve if they want to benefit from the information avalanche rather than be buried under an infinite pile of ones and zeroes.
The Problem We Face
It's no secret that today's organizations in every sector – from government agencies to manufacturers to law firms to insurance companies – are faced with the challenge of managing far more data than they did even a decade ago. Back in the 1970s, we used to talk about megabytes of information – then it became gigabytes and terabytes, and now we're dealing with exabytes. Information storage has gotten so inexpensive (storage's answer to Moore's Law strikes again!) that the per-byte cost to save and access a particular piece of data is pretty close to free – it's like dealing with fractions of a penny or subatomic particles, because there's always a smaller unit in play.
Having information is good, but as more and more data becomes available, many organizations have no idea how to harvest it. And sometimes more information actually just confuses us. Companies that used to plan by analyzing one year of information now have the ability to scrutinize 50 years of data. That may sound like a good thing until you have to deal with the crush of millions of information fields. We're no longer managing information – we have become collectors. The issue is that we need to synthesize the information into something useful to improve our business.
Structured and Unstructured Data
To complicate matters, it's not just the amount of data that's increased over the years – it's the kind of information that systems are expected to store. Having a spreadsheet filled with tables of numbers is one thing, but having photographs, scanned documents, audio and video and other information in play creates an entirely new kind of animal. In order to deal with this, data experts divided the world into structured and unstructured data. Much has been written on these two categories of data, but from my perspective, the standard delineation isn't quite right because it doesn't adequately deal with the shades of gray that I believe are important to understand.
I define structured data as information that fits nicely in a relational database. This is pretty self-explanatory: financial data, sales tables and profit-and-loss information fit neatly into databases and are easy to access and manage. Everything that doesn't fit into tabular form is usually lumped into the unstructured category, but in my experience, very little data is truly unstructured. The problem is that most databases don't know how to deal with it.
What we're really dealing with is what I think of as formal and informal worlds of information drawn from many different perspectives and viewpoints. Unstructured data (which, as I said before, makes up a small portion of the information that organizations deal with) truly has no shape or form that is externally discernible. Most data is what I call semistructured, meaning that it actually has attributes that can be managed.
A perfect example of this is email, which is often categorized as unstructured because it's not as neat and tidy as some might like. In fact, emails have plenty of attributes that advanced data management systems can deal with just as easily as they handle charts of numbers. For starters, a good data management system can parse emails based on their language, what system they come from, when they were sent, who sent them and who received them. By the way, this also works for memos, faxes and other documents.
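To make the point concrete, here is a minimal sketch in Python of how those attributes could be pulled out of a message using the standard library's email module. The sample message, the extract_attributes helper and the choice of the X-Mailer and Content-Language headers as stand-ins for "what system they come from" and "language" are all assumptions made for illustration; real mail often omits those optional headers, and a production system would need to handle far messier input.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message, invented purely for illustration.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>, carol@example.com
Date: Mon, 06 Jan 2014 09:30:00 -0500
Subject: Claim status
X-Mailer: ExampleMail 1.0
Content-Language: en

Please see the attached claim form.
"""


def extract_attributes(raw):
    """Turn a raw email into a flat record of manageable attributes."""
    msg = message_from_string(raw)

    # getaddresses returns (display name, address) pairs.
    senders = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []))
    date_header = msg["Date"]

    return {
        "sender": senders[0][1] if senders else None,
        "recipients": [address for _, address in recipients],
        "sent_at": parsedate_to_datetime(date_header) if date_header else None,
        # Optional headers: assumed here as proxies for the originating
        # system and the language; many messages will not carry them.
        "source_system": msg.get("X-Mailer"),
        "language": msg.get("Content-Language"),
        "subject": msg.get("Subject"),
    }


record = extract_attributes(RAW_EMAIL)
for name, value in record.items():
    print(f"{name}: {value}")
```

Each key in the resulting record could become a column in an ordinary table, which is the sense in which email is semistructured rather than unstructured.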
So what about pictures and videos, which are often dropped into the unstructured data bucket? Just because there is no single field in a relational database that describes a photo (photos are stored as binary large objects, or BLOBs, which are really black boxes within the database), and there is no field called "mpeg," doesn't mean that these items can't be categorized and stored for analysis. In my mind, when data is classified as unstructured, what it tells me is that someone said, "We have no understanding of what this is."
How We've Tried To Fix The Problem
Traditional relational databases are wonderful things, but they only go so far, and there have been several alternative approaches that have found success in the marketplace. The original content management systems that started popping up in the late 1980s focused on eliminating paper. They allowed organizations to scan documents and images and store them away for recall, which reduced the need for file cabinets. At the time, this was a leap forward because for the first time organizations were able to use technology to manage what had been a physical process.
One of the first industries to embrace this approach was insurance, which is a document-intensive industry. Insurance companies used the information for call center and customer self-service, claims handling and payer operations. Insurer USAA started with a focus on customer actions driven by paper mail – in fact, the daily mail was scanned and queued for action.









