Make Sense of Enterprise Data and Cure the Information Disconnect

March 11, 2011, 12:54pm EST

Enterprises have massive amounts of structured and unstructured data, and to be truly useful, this information has to deliver a 360-degree view of the business so that decision-makers can understand its implications across all business disciplines. Unfortunately, it is almost impossible to make sense of all the different ongoing discussions captured in enterprise data today, and the amount and variety of data that enterprises will be expected to manage in the next few years will be orders of magnitude greater than what exists now. Coordinating this exponentially growing volume of data, eliminating data silos through standardization and consolidation, and connecting the data across multiple disciplines is a secret weapon for organizations that want to maintain their competitive advantage.

In other words, information overload is a business problem, not just a technical one. Many organizations overlook this, mainly because they assume it is up to CIOs and CTOs to find a solution. But poorly managed data has a direct impact on productivity and the bottom line, and smart companies (as well as forward-thinking government agencies and nonprofit entities) know that they need to stay ahead of the curve if they want to benefit from the information avalanche rather than be buried under an infinite pile of ones and zeroes.

The Problem We Face

It's no secret that organizations in every sector today, from government agencies to manufacturers to law firms to insurance companies, face the challenge of managing far more data than they did even a decade ago. Back in the 1970s, we talked about megabytes of information; then it became gigabytes and terabytes, and now we're dealing with exabytes. Storage has become so inexpensive (a trend often likened to Moore's Law) that the per-byte cost to save and access a particular piece of data is close to free. It's like dealing with pennies or subatomic particles, because there's always a smaller unit in play.

Having information is good, but as more and more data becomes available, we often have no idea how to harvest it, and sometimes more information just confuses us. Companies that used to plan by analyzing one year of information can now scrutinize 50 years of data. That may sound like a good thing until you have to deal with the crush of millions of information fields. We're no longer managing information; we have become collectors. What we need is to synthesize the information into something that actually improves our business.

Structured and Unstructured Data

To complicate matters, it's not just the amount of data that has increased over the years; it's also the kind of information that systems are expected to store. A spreadsheet filled with tables of numbers is one thing, but photographs, scanned documents, audio, video and other information are an entirely different kind of animal. To deal with this, data experts divided the world into structured and unstructured data. Much has been written about these two categories, but from my perspective the standard delineation isn't quite right, because it doesn't adequately account for the shades of gray that I believe are important to understand.

I define structured data as information that fits nicely in a relational database. This is pretty self-explanatory: financial data, sales tables and profit-and-loss information fit neatly into databases and are easy to access and manage. Everything that doesn't fit into tabular form is usually lumped into the unstructured category, but in my experience, very little data is truly unstructured. The problem is that most databases don't know how to deal with it.

What we're really dealing with is what I think of as formal and informal worlds of information, drawn from many different perspectives and viewpoints. Truly unstructured data (which, as I said, makes up a small portion of the information that organizations deal with) has no externally discernible shape or form. Most data is what I call semistructured, meaning that it has attributes that can be managed.

A perfect example of this is email, which is often categorized as unstructured because it's not as neat and tidy as some might like. In fact, emails have plenty of attributes that advanced data management systems can deal with just as easily as they handle charts of numbers. For starters, a good data management system can parse emails based on their language, what system they come from, when they were sent, who sent them and who received them. By the way, this also works for memos, faxes and other documents.
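To make the point concrete, here is a minimal sketch, using only Python's standard `email` package, of pulling those manageable attributes (sender, recipients, send time, originating system, language) out of a raw message. It assumes the message carries the optional `X-Mailer` and `Content-Language` headers; many messages won't, and a real system would more likely detect language from the body text. The function name `email_attributes` and the sample message are hypothetical, for illustration only.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def email_attributes(raw: str) -> dict:
    """Pull manageable attributes out of a raw RFC 5322 email message."""
    msg = message_from_string(raw)

    # getaddresses() splits "Name <addr>" header values into (name, addr) pairs.
    senders = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))

    date_header = msg.get("Date")
    return {
        "sender": senders[0][1] if senders else None,
        "recipients": [addr for _, addr in recipients],
        # When it was sent, as a timezone-aware datetime.
        "sent_at": parsedate_to_datetime(date_header) if date_header else None,
        # Which system produced it (assumption: the optional X-Mailer header is set).
        "source_system": msg.get("X-Mailer"),
        # Declared language (assumption: Content-Language is set; otherwise None).
        "language": msg.get("Content-Language"),
        "subject": msg.get("Subject"),
    }


# Hypothetical sample message.
SAMPLE = (
    "From: Alice Adjuster <alice@example.com>\n"
    "To: bob@example.com, Carol Clerk <carol@example.com>\n"
    "Date: Fri, 11 Mar 2011 12:54:00 -0500\n"
    "Subject: Claim 1234 photos\n"
    "X-Mailer: ExampleMail 1.0\n"
    "Content-Language: en\n"
    "\n"
    "Photos of the damaged bumper are attached.\n"
)

attrs = email_attributes(SAMPLE)
print(attrs["sender"], attrs["recipients"], attrs["sent_at"].isoformat())
```

Once these fields are extracted, each email can be filtered, joined and analyzed much like a row in a table, which is the sense in which email is semistructured rather than unstructured.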

So what about pictures and videos, which are often dropped into the unstructured data bucket? In a relational database, a photo is typically stored as a binary large object, or BLOB, which is essentially a black box to the database, and there is no field called "mpeg." But that doesn't mean these items can't be categorized and stored for analysis. In my mind, when data is classified as unstructured, what it tells me is that someone said, "We have no understanding of what this is."
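One simple way to open up that black box is to store the BLOB next to descriptive attributes that the database *can* query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `media` table, its columns and the `store_media` helper are hypothetical names chosen for illustration, and the byte strings stand in for real image and video files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE media (
        id          INTEGER PRIMARY KEY,
        claim_id    TEXT NOT NULL,
        media_type  TEXT NOT NULL,  -- e.g. 'image/jpeg', 'video/mpeg'
        captured_at TEXT,           -- ISO-8601 timestamp
        description TEXT,
        content     BLOB NOT NULL   -- the opaque bytes themselves
    )""")


def store_media(claim_id: str, media_type: str, captured_at: str,
                description: str, content: bytes) -> int:
    """Insert one media item with its attributes; return the new row id."""
    cur = conn.execute(
        "INSERT INTO media (claim_id, media_type, captured_at, description, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (claim_id, media_type, captured_at, description, content),
    )
    conn.commit()
    return cur.lastrowid


# Placeholder bytes stand in for real files.
store_media("CLM-1234", "image/jpeg", "2011-03-01T09:30:00", "Front bumper damage", b"\xff\xd8\xff")
video_id = store_media("CLM-1234", "video/mpeg", "2011-03-01T09:35:00", "Dashcam clip", b"\x00\x00\x01\xba")
store_media("CLM-9999", "image/jpeg", "2011-03-02T14:00:00", "Hail damage", b"\xff\xd8\xff")

# The BLOB stays opaque, but the attributes around it are fully queryable.
photos = conn.execute(
    "SELECT description FROM media "
    "WHERE claim_id = ? AND media_type LIKE 'image/%' ORDER BY captured_at",
    ("CLM-1234",),
).fetchall()
print(photos)
```

The database still can't "see" inside the picture, but the item is no longer something nobody understands: it can be found by claim, type and time like any other record.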

How We've Tried To Fix The Problem

Traditional relational databases are wonderful things, but they only go so far, and there have been several alternative approaches that have found success in the marketplace. The original content management systems that started popping up in the late 1980s focused on eliminating paper. They allowed organizations to scan documents and images and store them away for recall, which reduced the need for file cabinets. At the time, this was a leap forward because for the first time organizations were able to use technology to manage what had been a physical process.

One of the first industries to embrace this approach was insurance, which is document-intensive. Insurance companies used the information to support call centers and customer self-service, claims handling and payer operations. Insurer USAA started with a focus on customer actions driven by paper mail; in fact, the daily mail was scanned and queued for action.

Insurers gradually digitized nearly all of their documents and moved to the second phase, business process management (BPM), in which the business process is automated and driven by the document system. In the case of USAA, the claim request kicks off the process and routes the information to the investigator, then to the claims adjuster and finally through the payment system. The resulting cost savings are enormous: USAA, Blue Cross, the U.S. Social Security Administration, State Farm and Alliance all benefited from this approach, and in the last 10 years, BPM has become the de facto standard in industries with repeatable, document-heavy processes.
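The routing described above (claim request, then investigator, then adjuster, then payment) is essentially a small state machine. Here is a minimal sketch in Python, assuming a fixed linear route with no branching, rejections or parallel steps; real BPM products model far richer flows. The `Step`, `ROUTE` and `Claim` names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    RECEIVED = "received"
    INVESTIGATION = "investigation"
    ADJUSTMENT = "adjustment"
    PAYMENT = "payment"
    CLOSED = "closed"


# Assumed fixed route: request -> investigator -> adjuster -> payment -> closed.
ROUTE = {
    Step.RECEIVED: Step.INVESTIGATION,
    Step.INVESTIGATION: Step.ADJUSTMENT,
    Step.ADJUSTMENT: Step.PAYMENT,
    Step.PAYMENT: Step.CLOSED,
}


@dataclass
class Claim:
    claim_id: str
    documents: list = field(default_factory=list)  # scanned forms, photos, memos
    step: Step = Step.RECEIVED
    history: list = field(default_factory=list)    # (completed step, actor, note)

    def advance(self, actor: str, note: str = "") -> Step:
        """Record who completed the current step and route to the next one."""
        if self.step is Step.CLOSED:
            raise ValueError(f"claim {self.claim_id} is already closed")
        self.history.append((self.step.value, actor, note))
        self.step = ROUTE[self.step]
        return self.step


claim = Claim("CLM-1234", documents=["claim_form.pdf", "bumper.jpg"])
claim.advance("intake", "scanned from the daily mail")
claim.advance("investigator", "damage consistent with report")
claim.advance("adjuster", "approved")
claim.advance("payment-system", "check issued")
print(claim.step, [actor for _, actor, _ in claim.history])
```

The point of the design is that the documents travel with the claim and each hand-off is recorded, so the process, not a person walking paper between desks, decides what happens next.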

The Future

The rate of information growth is not going to slow down any time soon. Faster computer systems, cheaper storage and better analytical tools are making it easier than ever for organizations to collect virtually unlimited quantities of data. The Holy Grail is being able to use the information effectively. In the insurance vertical, the benefits of these alternative approaches have been breathtaking, and they offer a roadmap for helping organizations in all industries manage the ever-increasing flow of data. From a technical standpoint, insurers are able to deal with all data equally well, including so-called structured and unstructured information. Photographs may not fit natively into traditional relational database systems, but CMS/BPM tools allow insurers to store accident pictures, memos and emails just as easily as they archive claim information.

Given all the information that insurance companies now have at their fingertips, fraud analysis and operational efficiency have been huge areas of improvement, because information now crosses business function areas to create a consolidated perspective. Many insurance solutions have also added records management and compliant storage in response to regulatory requirements, and it is reasonable to expect them to keep expanding the kinds of data that can be categorized and acted upon.

More recently, enterprise content management systems have started housing and managing collaborative content, such as emails, blogs and social networking posts, as employees use these channels to communicate in-house and with outside audiences. The motivation for these capabilities is a recognition that all collaborative content needs to be retained, immutable and available for legal discovery. These systems integrate text search and pattern matching with transactional analysis to detect fraud, improve operational efficiency and improve customer service.
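To illustrate what "integrating text search with transactional analysis" can mean, here is a deliberately simple sketch: a claim is flagged only when its amount is unusually high relative to the claimant's history *and* its free-text notes match a risk phrase. The risk phrases, the 3x threshold and the `flag_claims` function are all hypothetical assumptions for illustration, not a recommended fraud model; production systems use far more sophisticated scoring.

```python
import re
from statistics import mean

# Hypothetical risk phrases; a real system would derive and tune these.
RISK_PATTERNS = re.compile(
    r"\b(cash only|no receipt|backdate|pre-existing)\b", re.IGNORECASE
)


def flag_claims(claims, history, ratio=3.0):
    """Return ids of claims whose amount exceeds `ratio` times the claimant's
    historical average AND whose notes match a risk phrase."""
    flagged = []
    for c in claims:
        past = history.get(c["claimant"], [])
        baseline = mean(past) if past else None
        # Transactional signal: amount far above this claimant's norm.
        unusual_amount = baseline is not None and c["amount"] > ratio * baseline
        # Text signal: a risk phrase appears in the free-text notes.
        risky_text = bool(RISK_PATTERNS.search(c["notes"]))
        if unusual_amount and risky_text:
            flagged.append(c["id"])
    return flagged


history = {"P-1": [400.0, 600.0], "P-2": [1000.0]}
claims = [
    {"id": "C1", "claimant": "P-1", "amount": 2500.0, "notes": "Repair shop asked for cash only"},
    {"id": "C2", "claimant": "P-1", "amount": 450.0, "notes": "cash only"},
    {"id": "C3", "claimant": "P-2", "amount": 5000.0, "notes": "Hail damage, photos attached"},
]
print(flag_claims(claims, history))
```

Neither signal alone is very telling (C2 has the phrase but a normal amount; C3 has a large amount but clean notes); it's the combination of the numeric and the textual views that makes C1 stand out.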

So what is the key to information management in the future? Certainly increasing the kinds of data that can be stored and analyzed within a data system is important, but the major change needs to be one of attitude, not technology. This may sound odd coming from a database lifer, but I believe that it is true. Twenty years ago, different departments within an organization had fundamentally different views of life: finance folks looked at P&L numbers, human resources saw the world in terms of headcount, and engineers focused on building their products. Today's organizations have much better communication between groups and are able to assimilate information from lots of different perspectives.

To cure the information disconnect, people need to embrace technologies that allow them to share their insights and collaborate in real time. Today those insights are mostly private, held in personal spreadsheets, Word documents or yellow sticky notes, so there's no effective way to share them through a single technology interface. Instead, everyone brings his or her private data and analysis to the cross-company meeting so it can get sorted out in person. Talk about inefficient! For organizations to succeed, this information needs to be captured, shared and managed to really move the needle on real-time collaboration.

Bringing together this information, made up of structured, unstructured and semistructured data, is a real puzzle, but organizations are already coming up with exciting ways to do it. In the manufacturing world, real-time demand systems allow companies to combine information from the customer's initial request, through the ordering and manufacturing process, to final delivery. This affects every department within the organization, but we still tend to think of the whole process as a glorified supply-chain approach. Effective companies are already building systems and processes that draw information in many forms and from many sources to produce a magical, well-oiled solution that converts mountains of disparate information into successful outcomes.
