When we talk about data integration, most of the time we are looking at the world of structured data. The data of concern comes from the enterprise's transactional and operational systems. These systems are built on databases that hold hard-and-fast data, stored and translated to make retrieval easy, so that it can be twisted and turned to find patterns, support business intelligence and handle the day-to-day operational and transactional work of carrying out normal business.
There is little doubt that today's regulatory requirements have created an absolute need for accuracy in regulatory reporting. The business intelligence (BI)/business performance management (BPM) toolsets have responded to facilitate such activities, and the transactional, numbers-based systems lend themselves well to such requirements. Yet if you take a complete view of the corporate world, you must also consider the volumes of non-traditional or "unstructured" data that exist within the enterprise's back-end systems: the information held in corporate e-mails, Excel spreadsheets, Word documents and hard-copy text documents. Industry experts estimate that structured data makes up only about 20 percent of a corporation's total data volume, while unstructured information accounts for around 80 percent. In other words, organizations are making critical production and business decisions based on only about 20 percent of the available corporate information.
Suppose you are trying to gain insight into your customer base. The structured world tells you the number of items purchased and the percentage of sales by customer. It may also provide the number of customer complaints and basic feedback numbers. What it doesn't give you is the detail behind those complaints, the negative feedback, the customer opinions or the other data contained in the internal memos, e-mails and meeting minutes scattered across your enterprise. Yet few would deny that such information is crucial to any decision about which customers to deal with or how to deal with the customer base as a whole. Until now, organizations have made crude attempts at capturing such information, but doing so has been a tremendous technological effort, since current systems depend on digitizing information and tagging keywords so that it can be analyzed.
The barriers to integrating the structured and unstructured worlds of data are many. Organizational culture has tended to ignore unstructured data, focusing instead on refining the numbers derived from the transactional systems. The increased use of business intelligence tools reflects the need to discover critical patterns and analyze historical information in an effort to have the right product at the right time for the right audience. Technology has also lagged, maintaining its emphasis on structured data. The latest hardware and database technologies reflect the industry's efforts to crunch numbers and produce volumes of reports. More important, business practice and history have shown that, until now, a business could move forward without looking at the volumes of unstructured information and get by with approximations and estimates about what that data might reveal. As we move forward in a much more robust and diverse economy, the missing information is becoming a far more critical factor.
To date, the infrastructures that have been built don't truly support the integration of structured and unstructured information. Organizations have always built silos of technology to deal with the information coming from individual systems, and most analysis work has focused on the structured environment because it was the easy, low-hanging fruit. In today's business environment it is becoming more and more critical to assimilate the unstructured information, since the richness of what it contains can be decisive in making the right decisions.
Several issues surround the inclusion of unstructured information in the corporate infrastructure and decision-making processes. They include the need for new technologies and, most importantly, the need for enterprises to raise cultural awareness of the mass of unused information. Unstructured data is a vital part of the new economy, and it contains the missing pieces that explain why analysis done on structured data alone sometimes yields questionable results. Many decisions need to be made about how to store and retrieve such information and how best to integrate it into the world of structured information.
Additionally, BI and BPM tools need to change to allow non-numerically coded information to be incorporated into complex business analysis. Many things will need to change in order to mine this rich source of corporate information. In the interim, it is likely that tools will soon be developed that create a quantitative version of the unstructured world, but as the process matures, existing technologies will need to change to properly integrate the unstructured information and make it available for analysis.
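To make the idea of a "quantitative version of the unstructured world" concrete, here is a minimal sketch in Python. It is not a description of any existing BI or BPM product; the customer names, the sample notes, the keyword list and the function names are all hypothetical, invented purely for illustration. It tags keywords in free-text notes and attaches the resulting counts to structured sales figures for the same customers.

```python
import re
from collections import Counter

# Hypothetical structured data: units purchased per customer.
sales = {"acme": 120, "globex": 45}

# Hypothetical unstructured data: free-text notes, each linked to a customer.
notes = [
    ("acme", "Customer complained about late delivery. Late again this week."),
    ("globex", "Meeting minutes: happy with the product, asked about a refund policy."),
    ("acme", "E-mail: refund requested for a damaged item."),
]

# Hypothetical keyword list; a real system would need a far richer vocabulary.
KEYWORDS = ("late", "refund", "damaged", "complained")


def keyword_counts(text):
    """Count how often each tracked keyword appears in one piece of text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {k: words[k] for k in KEYWORDS if words[k]}


def integrate(sales, notes):
    """Attach keyword counts from the notes to each customer's sales figure."""
    combined = {
        customer: {"units": units, "mentions": Counter()}
        for customer, units in sales.items()
    }
    for customer, text in notes:
        if customer in combined:
            combined[customer]["mentions"].update(keyword_counts(text))
    return combined


if __name__ == "__main__":
    for customer, row in integrate(sales, notes).items():
        print(customer, row["units"], dict(row["mentions"]))
```

Keyword counting is the crudest possible approach, and it loses exactly the detail and context that make the unstructured information valuable, which is why it can only be an interim step.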
Step one of the process is recognizing the value of the information. Today more and more enterprises are realizing the richness of the unstructured data held in their hard-copy files and back-office systems. They have come to accept that the unstructured information is required to complete the total picture. With that realization, technologies will be developed to automate the process of merging the unstructured data with the structured databases. Whether the answer is to create new types of data warehouse structures that can be tied to the structured world, or something else, remains a key business issue that must be solved by the data integration tools of the future. The answers to the "how" will be trivial compared to the answers that will come forward once the structured and unstructured worlds are integrated.