The widespread availability of Web documents, powerful search tools and enterprise information portals (EIPs) is leading business users today to demand more from their data warehousing implementations. They now, more than ever, expect their decision support systems to integrate and provide all of the information necessary to make the right decisions fast, regardless of the data's source and form. To effectively meet this demand, today's data warehouses must be able to deliver information that combines both qualitative and quantitative data.

In this article, the term qualitative data refers to information that is unstructured (not numerical), usually stored in document or multimedia formats. This type of data is usually treated as objects within an information architecture. The term quantitative data refers to the information an organization has about numerically measurable results. This type of data, sourced from a transactional system, usually resides in a data warehouse.

To satisfy the needs of end users, data warehousing project managers are increasingly being asked to integrate these two forms of data. How do you link the qualitative data in a document-based system with the quantitative data stored in the data warehouse?

Document Management and Data Warehousing

How and why document management technology is employed depends on an organization's internal and external reporting requirements. Some industries (e.g., pharmaceutical, energy and utilities) are required to accumulate and produce complex documentation that supports product development and regulatory processes. These industries require sophisticated, robust document management technology. Other industries (e.g., consumer packaged goods and retail) have less complex requirements and can afford to use simpler document management solutions.

Commercially available document management software provides a repository that stores documents throughout their life cycle, with version and security controls and varying levels of workflow and management support. In this sense, a document management system has objectives and functionality similar to those of a data warehouse.

Organizations currently using a document management system can integrate data stored in that system with data from the organization's data warehouse and deliver it to users for better, more informed business decisions. For example, an organization could offer users a consolidated view of information about a particular activity with a specific customer. Another example is being able to access and review a copy of a customer's original invoice online when that customer calls with a billing question.

One of the more feasible ways of accomplishing this is by using a commercial portal software product. Using portal technology, documents, analyses and business reports related to a common theme or topic can be presented to the user in an aggregated, browsable view on a single Web page. Portal products, which manage both meta data and full text, can aggregate information from many sources into common categories, provide a rich context through the meta data and link the content to indices for both full text and individual attributes.

Linking Data

To effectively link qualitative data from one or more document management systems with quantitative data from a data warehouse, an organization must develop an information model that encompasses meta data of many types and identifies how this data is related. This information model must:

  • Define those entities and attributes that are critical to the business.
  • Provide the framework for understanding key document types and attributes in the document management system and their relationship to the entities and attributes defined in the previous step.

For example, if generating formal proposals were a critical process within the sales function of an organization, "proposal" would be an important document type, regardless of what form or format it assumes. This document type would then have a standard set of attributes stored as meta data. Defining these document types and standard attributes is inherent to the design and functionality of document management solutions.

Some of these document type attributes will be the same as the attributes of entities within the quantitative data in the data warehouse. For example, "customer" may be an attribute in both the data warehouse and the document management system and have a value of "John A. Smith" in a given instance. By using this common attribute and its value, the qualitative and quantitative data from both systems can be associated. Another approach is to create an index of these attributes, which can then be used to integrate the qualitative and quantitative content into a single presentation format (such as a Web page).
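
The association through a common attribute can be sketched as a simple join on the shared value. The Python sketch below is illustrative only: the record layouts, the field names (doc_type, title, period, revenue) and all sample values other than the customer name "John A. Smith" are assumptions, not taken from any particular document management or data warehousing product.

```python
from collections import defaultdict

# Hypothetical document meta data, as a document management system might expose it.
documents = [
    {"doc_id": "D-1", "doc_type": "proposal", "customer": "John A. Smith", "title": "Q3 proposal"},
    {"doc_id": "D-2", "doc_type": "invoice", "customer": "John A. Smith", "title": "Invoice 1001"},
    {"doc_id": "D-3", "doc_type": "proposal", "customer": "Jane B. Doe", "title": "Renewal proposal"},
]

# Hypothetical fact rows from the data warehouse.
sales_facts = [
    {"customer": "John A. Smith", "period": "Q2", "revenue": 1200.0},
    {"customer": "John A. Smith", "period": "Q3", "revenue": 800.0},
    {"customer": "Jane B. Doe", "period": "Q3", "revenue": 500.0},
]


def build_index(records, attribute):
    """Group records by the value of one shared attribute."""
    index = defaultdict(list)
    for record in records:
        index[record[attribute]].append(record)
    return index


def consolidated_view(customer, doc_index, fact_index):
    """Combine qualitative and quantitative data for one customer."""
    facts = fact_index.get(customer, [])
    return {
        "customer": customer,
        "total_revenue": sum(f["revenue"] for f in facts),
        "documents": [d["title"] for d in doc_index.get(customer, [])],
    }


doc_index = build_index(documents, "customer")
fact_index = build_index(sales_facts, "customer")
view = consolidated_view("John A. Smith", doc_index, fact_index)
print(view)
```

The same two indices could feed a single Web page: the numeric summary comes from the warehouse rows, and the document titles become links into the document management system.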

The document management system will also contain attributes for managing the data within the system (e.g., accession and expiration dates, file sizes, application format and others) as well as for describing the content. Using adequate, descriptive meta data can mitigate the need for users to review multiple documents, saving time and reducing network traffic.

Although some corporate portal software tools can map attributes with different names to a common attribute model, the legal values for the attributes are not easily normalized.
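
To illustrate the distinction, renaming attributes amounts to a lookup, whereas normalizing values needs rules specific to each attribute. The sketch below assumes hypothetical source systems (dms, warehouse), hypothetical local attribute names and a deliberately naive rule for customer names; real value matching is usually much harder than this.

```python
# Hypothetical mapping from (source system, local attribute name) to a common name.
ATTRIBUTE_MAP = {
    ("dms", "client_name"): "customer",
    ("dms", "doc_kind"): "document_type",
    ("warehouse", "cust_nm"): "customer",
}


def to_common_model(source, record):
    """Rename attributes to the common model; unmapped names pass through unchanged."""
    return {ATTRIBUTE_MAP.get((source, name), name): value for name, value in record.items()}


def normalize_customer(value):
    """Naive value normalization, assumed for illustration only.

    Collapses whitespace, reorders 'Last, First' and applies title case.
    """
    value = " ".join(value.split())
    if "," in value:
        last, first = [part.strip() for part in value.split(",", 1)]
        value = f"{first} {last}"
    return value.title()


dms_record = to_common_model("dms", {"client_name": "SMITH, JOHN A.", "doc_kind": "proposal"})
dw_record = to_common_model("warehouse", {"cust_nm": "john a.  smith", "revenue": 800.0})

# The attribute names now agree, but the raw values still differ until normalized.
names_match_raw = dms_record["customer"] == dw_record["customer"]
names_match_normalized = (
    normalize_customer(dms_record["customer"]) == normalize_customer(dw_record["customer"])
)
print(names_match_raw, names_match_normalized)
```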

Meta data planning and creation, including augmentation and meta data mapping, is an important function of a portal solution that brings together content from both the document management and data warehousing systems.

External Data

In addition to internally maintained documents, an organization may access unstructured data that is available from external sources. This data may come from Internet sources, partners, e-mail and/or collaborative tools.

Third-party content from Internet sources has become especially important to many business users. In some integrated delivery solutions, it is critical that users are made aware of this information as soon as it is posted on the Web. To do this, your integrated delivery system must automate the identification, categorization and presentation of this information. Some of the leading commercially available enterprise information portals have the ability to do this.
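
The categorization and presentation steps can be sketched with simple keyword rules. This is a minimal illustration under stated assumptions: the category names, keywords and item fields (title, summary) are hypothetical, and commercial portals may use quite different techniques.

```python
import re
from collections import defaultdict

# Hypothetical categories and the keywords that trigger them.
CATEGORY_KEYWORDS = {
    "competitors": {"acquisition", "merger", "launch"},
    "regulation": {"regulation", "compliance", "ruling"},
}


def categorize(text):
    """Return the sorted list of categories whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, keys in CATEGORY_KEYWORDS.items() if words & keys)


def group_for_presentation(items):
    """Group item titles by category; unmatched items go to 'uncategorized'."""
    grouped = defaultdict(list)
    for item in items:
        categories = categorize(item["title"] + " " + item["summary"]) or ["uncategorized"]
        for category in categories:
            grouped[category].append(item["title"])
    return dict(grouped)


# Hypothetical items as they might arrive from an external feed.
incoming = [
    {"title": "Rival announces merger", "summary": "Two suppliers plan to combine."},
    {"title": "New compliance ruling", "summary": "Agency issues updated guidance."},
    {"title": "Weather update", "summary": "Sunny all week."},
]
page = group_for_presentation(incoming)
print(page)
```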

As companies' experience with data warehousing matures and the demand for timely, relevant information intensifies, we will continue to see an increased need for qualitative as well as quantitative information. By identifying and mapping common data models used for document management and data warehousing systems and implementing a custom interface or portal software to deliver the information to business users, an organization can have an integrated and improved decision support solution.

Linda Eiland Clark, a principal consultant and knowledge management leader at PricewaterhouseCoopers, contributed to this month's column.
