JUN 22, 2006 1:00am ET

The Next Data Management Frontier: Unstructured Data

This month's column is contributed by Patricia Cupoli.

At the April 2006 Wilshire Meta Data Conference/DAMA International Symposium, a number of presentations dealt with metadata, ontologies (organization of knowledge and terms), semantics, controlled vocabularies and taxonomy/classification. You may ask why these topics, typically associated with library and information science, document management, content management and knowledge management, were presented; they do not seem typical for the data management professional. However, presentations of this kind have been appearing more and more often over the last several years.

Data management professionals are becoming increasingly involved with an area called unstructured data. The term covers objects in both hard and soft media, such as emails, text documents of all types, graphic images, videos and Web pages. These items do not fit naturally into the rows and columns of a database table or spreadsheet, but they can be stored in a relational DBMS as a BLOB (binary large object) or in XML files. Yet most unstructured data has some type of structure (in which case it is also known as semistructured data), and that structure could provide metadata in adherence to a standard such as the Dublin Core (15 metadata elements in total, including title, creator, description, etc.). This metadata could be stored in a relational database even if the object content is not in electronic format.
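As a rough illustration of the last point, the sketch below stores Dublin Core metadata in a relational table, with the object content held in an optional BLOB column that stays empty for items such as paper documents. It uses Python's built-in sqlite3 module; the table name, the dc_ column prefix and the function names are assumptions made for this example, not part of the Dublin Core standard or of any particular product.

```python
import sqlite3

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def create_catalog(conn):
    """Create a hypothetical catalog table: one text column per element."""
    columns = ", ".join(f"dc_{name} TEXT" for name in DUBLIN_CORE_ELEMENTS)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog ("
        "object_id INTEGER PRIMARY KEY, "
        f"{columns}, "
        "content BLOB)"  # NULL when the object is not in electronic format
    )


def register_object(conn, metadata, content=None):
    """Insert one object's metadata (and optional bytes); return its id."""
    unknown = set(metadata) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    names = list(metadata)
    columns = [f"dc_{name}" for name in names] + ["content"]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO catalog ({', '.join(columns)}) VALUES ({placeholders})"
    cursor = conn.execute(sql, [metadata[name] for name in names] + [content])
    return cursor.lastrowid


def find_titles_by_creator(conn, creator):
    """Return the titles of all objects by one creator, sorted by title."""
    rows = conn.execute(
        "SELECT dc_title FROM catalog WHERE dc_creator = ? ORDER BY dc_title",
        (creator,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_catalog(conn)
    # A paper memo: metadata only, no electronic content.
    register_object(conn, {"title": "Paper memo", "creator": "J. Doe", "format": "paper"})
    # An email: metadata plus the raw message bytes.
    register_object(conn, {"title": "Email", "creator": "J. Doe"}, b"raw message bytes")
    print(find_titles_by_creator(conn, "J. Doe"))
```

Because the metadata sits in ordinary columns, both the paper memo and the email can be found with the same query, whether or not the content itself is held in the database.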

Why is unstructured data important to a company? It has been estimated that at least 80 percent of a company's data is unstructured and not easily accessible or found. In this age of Sarbanes-Oxley and other regulations, the overwhelming amount of unmanaged, unstructured data could increase a company's exposure. Business users want to browse and search across all types of data for such opportunities as understanding customer issues. Unless unstructured data is integrated into a data warehouse/business intelligence environment, management often cannot base decisions on analysis of both structured and unstructured data.

This growing area of data needs to be managed as a corporate asset to provide value. It has to be identified, captured, organized, and made accessible and sharable. These management processes should sound familiar to data management. The data management organization deals with the structured data world by developing and maintaining data model structures and the metadata associated with them, which give the data meaning and vocabulary. It also follows best practices for data standards and runs a governance process with data stewards. One structured data concept (e.g., employee entity) can have many expressions or types (e.g., management or staff, active or retired, etc.) that describe it.

Unstructured data deals with content semantics where one expression (e.g., foot) can have many different concepts associated with it (e.g., unit of measurement, part of a human or animal leg below the ankle joint, or the lower part of anything). A controlled vocabulary organizes content through a selected list of words and phrases used to tag units of information (either automatically or manually) so that they may be more easily retrieved by a search. There is usually a governance structure to keep the various types of controlled vocabularies current. The different types include the following:

  • list of equivalence relationships or synonyms (e.g., cat and feline, baby and infant, student and pupil);
  • taxonomy that shows hierarchical relationships of subject and topic metadata;
  • thesaurus that shows equivalence (synonym list), hierarchical (taxonomy), and associative (related terms) relationships; and
  • ontology that represents a collection of taxonomies and thesauri for knowledge representation.
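The first three vocabulary types can be sketched in a few lines of Python. The example below is a minimal illustration, not a production tagger: the synonym pairs come from the list above, while the broader terms (animal, person), the related term and all function names are hypothetical values chosen for the example. It matches whole lowercase words only and does not handle plurals or multiword phrases.

```python
import re

# Equivalence relationships: each synonym maps to its preferred term.
SYNONYMS = {"feline": "cat", "infant": "baby", "pupil": "student"}

# Taxonomy: each term maps to its broader (parent) term. Hypothetical values.
BROADER = {"cat": "animal", "baby": "person", "student": "person"}

# Associative relationships (related terms). Hypothetical values.
RELATED = {"student": {"school"}}

# Every term that may be used as a tag.
VOCABULARY = set(SYNONYMS.values()) | set(BROADER) | set(BROADER.values())


def preferred(term):
    """Return the preferred term for a word, or the word itself."""
    word = term.lower()
    return SYNONYMS.get(word, word)


def tag(text):
    """Return the set of controlled-vocabulary terms found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {preferred(word) for word in words} & VOCABULARY


def broader_terms(term):
    """Walk up the taxonomy from a term and list its ancestors in order."""
    chain = []
    current = preferred(term)
    while current in BROADER:
        current = BROADER[current]
        chain.append(current)
    return chain


def thesaurus_entry(term):
    """Combine the three relationship types for one term, thesaurus style."""
    key = preferred(term)
    return {
        "preferred": key,
        "synonyms": sorted(s for s, p in SYNONYMS.items() if p == key),
        "broader": broader_terms(key),
        "related": sorted(RELATED.get(key, set())),
    }


if __name__ == "__main__":
    print(tag("The pupil adopted a feline."))
    print(thesaurus_entry("pupil"))
```

Tagging by preferred term means a search for "cat" also retrieves a document that only mentions "feline", which is the retrieval benefit a controlled vocabulary is meant to provide.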

Where should data management start with unstructured data? Most likely, your company has other organizational groups, such as content or knowledge management, libraries, records management, or document management, with which a data management organization could collaborate to raise awareness of how critical it is to manage and integrate unstructured data for accessibility. There can be synergy between data management and these other organizations with regard to values for reference data and data architectures, metadata creation and definition, metadata topics for taxonomies, use of newer technologies that can handle all types of data, and governance (it may involve the same subject matter experts) at both the enterprise and project (requirements gathering) levels. The challenge lies in integrating structured and unstructured data, especially if the unstructured data is on paper or other media. Eventually, the techniques of structured data management and data integration will converge with the techniques of the unstructured data world to help businesses overcome this challenge.

Patricia Cupoli, CCP, CDMP, CBIP, is the DAMA International ICCP Liaison, the DAMAi Project Manager for the Data Exam Development, ICCP Board President, and a past president of DAMA International, DAMA Chicago, and DAMA Philadelphia / Delaware Valley. She is the recipient of the 2006 DAMA International Professional Award. She may be reached at ICCP_Liaison@DAMA.org.

The Data Management Association International (DAMA International) is a global not-for-profit, vendor-independent association of data and information resource management professionals with chapters and members around the world. DAMA International is dedicated to advancing the concepts and practices of data and information resource management. Its primary purpose is to promote the understanding, development and practice of managing data and information as key enterprise resources. DAMA International produces premier Symposiums for data and information management professionals in the U.S., the UK and Australia. For more information visit www.dama.org.
