Today, we take the concept of "data management" for granted; an entire industry revolves around it. Data management has two facets: the management of operational data by systems that process transactions (often in real time), and the management and analysis of historical data by decision support systems that give business users insight. As decision support has matured, it has become clear that business users do not want massive volumes of data; they are interested in the patterns and trends buried within the data. These patterns need to be accessed, manipulated and managed, just as data elements are. This article introduces the concept of "pattern management," discusses how it differs from data management, and argues that the two should complement each other for best results.
Pattern management systems deal with patterns, just as data management systems deal with data. They require their own repositories and query languages, just as repositories and query languages were developed for data management. In pattern-oriented systems, patterns are first-class elements of both languages and repositories. The tools of data management (e.g., SQL) were simply not designed to deal with patterns. See Figure 1.
Pattern management is not knowledge management, data mining or the construction of a knowledge-based system. Knowledge is based on the patterns known to humans, often only a small fraction of the patterns that a database implicitly contains. Data mining is a process that precedes pattern management: discovery feeds a pattern repository, which is then managed. Pattern management thus deals with patterns after data mining has discovered them.
As data warehouses grow, the need for pattern management becomes paramount, because it frees the business user from the burden of dealing with raw data. With pattern management, the business user works with patterns as the basic tokens of the information system, moving far beyond the rudimentary facilities that data management systems provide.
Data versus Patterns
Data is raw; patterns are refined. A pattern expresses relationships between data items, not the data itself. There are several classes of patterns, including influence patterns (often reflecting probabilities or likelihoods), affinity patterns that capture associations (e.g., market-basket patterns) and comparative patterns that point out differences among data sets. Each pattern class has its own rules of inference for manipulating patterns.
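To make the distinction concrete, here is a minimal Python sketch of one pattern class, an affinity (market-basket) pattern. It assumes the support and confidence measures commonly used in association-rule mining; the article does not prescribe them, and the class and function names are hypothetical. Note that the resulting pattern object keeps only the relationship (two item sets and two numbers), not the baskets it was computed from.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class AffinityPattern:
    """Hypothetical affinity pattern: baskets holding `antecedent` tend to hold `consequent`."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float     # fraction of all baskets containing both item sets
    confidence: float  # fraction of antecedent baskets that also contain the consequent


def mine_affinity(baskets: List[set], antecedent: set, consequent: set) -> AffinityPattern:
    """Summarize the relationship between two item sets without keeping the baskets."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    with_a = [b for b in baskets if a <= b]        # baskets containing every antecedent item
    with_both = [b for b in with_a if c <= b]      # ...that also contain every consequent item
    support = len(with_both) / len(baskets) if baskets else 0.0
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return AffinityPattern(a, c, support, confidence)


baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk"},
]
p = mine_affinity(baskets, {"bread"}, {"butter"})
print(p.support, p.confidence)  # 0.5 0.6666666666666666
```

Influence and comparative patterns would need their own structures and inference rules; this sketch covers only the affinity case.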
As a simple analogy, consider data as grapes and patterns of knowledge as wine. Data mining is then the wine-making process, and data mining tools are the wine-making equipment. A data repository is a storage facility for grapes, while a pattern repository is a wine cellar. Users can make their own wine by fetching grapes from the warehouse, but this takes both time and know-how, so naturally most business users prefer bottled wine. Note that with pattern management, data mining still takes place behind the scenes; the business user is simply unaware of it.
Data Analysis versus Pattern Management
Pattern management follows data analysis and is therefore distinct from it. To distill information from a database, analysis obviously has to be performed at some point. The key question is: "When?" In other words, does the analysis take place at the moment the user needs the information, or is it done beforehand so that the knowledge is ready to access? Traditionally, data mining analyses were performed upon user request. The pattern management approach spares users these delays by pre-mining refined knowledge. Hence, there are two distinct paradigms for empowering users with knowledge. With data analysis, users operate on data to discover information; this paradigm relies on "analysis on demand" (i.e., analysis is performed when a user wants knowledge). With pattern management, the analysis is done automatically beforehand, refined patterns are pre-generated, and users retrieve the knowledge on demand.
The pattern management paradigm offers several benefits to the business user. Business users without technical know-how can access knowledge without training; they just click an icon in a Web browser. When a user requests knowledge, no analysis is needed, so follow-up questions are answered without delay. Data mining on a very large database may take a long time, but pattern look-up is fast. And because patterns are not recomputed for each user and each request, overall system efficiency is much higher: computations take place once, and users access the refined knowledge again and again. Finally, because sampling and extract files are avoided, the discovered patterns reflect the entire database and are therefore more accurate, leading to better decisions.
Moreover, because patterns are stored in a single repository, all users get consistent answers rather than relying on fragmented analyses. This contrasts with the data analysis paradigm, in which different users may draw different conclusions from the same data. Because corporate knowledge is centralized, pattern management avoids the risk of 100 users getting 100 different answers from the same data.
Components of Pattern Management
To deal with patterns, we need to collect, store, manipulate, access and visualize them. We need repositories, query languages and systems to deal with refined patterns rather than raw data. Each of these has an equivalent in the data management world, as shown in Figure 2.
| Data Management | Pattern Management |
|---|---|
| Data Collection | Data Mining |
| Relational Algebra | Pattern Algebra |
| Data Visualization | Pattern Visualization |

Figure 2: Data Management vs. Pattern Management
The concept of a data warehouse was championed in the 1980s as a repository for corporate data elements. The idea was to create a central storage facility where everyone in the corporation could get "data" on demand, whenever they needed it. The central repository would also improve corporate data quality and consistency, because everyone obtained data from a single source. This idea has since achieved worldwide acceptance, and almost every Fortune 500 company has several data warehouse projects. Similarly, a pattern repository can hold historical patterns rather than historical data. With it, almost all the relevant patterns in the data are found beforehand and stored for business users such as marketing analysts, bank branch managers and store managers. Business users can receive the interesting patterns of change every week or month, or query the patterns at will.
Patterns can, in fact, be represented as a set of "pattern-tables" within a traditional relational database. This resolves several potential issues, such as user access rights, security control and multi-user access. We obviously also need a language to access and query the contents of a pattern repository. SQL may seem the obvious first candidate, but when SQL was designed over 30 years ago, data mining was not a major issue: SQL was designed to access data stored in databases. We need pattern-oriented languages to access pattern repositories that store various types of exact and inexact patterns, which are often very hard to access with SQL.
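As a minimal sketch of the pattern-table idea, the Python snippet below stores affinity patterns as rows of an ordinary relational table using the standard-library `sqlite3` module. The table name, its columns and the sample values are hypothetical; the article does not define a schema.

```python
import sqlite3

# Hypothetical pattern-table: each row is one affinity pattern, not a sales record.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE affinity_pattern (
        antecedent TEXT NOT NULL,
        consequent TEXT NOT NULL,
        support    REAL NOT NULL,
        confidence REAL NOT NULL,
        period     TEXT NOT NULL,  -- the period the pattern was mined for
        PRIMARY KEY (antecedent, consequent, period)
    )
    """
)
conn.executemany(
    "INSERT INTO affinity_pattern VALUES (?, ?, ?, ?, ?)",
    [
        ("bread", "butter", 0.50, 0.67, "jan"),
        ("bread", "jam", 0.25, 0.33, "jan"),
        ("milk", "bread", 0.25, 0.50, "jan"),
    ],
)
conn.commit()

# Simple look-ups work with ordinary SQL against the pattern-table.
rows = conn.execute(
    "SELECT consequent, confidence FROM affinity_pattern "
    "WHERE antecedent = ? AND confidence >= ? "
    "ORDER BY confidence DESC",
    ("bread", 0.3),
).fetchall()
print(rows)  # [('butter', 0.67), ('jam', 0.33)]
```

Because the patterns live in ordinary tables, a production relational DBMS can apply its usual access-rights, security and multi-user machinery to them; SQLite is used here only to keep the sketch self-contained.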
Patterns cannot be queried conveniently or directly with a relational query language. Some patterns are not easily stored in a simple tabular format, and simply looking up influence factors in pattern tables can give incorrect results, for example when patterns discovered for different subsets of the data must be combined. We need a "pattern-kernel" that manages and merges patterns consistently.
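The article does not say how a pattern-kernel merges patterns. The hypothetical Python sketch below illustrates the risk for one simple case: two influence patterns (probabilities measured on regional segments of different sizes) are combined first by a naive average, as a plain table look-up plus `AVG()` would do, and then by weighting each probability with the number of records behind it. The weighting rule is an assumption chosen for illustration, not the author's method.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class InfluencePattern:
    """Hypothetical influence pattern: P(outcome | condition) measured on `n` records of a segment."""
    segment: str
    probability: float
    n: int


def naive_merge(patterns: Iterable[InfluencePattern]) -> float:
    # Table look-up plus a plain average: treats every segment as equally large.
    ps = [p.probability for p in patterns]
    return sum(ps) / len(ps)


def kernel_merge(patterns: Iterable[InfluencePattern]) -> float:
    # Weight each segment's probability by its record count, which recovers
    # the probability over the union of the (disjoint) segments.
    ps = list(patterns)
    total = sum(p.n for p in ps)
    return sum(p.probability * p.n for p in ps) / total


region_patterns = [
    InfluencePattern("east", 0.30, 1_000),
    InfluencePattern("west", 0.10, 9_000),
]
print(naive_merge(region_patterns))   # 0.2  (misleading)
print(kernel_merge(region_patterns))  # 0.12 (300 + 900 positives out of 10,000)
```

The same two stored numbers yield different answers depending on the merge rule, which is why the merging logic belongs in a dedicated kernel rather than in ad hoc queries.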
While SQL relies on relational algebra, pattern queries rely on "pattern algebra." The pattern query process should use SQL as part of its operation: a pattern query is decomposed into a set of related SQL queries, and the results are then recombined. Business users, however, never see this; they just click on a graphical user interface to retrieve patterns over the intranet, and they can begin to access knowledge immediately, without lengthy training sessions or analytical know-how.
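The decomposition step can be sketched in Python, again with `sqlite3`. The pattern query here, "how did the affinities of an item change between two periods?", the table layout, the period labels and the function names are all hypothetical; the article does not give the rules of pattern algebra. The query is split into one SQL sub-query per period, and the two result sets are recombined outside SQL. A pattern missing from a period is treated as confidence 0.0, which is an assumption.

```python
import sqlite3
from typing import Dict, List, Tuple


def setup() -> sqlite3.Connection:
    """Build a small hypothetical pattern-table with two monthly snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE affinity_pattern (antecedent TEXT, consequent TEXT, "
        "confidence REAL, period TEXT)"
    )
    conn.executemany(
        "INSERT INTO affinity_pattern VALUES (?, ?, ?, ?)",
        [
            ("bread", "butter", 0.67, "jan"), ("bread", "jam", 0.33, "jan"),
            ("bread", "butter", 0.60, "feb"), ("bread", "jam", 0.45, "feb"),
            ("bread", "honey", 0.20, "feb"),
        ],
    )
    return conn


def confidences(conn: sqlite3.Connection, antecedent: str, period: str) -> Dict[str, float]:
    """One generated SQL sub-query: the affinities of `antecedent` in one period."""
    rows = conn.execute(
        "SELECT consequent, confidence FROM affinity_pattern "
        "WHERE antecedent = ? AND period = ?",
        (antecedent, period),
    ).fetchall()
    return dict(rows)


def pattern_change(conn: sqlite3.Connection, antecedent: str,
                   old: str, new: str) -> List[Tuple[str, float, float]]:
    """Hypothetical pattern query, decomposed into two SQL sub-queries and recombined."""
    before = confidences(conn, antecedent, old)
    after = confidences(conn, antecedent, new)
    changes = [
        (item, before.get(item, 0.0), after.get(item, 0.0))  # absent pattern -> 0.0
        for item in sorted(before.keys() | after.keys())
    ]
    # Largest absolute change first.
    return sorted(changes, key=lambda c: abs(c[2] - c[1]), reverse=True)


conn = setup()
for item, was, now in pattern_change(conn, "bread", "jan", "feb"):
    print(f"{item}: {was:.2f} -> {now:.2f}")
```

Run as written, this lists honey (new in February) first, then jam, then butter. A graphical front end would issue such a query on the user's behalf, so the business user sees only the ranked changes.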
With pattern visualization, the user still performs analysis (e.g., visualizes affinity patterns), but the results delivered for the same level of computational effort are orders of magnitude better because the user now analyzes refined knowledge, not data. And now 100 different analysts will no longer get 100 different answers from the same data because there is a central knowledge repository.
A natural way to deliver pattern-based information to users on the Web is a document organized as a collection of information of different types: text, data, graphs, etc. Such an explainable document looks like any other Web page at first, but it does much more, allowing users to dynamically obtain explanations that clarify, justify and substantiate the patterns presented within it. Explainable documents are, in fact, a key element of machine-man systems (see "Machine-Man Interaction," DM Review, September 1997), allowing for the intelligent exchange of refined information between users and systems.