The world of decision support systems (DSS) and data warehousing is full of contradiction: at the design level, the implementation level and even the end-user execution level. Some transactions operate at the detail level, some at the summary level. Some queries execute very quickly while others take hours. Some queries are submitted regularly and others are submitted only once. How is one to make sense out of decision support and data warehousing? The key to understanding DSS, the data warehouse and their contradictions is to understand the many diverse audiences that make up the DSS community of users.

Several years ago I observed that there were farmers and explorers in the DSS community. Viewing DSS and data warehousing from these different perspectives cleared up many of the seeming contradictions. Then, a little later, while I was working with Katherine Glassey of Brio Technology, she introduced me to tourists. The theme of tourists/farmers/explorers played well and indeed explained many more aspects of DSS. But just recently Michael Berry, co-author (with Gordon S. Linoff) of the leading book on data mining, Data Mining Techniques for Marketing, Sales and Customer Support, has introduced me to another character in decision support land, the "data miner."

I had always assumed that the explorer and the data miner were one and the same. But no. According to Michael, there are some very real and important differences between the miner and the explorer. Let's explore those differences and similarities in the context of the full community of DSS users.

There are four general classifications of users of DSSs: tourists, farmers, explorers and data miners.

Tourists are people who know how to cover a breadth of material quickly but who have little depth. Tourists know how to find things.

Farmers are those people who do the same activity repeatedly, except on different data. Farmers know what they want before they set out to execute a query. Farmers operate in a very predictable manner. Farmers execute the same query repeatedly, against very small amounts of data. Farmers expect good performance for their queries. Many knowledge workers are farmers, typically in finance and accounting.
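
The farmer's work pattern (the same query, run repeatedly against small amounts of summary data) maps naturally onto a prepackaged, parameterized query. The following is a minimal sketch in Python using the standard sqlite3 module. The table monthly_sales_summary, its columns and the sample figures are hypothetical and chosen only for illustration; they do not come from any particular warehouse.

```python
import sqlite3

# Hypothetical summary table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly_sales_summary (region TEXT, month TEXT, total_sales REAL)"
)
conn.executemany(
    "INSERT INTO monthly_sales_summary VALUES (?, ?, ?)",
    [
        ("East", "Jan", 120.0),
        ("East", "Feb", 135.5),
        ("West", "Jan", 98.0),
        ("West", "Feb", 101.25),
    ],
)

# The prepackaged query: its shape is fixed, only the parameters change per run.
FARMER_QUERY = (
    "SELECT total_sales FROM monthly_sales_summary "
    "WHERE region = ? AND month = ?"
)


def run_farmer_query(region, month):
    """Return the summarized sales figure for one region and month, or None."""
    row = conn.execute(FARMER_QUERY, (region, month)).fetchone()
    return row[0] if row else None


# Same activity, different data: the farmer reruns the query with new parameters.
january = run_farmer_query("East", "Jan")
february = run_farmer_query("East", "Feb")
print(january, february)
```

Because the query touches one summary row per execution and its requirements are fixed in advance, its performance and its outcome are both highly predictable, which is what the farmer expects.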

Explorers are people who do not know what they want. Explorers are people who do "out-of-the-box" thinking. Explorers operate on intuition. Explorers create huge queries, looking at much detail and history. Response time for explorers may range into multiple days. Explorers look at data one way and then another. Then they pass on to other data. Explorers often find nothing. But when an explorer finds something, the results can be spectacular. Many actuaries and process control engineers are explorers.

Data miners are people who methodically scan data (usually large amounts of data at a detailed level), looking for confirmation of a hypothesis or perhaps even for a new hypothesis. Data miners look for suspected patterns. Having found a pattern, the data miner tries to explain it, in both the technical sense and the business sense. Once the pattern is discovered and explained, it is then open to exploitation. The business potential is tremendous.

Figure 1: DSS User Similarities and Differences

| Parameter | Farmers | Explorers | Tourists | Miners |
|---|---|---|---|---|
| Transaction Type/Size | Few rows of data | Many, many rows | Perhaps no rows of data at all | Many, many rows |
| Probability of Success | Very high probability | Low probability | High probability | Moderate probability |
| Base of Data/Data Structure | Star join, where requirements shape the design | Highly normalized, where data structure can be changed rapidly | Index, catalog and meta data structures which point to actual data | Very highly denormalized data that often is preconditioned for analysis |
| Requirements Known | Requirements are known before execution | Requirements are unknown prior to execution | Requirements, in many cases, are not known prior to search | Requirements are known prior to execution |
| Predictability | Highly predictable | Highly unpredictable | Unpredictable, heuristic | Reasonably predictable |
| Detail | Operates on summary and aggregated data | Operates on detail almost exclusively | Operates on indexes, catalogs and meta data; detail level is not applicable | Thrives on detail |
| History | Uses limited amounts of history | Requires as much history as he/she can get | Not applicable | Thrives on history |
| Query Nature | Prepackaged is optimal | Totally heuristic | Very heuristic | Heuristic from query to query, but prepackaged within a single query |

There is an important connection between the explorer and the data miner. Even though they are different, they are nevertheless related. In a sense, the explorer precedes the data miner and sets the stage for data mining success. The explorer does the work necessary to the identification and creation of the hypothesis. The data miner then takes the hypothesis and confirms or denies it and determines the strength of the hypothesis. (Some hypotheses are true but weak, other hypotheses are true and strong, yet other hypotheses are partially true, and so forth.)
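
To make the idea of a hypothesis being "true but weak" or "true and strong" concrete, here is a minimal sketch in Python of how a data miner might score a hypothesis handed over by an explorer. It assumes the hypothesis has the form "customers who buy A also buy B" and uses support, confidence and lift, which are common association measures in data mining; the article itself does not say how strength should be measured, so this choice, the function name rule_strength and the sample transactions are all illustrative assumptions.

```python
def rule_strength(transactions, antecedent, consequent):
    """Score the hypothesis 'antecedent implies consequent' over detail rows.

    Returns (support, confidence, lift). A lift above 1 suggests the two items
    occur together more often than chance alone would predict.
    """
    n = len(transactions)
    with_a = sum(1 for t in transactions if antecedent in t)
    with_b = sum(1 for t in transactions if consequent in t)
    with_both = sum(
        1 for t in transactions if antecedent in t and consequent in t
    )
    # Guard against empty data or items that never occur.
    if n == 0 or with_a == 0 or with_b == 0:
        return 0.0, 0.0, 0.0
    support = with_both / n            # how often the pattern occurs at all
    confidence = with_both / with_a    # how often B follows, given A
    lift = confidence / (with_b / n)   # confidence relative to B's base rate
    return support, confidence, lift


# Hypothetical detail-level data: one set of purchased items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "bread"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
    {"eggs"},
]

support, confidence, lift = rule_strength(transactions, "bread", "butter")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data the hypothesis holds in half of all transactions, with a confidence of 0.80 and a lift of 1.60, so it would count as reasonably strong. A hypothesis with a lift close to 1 would be one that holds but carries little business value.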

There is some degree of similarity between the analytical job of the explorer and the data miner. But there are some significant differences as well. Figure 1 identifies some of these similarities and differences.

The parameters of measurement shown in Figure 1 deserve an explanation. Some of the parameters are self-explanatory.

Transaction type is a general description of what a query looks like.

Transaction size refers to how many rows of data a transaction looks at.

Probability of success refers to the probability of a given query finding what is being sought. Note that in some cases a query will not be successful. In this case, there will be no outcome as a result of the query.

Base of data/data structure refers to the optimal data structure that the user operates on.

Requirements known before execution refers to whether the end user knows what to expect before the query is submitted.

Predictability refers to the work pattern that the end user has: are queries submitted with a pattern of regularity or irregularity?

Detail refers to the level of detail that the analyst operates on.

History refers to the amount of historical detail that is commonly important to the end user. In some cases history is important; in other cases history is not important at all.

Query nature refers to whether the query is prepackaged or heuristic.

These parameters of the environment bring to light some of the important differences and similarities of the classifications of DSS end users.

In particular, the similarities and differences between the explorer and the data miner are of interest. The data miner and the explorer are very similar (if not identical) when it comes to transaction type, transaction size, probability of success for a given transaction, what is found upon successful completion of a transaction, detail and history. Because of these many similarities, it is easy to see how the explorer is occasionally confused with the data miner.

But there are important differences. The explorer operates on the same base of data, whereas the data miner is constantly changing the base of data on which he/she operates. The explorer operates on a base of normalized data while the miner operates on a base of very flat, highly denormalized data. The explorer has no idea what to expect prior to the execution of the query. The data miner does have a good idea of what to expect before the query is submitted.
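
The contrast between the explorer's normalized base of data and the miner's flat, denormalized base can be sketched as follows. This is a minimal illustration in Python using the standard sqlite3 module, not a prescribed design; the tables customer, purchase and mining_flat, their columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized base (explorer side): each fact is stored once, in its own table.
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL
    );
    INSERT INTO customer VALUES (1, 'Acme', 'retail'), (2, 'Birch', 'wholesale');
    INSERT INTO purchase VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 300.0);
""")

# Denormalized base (miner side): one wide row per purchase, with the customer
# attributes repeated on every row so the scan needs no joins.
conn.execute("""
    CREATE TABLE mining_flat AS
    SELECT p.purchase_id,
           c.name AS customer_name,
           c.segment,
           p.amount
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
""")

flat_rows = conn.execute(
    "SELECT purchase_id, customer_name, segment, amount "
    "FROM mining_flat ORDER BY purchase_id"
).fetchall()
for row in flat_rows:
    print(row)
```

The flat table trades redundancy for a form that can be scanned methodically, row by row. Rebuilding it for each new hypothesis is one way to read the statement that the miner is constantly changing the base of data on which he or she operates.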

Understanding the nature of the work being done and the mind-set of the audience doing the work provides a basis for understanding, in a larger sense, the needs and the requirements of the world of DSS.

This classification of DSS end users starts to provide a framework from which the DSS designer and architect can begin to make sense out of the world of DSS. The framework, in turn, provides a basis for sorting through the many apparent contradictions found in DSS processing and design. Without this road map to the users of the DSS environment, there can be no comprehensive framework for DSS processing.
