A large part of fully enabling a business intelligence (BI) program is bringing the program to the user community: reminding users what data is available, how that data is currently being used, where it comes from and what additions are planned. Often, the most valuable uses of BI are realized by the user community itself as it unearths new ways to leverage data already prepared for consumption in the BI environment. Having said that, what provisions are being made to increase knowledge-workers' chances of a valuable new discovery? More value exists than is probably being exploited today, and you must be proactive to capitalize on it.

Hopefully, you can justify your BI efforts with those users who can take the leap of faith and/or relate to your presentation of the vision. For many prospective BI users, however, talk of what the environment will "theoretically" look like post-production and what the user interface will really be like does not work. They need to see the real data through the real interface.

As you grow the data, as your public relations program cranks up the chalk talks and user groups, and as current users take the word out to their peers, it is important to understand this: if no mechanisms exist to enable new users to sift through large volumes of data to uncover anomalies and detect patterns, they may never surface these hidden uses for the warehouse.

Data mining has long been a means of attaining high business value from a warehouse. As the means of automating discovery to explore and identify new business insight, it stands alone as an access method. Interactive query or OLAP presents the measures of the business organized around their logical dimensions. Hierarchies in the dimensions allow for organized grouping and lead to drilling up and down in the data to find what you're looking for in a manual discovery process. For OLAP to be effective, you must know what you're looking for ahead of time. Mining, however, makes you aware of situations that may represent new market opportunities or business problems that have yet to surface through standard interaction methods. Said another way, mining can automate the discovery process and guide users to the proverbial nuggets.

However, much of data mining has been relegated to the domain of a special breed of expert, often holding a Ph.D. in statistics, mathematics or some scientific discipline. The mining process currently deployed in many organizations is not only time-consuming due to the challenge of the tools and the semantic gap between the front line and the statisticians, it is also noniterative in nature. Discovered nuggets flow from the miners to the front line and are only selectively interesting and actionable. The feedback loop is missing. It's similar to owning a luxury car but keeping it parked in the garage at all times.

What if the process is inverted and the front-line business personnel find the nuggets while the experts quantify them? Productivity would increase – something most businesses can benefit from. Mining tools that are interactive, visual, understandable, well-performing and able to work directly on the organization's data warehouse/mart could be used by front-line workers for immediate and lasting business benefit.

The advanced techniques deployed in a lot of mining programs are generally well beyond the understanding of the average business analyst or knowledge-worker. This is because the mining tools were generally designed for expert statisticians involved in the detailed science of predictive modeling. If this advanced level of analysis is reserved for the few, instead of the masses, the full enablement of the warehouse in the organization cannot be realized. If those whose analytical interests stay well within the complexity of computing sales commissions are shut out of mining, mining is not nearly as effective as it could be.

The dilemma appears to be how to get more business analysts involved in the discovery process when the discovery tools are too difficult for the average analyst to master. The answer is to separate the automated discovery elements of data mining from the necessarily complex process of model creation and validation. This functional separation permits the business analysts to be far more effective in finding and polishing nuggets, thus improving the quality of their initial hypotheses. Once they are satisfied that they have found something of business value, they can pass these prequalified ideas to the experts for their validation. The result is a more effective and efficient analytical process.

There are, however, several accessible mining techniques that prove more effective than most, simply because so many people within an organization will actually use them. These techniques can draw your attention to significant anomalies that deserve further investigation through interactive query. Through this leverage, companies will be able to refocus their limited statistical experts on the more challenging analytical problems.

These techniques include:

  • Deviation – Finding the dimensional value(s) that contribute an unexpectedly high or low amount to a business measure. The "expected" contribution comes from algorithms that examine the contributions of all dimension values to the measure and "guess" what a given value would be if it were unknown.
  • Summary – Finding the values within each dimension level of each dimension that contribute to the highest value of the business measure. For example, what store-product-customer combination has yielded the highest profit in the last year? Summary can also be thought of as a "multidimensional top n" report.
  • Market Basket – Finding which events occur simultaneously. For example, which products are purchased frequently in a transaction?
  • Association – Discovering rules, usually between dimensions, that have a degree of confidence (that the generalization would actually be correct if you didn't know for sure) and a degree of support (the amount of data that backs the assumption). For example, discovering you can be 75-percent sure that Northeast customers will purchase with a check. This is supported by 45 percent of the data that was made available to the rule discovery.
  • Probability Rules – These rules will have a general format of: If X and Y then Z. For example, if customer age = 25 and customer gender = male, then movie preferences = action. The results include a probability value that is similar to confidence and a support rating as well.
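To make the association idea concrete, here is a minimal sketch of one-to-one rule discovery over dimension values, computing support and confidence as defined above. The dimension names, sample records and thresholds are hypothetical, invented for illustration; a production tool would run the same counting directly against the warehouse tables.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer records: dimension -> value (invented sample data).
records = [
    {"region": "Northeast", "payment": "check"},
    {"region": "Northeast", "payment": "check"},
    {"region": "Northeast", "payment": "check"},
    {"region": "Northeast", "payment": "credit"},
    {"region": "South", "payment": "cash"},
]

def association_rules(records, min_support=0.2, min_confidence=0.5):
    """Discover rules (dim=val) -> (dim=val) with support and confidence.

    support    = fraction of all records containing both sides of the rule
    confidence = fraction of records holding the antecedent that also
                 hold the consequent
    """
    n = len(records)
    item_count, pair_count = Counter(), Counter()
    for rec in records:
        items = sorted(rec.items())          # (dimension, value) pairs
        item_count.update(items)
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1
    rules = []
    for (a, b), together in pair_count.items():
        for ante, cons in ((a, b), (b, a)):  # evaluate the rule both ways
            support = together / n
            confidence = together / item_count[ante]
            if support >= min_support and confidence >= min_confidence:
                rules.append((ante, cons, support, confidence))
    return rules

for ante, cons, sup, conf in association_rules(records):
    print(f"if {ante[0]}={ante[1]} then {cons[0]}={cons[1]} "
          f"(confidence {conf:.0%}, support {sup:.0%})")
```

On this sample data, the sketch discovers, among other rules, that Northeast customers pay by check with 75-percent confidence – the same style of rule described in the Association bullet above, with the support figure reflecting whatever data was made available to the discovery.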

In the next column, we'll step through some examples of how accessible data mining can be used successfully by the front line.
