Data mining has long been a way to attain high business value from corporate data. As the means of automating discovery to explore and identify new business insight, it stands alone as an access method. Interactive query or OLAP presents the measures of the business organized around its logical dimensions. Hierarchies in the dimensions allow for organized grouping and lead to drilling up and down in the data to find what you're looking for in a manual discovery process. Data mining goes further.

Much of data mining has been relegated to the domain of a special breed of experts, often holding Ph.D.s in statistics, mathematics or some scientific discipline. The mining process currently deployed in many organizations is not only time-consuming, due to the challenge of the tools and the semantic gap between the business user and the statisticians, but also noniterative in nature. Discovered nuggets are only selectively interesting and actionable. Mining tools that are interactive, visual, understandable and well-performing, and that work directly on the organization's data warehouse or mart, could be used by front-line workers for immediate and lasting business benefit.

The techniques deployed in market-leading mining tools are generally well beyond the understanding of the average business analyst or knowledge worker. This is because the tools were designed for expert statisticians involved in the detailed science of predictive modeling. If this advanced level of analysis is reserved for the few instead of the masses, the full value of data mining in the organization cannot be realized. In many cases, data mining tool complexity actually prohibits overall organizational value from mining.

There are, however, numerous accessible mining techniques that are more effective than most simply because they are used by so many within an organization. For example, with little investment, some algorithms deployed in the latest Microsoft SQL Server can draw attention to significant anomalies that deserve further investigation.

If a retailer knew, based on mining, that within a cluster of high-spending block groups one was severely underperforming on a per-capita basis, that retailer might market more heavily to that group to test its potential. The retailer might look for statistically deviant characteristics of that group, or it might give up on marketing to it.
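A rough sketch of that first check follows. The block group names, figures and the "half the cluster median" cutoff are invented for illustration; a real mining tool would apply its own, more robust anomaly test.

```python
from statistics import median

# Hypothetical block groups in one high-spending cluster:
# (name, annual sales in dollars, population)
block_groups = [
    ("BG-101", 1_200_000, 2_000),
    ("BG-102", 1_500_000, 2_400),
    ("BG-103", 300_000, 2_500),
    ("BG-104", 1_100_000, 1_900),
    ("BG-105", 1_350_000, 2_100),
]

def underperformers(groups, threshold=0.5):
    """Return (name, per-capita sales) for groups below threshold * cluster median.

    The threshold is an assumed cutoff, not a standard value.
    """
    per_capita = {name: sales / pop for name, sales, pop in groups}
    cutoff = threshold * median(per_capita.values())
    return [(name, round(value, 2))
            for name, value in per_capita.items()
            if value < cutoff]

flagged = underperformers(block_groups)
print(flagged)
```

Comparing against the cluster median rather than the mean keeps one weak block group from dragging down the benchmark it is measured against.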

If a retailer knew, based on mining, which combinations of sales dimensions (at any level of any dimensional hierarchy, such as geography, store, date and promotion) yield the five largest and most profitable values, it would want to delve in and understand why, then try to repeat that result for other product categories in other stores in other months. For example, it might learn that vegetables at Store 123 in February 2006 was the top combination of product category, store and month in the past three years.
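As an illustration only, the ranking step can be sketched as a group-and-sort over fact rows. The rows and profit figures below are made up, and the sketch examines a single level (category, store, month) rather than every level of every hierarchy, which is what a mining tool would automate.

```python
from collections import defaultdict
import heapq

# Hypothetical fact rows: (product category, store, month, profit)
sales = [
    ("vegetables", "Store 123", "2006-02", 9000.0),
    ("vegetables", "Store 123", "2006-02", 4000.0),
    ("dairy", "Store 123", "2006-02", 7000.0),
    ("vegetables", "Store 456", "2006-03", 6500.0),
    ("bakery", "Store 456", "2006-02", 2000.0),
    ("dairy", "Store 456", "2006-03", 5000.0),
    ("bakery", "Store 123", "2006-03", 1500.0),
]

def top_combinations(rows, n=5):
    """Sum profit per (category, store, month) and return the n largest."""
    totals = defaultdict(float)
    for category, store, month, profit in rows:
        totals[(category, store, month)] += profit
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])

top = top_combinations(sales, n=3)
for combination, profit in top:
    print(combination, profit)
```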

A manufacturing company could ascertain patterns in product improvement opportunities and focus its development on systemic fixes to problem areas. The company could mine a large swath of data such as customer satisfaction, market research, call center, manufacturing and warranty repair information. This solution could enable the company to cut reaction time to field issues, increase service event satisfaction and help eliminate service and warranty expenses.

A store could use mining results to place products with purpose. Products with a strong affinity can be placed physically apart, so that the walking route between them passes products with a weaker affinity and creates the temptation to buy those as well. Products with a weaker affinity can be placed physically nearer because it is unlikely a shopper would cross the store for the second item.
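One common way to quantify affinity between two products is lift: how much more often they appear in the same basket than chance alone would predict. The sketch below assumes lift as the measure and uses invented baskets; the article does not say which measure a given tool uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of products per transaction
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"milk", "cereal"},
    {"milk", "cereal", "bread"},
    {"jam", "milk"},
]

def pair_lift(baskets):
    """Return {(item_a, item_b): lift} for every pair seen together.

    lift = P(a and b) / (P(a) * P(b)); values above 1 suggest affinity.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    return {
        (a, b): count * n / (item_counts[a] * item_counts[b])
        for (a, b), count in pair_counts.items()
    }

lifts = pair_lift(baskets)
for pair, lift in sorted(lifts.items(), key=lambda item: -item[1]):
    print(pair, round(lift, 2))
```

Pairs with lift well above 1 are candidates for placing apart. Pairs near or below 1 are the ones a shopper is unlikely to cross the store for.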

Mining could discover that if a customer's yearly income is $10,000 to $30,000, then that customer's education is probably "partial high school." This rule could have significance to marketing if it were determined that the "partial high school" customer profile had significant profit potential. Without demographic information on education level, but with demographics related to income, marketing can use the knowledge of the affinity (predictability) between income and education to indirectly target "partial high school" customers on the basis of income rather than education level.

Likewise, assume an automotive retailer is missing automobile preference for a set of newly acquired third-party prospect data, but it has age and gender. By mining data in which the preference is known, such as its existing customer records, it could come up with a high-probability rule that if a customer is age 35 and female, then the automobile preference is SUV. The retailer can then impute that preference onto those prospects when preparing promotions for the SUV automobile preference profile.
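A minimal sketch of this rule-then-impute idea follows. It assumes the rule is learned from records where the preference is already known, and it uses an invented confidence threshold of 0.7; the customer records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical existing customers: (age, gender, automobile preference)
known = [
    (35, "F", "SUV"), (35, "F", "SUV"), (35, "F", "SUV"), (35, "F", "sedan"),
    (35, "M", "truck"), (35, "M", "sedan"),
    (52, "F", "sedan"), (52, "F", "sedan"),
]

def mine_rules(records, min_confidence=0.7):
    """Return {(age, gender): (preference, confidence)} for confident rules."""
    by_key = defaultdict(Counter)
    for age, gender, preference in records:
        by_key[(age, gender)][preference] += 1
    rules = {}
    for key, counts in by_key.items():
        preference, hits = counts.most_common(1)[0]
        confidence = hits / sum(counts.values())
        if confidence >= min_confidence:
            rules[key] = (preference, confidence)
    return rules

def impute(prospects, rules):
    """Attach a preference to each (age, gender) prospect, or None if no rule."""
    return [
        (age, gender, rules[(age, gender)][0] if (age, gender) in rules else None)
        for age, gender in prospects
    ]

rules = mine_rules(known)
prospects = [(35, "F"), (35, "M"), (52, "F")]
imputed = impute(prospects, rules)
print(imputed)
```

Prospects that match no sufficiently confident rule are left without a preference rather than guessed at, which keeps weak rules out of the promotion list.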

As data exploitation matures in an organization, usage grows from simple reporting to basic scorecarding and interactive OLAP environments, where users interact with the access tool by drilling from summary to detail and vice versa. The presentation of the data also becomes highly personalized through slice-and-dice and other techniques as the environment matures. However, it is feasible today for data mining to be a near-term access method rather than one reserved for the long-term, mature use of company information.
