Data Mining Q&A with Dr. Kudyba
Information Management Magazine, May 2004
What exactly is data mining?
Kudyba: That's a great question to begin with, simply because there continues to be a great deal of uncertainty as to what data mining is. Some factions include techniques such as OLAP (online analytical processing) and simple data query and navigation as components of data mining; however, this is not accurate. What data mining really entails is the use of mathematical and statistical applications that process and analyze data. Mathematics refers to equations or algorithms that process data to discover patterns and relationships among variables. Statistics generally sheds light on the robustness and validity of the relationships that exist in the data mining model. Leading methods of data mining include regression (logit regression), segmentation classification, neural networks, clustering and affinity analysis. All of the major data mining software vendors incorporate these into their platforms. There are some other quantitative-based techniques that could be included in the data mining family, but the ones I just mentioned are the real core components.
How does data mining differ from other business intelligence (BI) components, such as OLAP?
Kudyba: This question helps extend the answer to the previous question. Other components of BI generally involve data management, reporting and analytic activities. Data management and reporting can involve general query techniques to access data variables, ETL (extract, transform and load) tools that also enhance data access and storage, the creation of reports or fine-tuned dashboards containing customized performance indicators and, of course, the generation of OLAP cubes. Visualization is a component that enhances these various data and information reporting mechanisms. All of these technologies provide an essential platform to answer more retrospective questions about businesses, where retrospective refers to what happened to the various business processes of the organization. Data mining, on the other hand, also involves data management activities including variable and format selection and transformation and normalization of variables, which must be addressed before processing data with mining methodologies.
The real differentiator, however, is that data mining enables users and analysts to better answer why things may be happening in an organization and facilitates a prospective decision-making process. Prospective refers to the idea of acquiring a better understanding of what to expect in the future if a company takes certain strategic actions today - basically "what-if" and forecasting capabilities. The real power of data mining comes into play when there is a need for a multivariate analysis, that is, an analysis that considers how a number of driver variables affect a performance measure or target variable. The power of mathematical and statistical methodologies enables the user to build models that consider all the variables in an integrated fashion, not just one variable against another.
One point I'd like to make is that despite the more limited nature of other BI components, the entire BI spectrum provides powerful decision support capabilities. Many times, you don't need the capabilities of data mining to answer questions because cubes, reports and dashboards do a great job.
That said, for what type of business applications is data mining best suited?
Kudyba: Data mining should be used when you have a question in your company about why an event is occurring or how some strategic activities impact a performance measure. Both of these questions generally involve the incorporation of a number of explanatory variables that influence or explain the movement in a target variable. Following are just some prominent business applications where data mining can provide a significant value-add.
Data mining can play a key role in determining likelihoods of:
- Default on a loan
- Purchase of a product or response to a promotion
- Response to a cross- or up-sell
- Cancellation of a policy
- Perpetration of fraud
It can also play a key role in performance-related activities such as:
- Determining the effectiveness of advertising, marketing and promotional campaigns (both e-commerce and brick-and-mortar)
- Pricing
- Determining call center effectiveness
- Evaluating employee performance
- Evaluating vendor/supplier reliability and performance
- Assessing manufacturing, production or operational efficiencies
As is evident, there are a number of business analyses that benefit from data mining methods. Data mining decreases the uncertainty as to what drives a particular business process by identifying patterns and relationships that exist among key variables. This information ultimately increases a decision-maker's understanding as to why events occur and what to expect in the future.
However, to put things in perspective once again, data mining is not appropriate for all business analytical needs. Very often, decision-makers merely need the power of OLAP cubes or well-designed reports to gain answers to business questions.
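As a rough illustration of the likelihood-type applications listed above (for example, default on a loan), the following sketch fits a small logit regression in plain Python. It is not taken from the interview: the applicant records, the two driver variables (debt-to-income ratio and number of late payments) and the function names `fit_logit` and `predict` are all hypothetical, chosen only to show several explanatory variables being considered together against one target variable.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(rows, labels, lr=0.1, epochs=2000):
    """Fit logit regression weights with batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this record under the current model.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # Step against the average gradient of the log-loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Estimated likelihood of the target event (here: default)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical applicants: (debt-to-income ratio, late payments last year).
applicants = [
    (0.10, 0), (0.15, 0), (0.20, 1), (0.25, 0), (0.30, 1),
    (0.35, 2), (0.45, 1), (0.50, 3), (0.55, 2), (0.60, 4),
]
# Hypothetical outcomes: 1 = defaulted, 0 = repaid.
defaulted = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

weights, bias = fit_logit(applicants, defaulted)

low_risk = predict(weights, bias, (0.10, 0))
high_risk = predict(weights, bias, (0.60, 4))
print(f"Estimated default likelihood, low-risk profile:  {low_risk:.2f}")
print(f"Estimated default likelihood, high-risk profile: {high_risk:.2f}")
```

In practice an analyst would use one of the vendor platforms or a statistics library rather than hand-written gradient descent, and would validate the model on data held out from fitting; the point here is only that both driver variables enter the model at once.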
Can you measure the impact of incorporating data mining on bottom-line company profitability? In other words, can you measure the ROI in data mining?
Kudyba: Yes, within reason. Fortunately, it is not as difficult to measure data mining ROI as it is to measure the ROI of some other information technologies. In fact, depending on the business application, data mining can sometimes achieve returns in the quadruple digits (i.e., more than 1,000 percent). You may think that I'm spinning data mining here, but I'm not. An organization can often achieve significant cost savings or revenue increases by implementing data mining models, even with just one business application.
How do you measure the return on data mining?
Kudyba: You can do so simply by comparing the costs and returns of a business process before you used mining with what you achieved for the same process after implementing a mining analysis. One of the best examples is using data mining to optimize marketing campaigns. Mining often enables users to improve campaigns by more accurately identifying target markets (e.g., consumers more likely to respond). As a hypothetical example, a company might still achieve 15,000 responses by mailing to 700,000 people instead of 1,000,000 people (calculate the decrease in expense if each piece of mail costs your company $1.00). With the ultra-competitive state of our economy demanding productivity at every corner, it will almost be a requirement for companies to do this in the near future.
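The arithmetic behind the mailing example can be sketched in a few lines of Python. The mailing volumes, the 15,000 responses and the $1.00 cost per piece come from the hypothetical above; `mining_project_cost` is an invented figure (the interview gives no cost for the mining work) included only to show how a return on investment could be computed from the savings.

```python
def mailing_cost(pieces, cost_per_piece):
    """Total cost of a mailing."""
    return pieces * cost_per_piece

# Figures from the hypothetical campaign in the interview.
baseline_pieces = 1_000_000   # mailing without mining
targeted_pieces = 700_000     # mailing after mining narrows the target market
responses = 15_000            # assumed unchanged between the two mailings
cost_per_piece = 1.00         # dollars

baseline_cost = mailing_cost(baseline_pieces, cost_per_piece)
targeted_cost = mailing_cost(targeted_pieces, cost_per_piece)
savings = baseline_cost - targeted_cost

# The same number of responses from fewer pieces means a higher response
# rate and a lower cost per response.
baseline_rate = responses / baseline_pieces
targeted_rate = responses / targeted_pieces
baseline_cost_per_response = baseline_cost / responses
targeted_cost_per_response = targeted_cost / responses

# ASSUMPTION: a made-up cost for the mining analysis, not from the interview.
mining_project_cost = 25_000.00
roi = (savings - mining_project_cost) / mining_project_cost

print(f"Mailing savings: ${savings:,.0f}")   # Mailing savings: $300,000
print(f"Response rate: {baseline_rate:.2%} before, {targeted_rate:.2%} after")
print(f"Cost per response: ${baseline_cost_per_response:,.2f} before, "
      f"${targeted_cost_per_response:,.2f} after")
print(f"ROI on the assumed mining cost: {roi:.0%}")   # 1100%
```

Under that assumed project cost, the single campaign lands in the "more than 1,000 percent" range mentioned earlier; a different cost figure would of course change the result.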