Data Mining Q&A with Dr. Kudyba
What exactly is data mining?
Kudyba: That's a great question to begin, simply because there continues to exist a great deal of uncertainty as to what data mining is. Some factions include techniques such as OLAP (online analytical processing) and simple data query and navigation as components of data mining; however, this is not accurate. What data mining really entails is the utilization of mathematical and statistical applications that process and analyze data. Mathematics refers to equations or algorithms that process data to discover patterns and relationships among variables. Statistics generally shed light on the robustness and validity of the relationships that exist in the data mining model. Leading methods of data mining include regression (logit regression), segmentation classification, neural networks, clustering and affinity analysis. All of the major data mining software vendors incorporate these into their platforms. There are some other quantitative-based techniques that could be included in the data mining family, but the ones I just mentioned are the real core components.
How does data mining differ from other business intelligence (BI) components, such as OLAP?
Kudyba: This question helps extend the answer to the previous question. Other components of BI generally involve data management, reporting and analytic activities. Data management and reporting can involve general query techniques to access data variables, ETL (extract, transform and load) tools that also enhance data access and storage, the creation of reports or fine-tuned dashboards containing customized performance indicators and, of course, the generation of OLAP cubes. Visualization is a component that enhances these various data and information reporting mechanisms. All of these technologies provide an essential platform to answer more retrospective questions about businesses, where retrospective refers to what happened to the various business processes of the organization. Data mining, on the other hand, also involves data management activities including variable and format selection and transformation and normalization of variables, which must be addressed before processing data with mining methodologies.
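As a rough illustration of the variable normalization mentioned above, the following Python sketch rescales two variables so that they are on comparable scales before modeling. The variable names and values are hypothetical, and z-scores and min-max scaling are only two common choices; the interview does not say which transformations were meant.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to mean 0 and standard deviation 1.

    Assumes the values are not all identical (otherwise the
    standard deviation is 0 and the division fails).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly onto the range [0, 1].

    Assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical customer variables measured on very different scales.
incomes = [30_000, 45_000, 60_000, 90_000]
ages = [25, 35, 45, 65]

incomes_z = z_score(incomes)
ages_scaled = min_max(ages)

print(incomes_z)
print(ages_scaled)
```

After rescaling, a variable measured in tens of thousands no longer dominates one measured in tens simply because of its units.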
The real differentiator, however, is that data mining enables users/analysts to better answer why things may be happening in an organization and facilitates a prospective decision-making process. Prospective refers to the idea of acquiring a better understanding of what to expect in the future if a company takes certain strategic actions today - basically "what-if" and forecasting capabilities. The real power of data mining comes into play when there exists a need for a multivariate analysis or an analysis that involves the consideration of how a number of driver variables affect a performance measure or target variable. The power of mathematical and statistical methodologies enables the user to build models that consider all the variables in an integrated fashion, not just one variable against another.
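To make the multivariate point concrete, here is a minimal Python sketch under invented assumptions. It fits an ordinary least squares model to made-up data in which sales depend on both advertising spend and a promotion flag, then compares the joint fit with a fit on advertising alone. The data, the variable names and the choice of plain linear regression are illustrative only and do not come from the interview.

```python
def fit_linear(rows, target):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, target))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data, constructed so that sales = 10 + 2*ad_spend + 3*promo.
ad_spend = [1, 2, 3, 4, 5, 6]
promo = [0, 1, 0, 1, 1, 1]
sales = [10 + 2 * a + 3 * p for a, p in zip(ad_spend, promo)]

# Integrated model: both driver variables considered together.
joint = fit_linear(list(zip(ad_spend, promo)), sales)

# One variable against another: ad spend only.
alone = fit_linear([(a,) for a in ad_spend], sales)

print("joint coefficients:", joint)
print("ad-spend-only coefficients:", alone)
```

Because promotions happen to coincide with higher ad spend in this toy data, the single-variable fit credits advertising with part of the promotion effect, while the joint fit recovers the separate contribution of each driver.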
One point I'd like to make is that despite the more limited nature of other BI components, the entire BI spectrum provides powerful decision support capabilities. Many times, you don't need the capabilities of data mining to answer questions because cubes, reports and dashboards do a great job.
That said, for what type of business applications is data mining best suited?
Kudyba: Data mining should be used when you have a question in your company about why an event is occurring or how some strategic activities impact a performance measure. Both of these questions generally involve the incorporation of a number of explanatory variables that influence or explain the movement in a target variable. Following are just some prominent business applications where data mining can provide a significant value-add.
Data mining can play a key role in determining likelihoods of:
- Default on a loan
- Purchase of a product or response to a promotion
- Response to a cross- or up-sell
- Cancellation of a policy
- Perpetration of fraud
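As a sketch of how such a likelihood might be estimated, the Python example below fits a small logit (logistic) regression by gradient descent and scores two applicants for default risk. The applicant data, the two predictor variables, the learning rate and the number of iterations are all hypothetical; a real project would use far more data and a tested statistical package.

```python
import math


def sigmoid(z):
    """Map a linear score onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent.

    Returns [intercept, weight_1, weight_2, ...].
    """
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            xi = [1.0] + list(x)
            err = sigmoid(sum(wi * v for wi, v in zip(w, xi))) - y
            for j, v in enumerate(xi):
                grad[j] += err * v
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w, x):
    """Estimated probability of the event (here, default) for one case."""
    return sigmoid(sum(wi * v for wi, v in zip(w, [1.0] + list(x))))


# Hypothetical applicants: (debt-to-income ratio, number of late payments).
applicants = [
    (0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0),
    (0.5, 2), (0.6, 1), (0.7, 3), (0.8, 2),
]
defaulted = [0, 0, 0, 1, 1, 0, 1, 1]  # 1 = defaulted on the loan

weights = fit_logit(applicants, defaulted)
p_low = predict(weights, (0.1, 0))   # low debt, no late payments
p_high = predict(weights, (0.8, 3))  # high debt, several late payments

print("estimated default likelihood, low-risk profile:", round(p_low, 3))
print("estimated default likelihood, high-risk profile:", round(p_high, 3))
```

The same pattern applies to the other items in the list: replace the target with response, cancellation or fraud, and the predictors with whatever variables business experts consider relevant.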
It can also play a key role in performance-related activities such as:
- Determining the effectiveness of advertising, marketing and promotional campaigns (both e-commerce and brick-and-mortar)
- Determining call center effectiveness
- Evaluating employee performance
- Evaluating vendor/supplier reliability and performance
- Assessing manufacturing, production or operational efficiencies
As is evident, there are a number of business analyses that benefit from data mining methods. Data mining decreases the uncertainty as to what drives a particular business process by identifying patterns and relationships that exist among key variables. This information ultimately increases a decision-maker's understanding as to why events occur and what to expect in the future.
However, to put things in perspective once again, data mining is not appropriate for all business analytical needs. Very often, decision-makers merely need the power of OLAP cubes or well-designed reports to gain answers to business questions.
Can you measure the impact of incorporating data mining on bottom-line company profitability? In other words, can you measure the ROI in data mining?
Kudyba: Yes, within reason. Fortunately, it is not as difficult to measure data mining ROI as it is to measure the ROI of some other information technologies. In fact, depending on the business application, data mining can sometimes achieve returns in the quadruple digits (i.e., more than 1,000 percent). You may think that I'm spinning data mining here, but I'm not. An organization can often achieve significant cost savings or revenue increases by implementing data mining models, just with one business application.
How do you measure the return on data mining?
Kudyba: You can do so simply by comparing the cost and returns of a business process before you utilized mining with what you achieved for the same process after implementing a mining analysis. One of the best examples is to implement data mining to optimize marketing campaigns. Mining often enables users to augment campaigns by more accurately identifying target markets (e.g., consumers more likely to respond). Offering a hypothetical example, companies can still possibly achieve 15,000 responses by mailing to 700,000 people instead of 1,000,000 people (calculate the decrease in expense if each piece of mail costs your company $1.00). With the ultra-competitive state of our economy demanding productivity at every corner, it will almost be a requirement for companies to do this in the near future.
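The arithmetic in the mailing example above can be worked through in a few lines of Python. The mailing volumes, the 15,000 responses and the $1.00 cost per piece come from the example. The cost of the mining project is not given in the interview, so the $25,000 figure below is purely an assumption used to show how an ROI percentage would be computed.

```python
def campaign_cost(pieces_mailed, cost_per_piece):
    """Total mailing expense for a campaign."""
    return pieces_mailed * cost_per_piece


def response_rate(responses, pieces_mailed):
    """Fraction of recipients who responded."""
    return responses / pieces_mailed


def roi_percent(gain, investment):
    """Return on investment: net gain as a percentage of the investment."""
    return 100.0 * (gain - investment) / investment


COST_PER_PIECE = 1.00   # from the example
RESPONSES = 15_000      # from the example

cost_before = campaign_cost(1_000_000, COST_PER_PIECE)
cost_after = campaign_cost(700_000, COST_PER_PIECE)
savings = cost_before - cost_after  # 300,000.00 saved on one campaign

rate_before = response_rate(RESPONSES, 1_000_000)  # 1.5 percent
rate_after = response_rate(RESPONSES, 700_000)     # roughly 2.1 percent

# ASSUMPTION: hypothetical cost of the mining project, not from the interview.
MINING_PROJECT_COST = 25_000.00
roi = roi_percent(savings, MINING_PROJECT_COST)

print(f"Mailing cost saved: ${savings:,.2f}")
print(f"Response rate before: {rate_before:.2%}, after: {rate_after:.2%}")
print(f"ROI on the mining project (assumed cost): {roi:,.0f}%")
```

With these inputs the campaign saves $300,000 for the same number of responses. Under the assumed project cost, that works out to a return above 1,000 percent; a different project cost would change the percentage but not the method.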
Data mining has grown in popularity due to the increased focus on customer relationship management. The core to this strategic initiative is to better understand consumer behavior, which is incredibly complex. One of the best technologies to use to achieve this understanding of consumers is the multivariate-grounded methodology embedded in data mining.
What types of skills are required to conduct a data mining analysis?
Kudyba: Generally, there are two groups of individuals that implement data mining into their strategic initiatives. The first group consists of those grounded in quantitative and statistical techniques, including individuals with degrees in statistics, economics or mathematics who have done a good deal of quantitative modeling in the past. These are the best types of individuals to analyze data using the various data mining methodologies. Generally, the higher the academic degree the better, and a Ph.D. really does provide a value-add. The other group consists of those with strategic business skills who also understand what data mining generally can do to enhance a particular business process. This group is extremely powerful because in order to generate truly value-added mining models, there must be input by business experts who provide key insights as to the data that is essential to describing business processes. They can also provide interpretive viewpoints on model results.
The most powerful skill set, I believe, involves business experience and an advanced degree in economics. Advanced economics includes the application of sophisticated quantitative modeling techniques and also provides a sound background of business theory (e.g., microeconomic theory). Regardless of all these skills, there remains one issue that is crucial to conducting accurate and effective data mining - that's the constant input into the process by business experts. Business experts have a true pulse of the variables that make particular business processes tick; they just don't know the quantitative relationships that may exist between those variables. That's where mining comes in.
How can individuals implement data mining within their companies?
Kudyba: The two routes to data mining implementation are outsourcing to consultants or investing in the technology and people to implement data mining in house. Depending on the size of your organization, you may want to first try consultants to see if you're getting the bang for your buck from some selected mining projects that address an important need. Then you can decide whether you want to maintain an outsourcing relationship or develop in-house capabilities. The key for the small to mid-sized companies is to identify a business process that needs to be enhanced, and then contact a data mining consultant for the project. (I classify small to mid-sized companies as companies with fewer than 6,000 employees.)
Larger firms (e.g., Fortune 500-sized), on the other hand, could take either route. The key for developing internal mining capabilities is high-level quantitative personnel (Ph.D.-level) who can access data and mine it, along with at least one business strategist to drive projects. The business strategist in this case is responsible for connecting business experts throughout the organization with mining activities and also packaging mining results into cohesive business plans, which are rolled out to corresponding cross-departmental management. The result should be a continuous increase in business process efficiency.
For additional data mining advice, check out Kudyba's book, Managing Data Mining: Advice from Experts, or visit the Null Sigma Web site at www.nullsigma.com.