Why You Should Care About Data Mining
Organizations already have mountains of information, and more arrives every day from new data sources, from more rapid updates and from higher volumes in traditional sources.
As the variety, velocity and volume of data increase, it becomes harder to draw conclusions simply by reporting on the data or visualizing it for people. Yet we must draw conclusions from it; we must use it to drive more analytic decisions. To do so, this data must be “mined” systematically for insight.
Data mining is an established approach with a long history in many industries. It is an increasingly essential tool for information management professionals because it allows you to extract more meaning from your data. Because data mining works with large volumes of data, it can improve the quality of the decisions your organization makes even as data volumes continue to grow.
Data mining uses mathematical techniques to extract meaning from data. It can be used to:
- Find rules or approaches that worked well in the past.
- Identify dependencies or relationships between things.
- Segment or classify customers based on how well they match something you care about.
- Group and cluster things that are similar to each other.
- Spot and identify anomalies buried in the data.
Data mining is also the basis for so-called predictive analytics. Predictive analytics involves applying mathematical techniques to historical data to build a predictive analytic model. Such a model predicts how likely something is to be true, or the likely value or order of something. A predictive analytic model is created using many of the same techniques found in data mining but focuses those techniques on making predictions about what is likely to be true in the future. Instead of finding, for instance, dependencies that were true in historical data, it aims to find dependencies that are likely to be true in the future. Instead of grouping customers based on historical similarities, it groups them based on the likelihood that they will behave similarly in the future. Some regard data mining as the first step in predictive analytics; others use the terms data mining and predictive analytics as though they were synonymous. There is certainly a strong association between them.
Data mining techniques can be either directed or undirected. Directed techniques require a target (a mission) and analyze the data in that context, for example by identifying prospects that look like currently profitable customers. Undirected techniques seek hidden or unknown patterns in the data. An example is finding groups of customers that are alike without any pre-defined sense of what “alike” means.
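The difference between directed and undirected techniques can be sketched in a few lines of Python. This is only an illustration, not a recommended method: the function names (`rank_prospects`, `k_means`), the two-number customer profiles and the naive choice of starting centers are all assumptions made for the example. The directed function is given a target (known profitable customers) and ranks prospects against it; the undirected function is given no target and simply looks for groups.

```python
import math

def centroid(points):
    """Mean point of a list of equal-length numeric tuples."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))

def rank_prospects(profitable, prospects):
    """Directed: a target profile is known, so prospects are ranked
    by how close they are to the average profitable customer."""
    target = centroid(profitable)
    return sorted(prospects, key=lambda p: math.dist(p, target))

def k_means(points, k, iterations=10):
    """Undirected: no target is supplied; basic k-means looks for k groups.
    Starting centers are simply the first k points (an assumption made
    for brevity; real tools choose them more carefully)."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Hypothetical profiles: (annual spend, visits per month).
# Fields are used unscaled here; in practice they would usually be normalized.
profitable = [(900, 8), (1100, 10)]
prospects = [(200, 1), (1000, 9), (600, 5)]
print(rank_prospects(profitable, prospects))

points = [(1, 1), (1.5, 2), (10, 10), (11, 9.5)]
print(k_means(points, k=2))
```

Note that the directed call needs the `profitable` list to exist before it can do anything, while the undirected call needs only the data and a guess at how many groups to look for.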
Applying data mining typically involves a three-step process:
- Integrating, cleaning and enhancing the available data so that it is ready for analysis. Linking customers to their transactions, categorizing fields as numbers or selections from a list and creating summary variables are common actions in this step.
- Applying algorithms to the data – either directed to a particular outcome or undirected to see what patterns exist.
- Validating results using data kept aside for that purpose and ensuring that the results are not “overfitted” to the data used in the analysis, that is, tuned so closely to that data that they fail on anything else. This allows the results to be applied to new data effectively.
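The three steps above can be sketched as a small Python example. It is a minimal illustration under stated assumptions rather than a real workbench: the function names, the `(customer_id, amount)` transaction layout and the single-threshold "algorithm" are all hypothetical, chosen only to keep each step visible.

```python
import random
from collections import defaultdict

def summarize(transactions):
    """Step 1: link transactions to customers and build summary variables.
    transactions: iterable of (customer_id, amount).
    Returns {customer_id: (transaction_count, total_spend)}."""
    summary = defaultdict(lambda: [0, 0.0])
    for customer_id, amount in transactions:
        summary[customer_id][0] += 1
        summary[customer_id][1] += amount
    return {cid: tuple(values) for cid, values in summary.items()}

def accuracy(rows, threshold):
    """Share of rows where 'spend >= threshold' matches the known label."""
    correct = sum((spend >= threshold) == label for spend, label in rows)
    return correct / len(rows)

def fit_threshold(rows):
    """Step 2: a deliberately simple directed algorithm. It picks the
    spend threshold that best separates the labels in the training rows."""
    candidates = sorted({spend for spend, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))

def split(rows, holdout_fraction=0.3, seed=0):
    """Step 3 (setup): keep part of the data aside for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]

def mine(rows):
    """Fit on the training rows only, then score on the held-out rows."""
    train, holdout = split(rows)
    threshold = fit_threshold(train)
    return threshold, accuracy(train, threshold), accuracy(holdout, threshold)

# Hypothetical labelled rows: (total_spend, is_profitable).
rows = [(spend, spend >= 50) for spend in range(0, 100, 5)]
threshold, train_acc, holdout_acc = mine(rows)
print(threshold, train_acc, holdout_acc)
```

A training accuracy that is much higher than the holdout accuracy is the usual warning sign of overfitting: the result describes the data it was built from better than it describes new data.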
Data mining techniques and the software that implements them need to handle large amounts of data effectively. In general, more data tends to produce better, more accurate answers, provided the data is relevant and of reasonable quality. In addition, modern data mining workbenches provide an increasingly rich array of functions to automatically handle data preparation and algorithm tuning, making more and deeper analysis practical.
Data mining can be used in an exploratory fashion, helping you see what you might be able to do more effectively. Perhaps more usefully, you can use data mining results to improve decision-making. Decision-making involves more than just data mining, however. As authors Michael Berry and Gordon Linoff said in their classic book on data mining, “data mining suggests, businesses decide.”
Many different kinds of decisions can be improved using data mining. Data mining can be used to target marketing and sales by segmenting customers into small groups of similar customers. It can be used to improve cross-sell and up-sell decisions using market basket analysis to find products that sell well together. It can be used to uncover fraud using anomaly detection to find records that don’t look like the usual, non-fraudulent transactions being handled.
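As a rough illustration of market basket analysis, the Python sketch below scores each pair of products by lift: how often the two are bought together compared with what would be expected if they sold independently. A lift above 1 suggests the products sell well together. The function name `pair_lift` and the sample baskets are assumptions for the example; production tools typically also filter pairs by minimum support and confidence.

```python
from collections import Counter
from itertools import combinations

def pair_lift(baskets):
    """Return {(a, b): lift} for every product pair seen together at least once.
    lift = P(a and b) / (P(a) * P(b)), estimated from basket frequencies."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }

# Hypothetical baskets, one list of products per transaction.
baskets = [
    ["bread", "butter"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["milk"],
]
for pair, lift in sorted(pair_lift(baskets).items(), key=lambda kv: -kv[1]):
    print(pair, round(lift, 2))
```

Sorting by lift gives the kind of focused list a cross-sell decision needs, rather than a report of every combination that ever occurred.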
Data mining can improve your decision-making in different ways, depending on the kind of decision involved. Combining data mining with effective visualizations can bring deeply buried patterns into focus. Visualizing sub-populations within your customer base, for example, might help you decide what new loyalty offers would appeal to those populations. Data mining can also act as a lens, focusing people’s attention on specific data or results that really matter. For instance, instead of simply reporting on all the products that sell together, data mining can drive a more focused list of the products that are most likely to sell well with specific target products.
Data mining can also be used to drive better decision-making in information systems. Using data mining to find appropriate business rules, or otherwise combining its results with business rules in a decision management system, lets your systems make analytic decisions on your behalf. A fraud detection system, for example, could use explicit policy rules as well as rules derived from anomaly detection to flag potential fraud.
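The fraud example can be sketched in Python as one explicit policy rule combined with one rule derived from the data. Everything specific here is an assumption made for illustration: the policy itself (large amounts on new accounts), the transaction fields `amount` and `new_account`, and the use of a simple z-score cutoff as the anomaly detector.

```python
from statistics import mean, stdev

def build_anomaly_rule(history, z_cutoff=3.0):
    """Derive a rule from past (assumed non-fraudulent) amounts: an amount
    is anomalous if it lies more than z_cutoff standard deviations
    from the historical mean."""
    mu, sigma = mean(history), stdev(history)

    def rule(amount):
        return sigma > 0 and abs(amount - mu) / sigma > z_cutoff

    return rule

def policy_rule(txn):
    """Hypothetical explicit policy written by the business, not mined."""
    return txn["amount"] > 10_000 and txn["new_account"]

def flag(txn, anomaly_rule):
    """Return the reasons a transaction should be reviewed (empty if none)."""
    reasons = []
    if policy_rule(txn):
        reasons.append("policy")
    if anomaly_rule(txn["amount"]):
        reasons.append("anomaly")
    return reasons

# Hypothetical history of typical transaction amounts.
history = [100, 110, 90, 105, 95, 100, 98, 102]
anomaly_rule = build_anomaly_rule(history)

print(flag({"amount": 101, "new_account": True}, anomaly_rule))
print(flag({"amount": 500, "new_account": False}, anomaly_rule))
print(flag({"amount": 20_000, "new_account": True}, anomaly_rule))
```

Returning the reasons, rather than a bare yes or no, keeps the decision explainable: a reviewer can see whether a transaction was flagged by policy, by the mined rule or by both.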
Seeing patterns in data is something humans do very well. Increasingly effective visualization approaches, better displays and more powerful processors all help us extract more meaning from our data. Yet there are patterns in our data that are hidden – hidden due to their complexity, due to the volume of data involved or simply because we don’t have time to unearth them. Data mining techniques and technologies are ideal for finding these patterns and exposing them to decision-makers and to decision-making systems so that decisions can be improved.