For critical issues facing private and public sector executives today, data mining makes the difference. From customer relationship management to risk management to improved production on the factory floor to detecting fraud and abuse, more leading-edge organizations are discovering every day that data mining gives them the ability to proactively make changes to exceed their goals.

In the past few years, many businesses and public sector agencies have invested heavily in some combination of enterprise resource planning (ERP), supply chain management, sales force automation, data warehouse and reporting software. Wanting a better return on those investments, they are now wondering whether to invest in data mining.

On the surface, data mining seems to be a risky investment. Data mining jargon is thick, the math behind the scenes is mysterious and it seems to touch only a few people in the organization.

However, when you look a little closer, the risk isn't great. In fact, it is probably much more risky not to take the plunge into data mining. The investments in ERP, data warehousing, reporting, etc., were important but lacked leverage for three key reasons:

  • They primarily replaced and updated existing ways of getting things done, which means their value was only a marginal improvement in productivity.
  • Because so many organizations implemented them in more or less the same time period, they didn't offer a competitive advantage.
  • Because they only deal with past events, they cannot effectively predict changes or outcomes.

We are still in the early part of the curve of the inevitable widespread use of data mining. Today, industry leaders, innovative start-ups and progressive-thinking agencies are using data mining successfully to predict and change their futures. They are realizing not only a return on their investment in data mining, but a return on their other IT investments, especially their data warehouses.

Data mining is really about achieving your organization's goals, not about the math and the statistics. Data mining enables you to go beyond reporting and OLAP (online analytical processing) to learn not only what happened in your operations, but also why things happened. The results of data mining can easily be deployed to all the decision-makers in your organization, including "virtual" decision-makers such as your Web site and operational systems to improve decisions in real time.

The remainder of this article passes along some data mining strategies. You'll see you don't have to be Einstein to do data mining and that data mining can have widespread positive impact in your organization.

Don't wait to get started - the competition is only a mouse click away. Data mining is a journey, not a project. When you've addressed today's critical issues, new ones pop up due to changes in your market or the technology. It's very likely that your competitors are already employing data mining to better attract, cross-sell and retain customers. To start before it is too late (before your reports reveal you've lost key customers), you may need to outsource your initial data mining effort.

The critical questions are:

  • Is your staff skilled and experienced? Most organizations do not have many people on staff, in either line of business or IT roles, with much data mining experience. If your CIO tells you the staff is experienced in data mining because they've built a data warehouse or have implemented OLAP, you know you've met the chief "I-don't-really-get-it" officer. You need to outsource to get the data mining expertise you need to be successful.
  • Do you have the technology infrastructure? Data mining requires data. Do you have a clean, accessible marketing database or a data warehouse? If not, to get going quickly, you can outsource this activity.

As you start, plan ahead for growth. As you make choices about vendors, technology, etc., be sure to always consider scalability (the ability to work with very large data sets) and flexibility (the ability to apply the technology to a variety of situations).

Begin with the end in mind. Personal productivity guru Stephen Covey's maxim applies to your data mining efforts. Don't be a hammer looking for a nail: there's no point in crunching a bunch of numbers or even gathering data in a data warehouse without first deciding what results you want to achieve.

The context for data mining is the issues critical for your organization's success. Start by tackling a project that is clearly linked to what you want to accomplish. Successful data mining initiatives typically start small, focusing on a critical organizational issue such as retaining customers longer. Review your organization's strategic plan and business plan. Is there an area in which you aren't making the hoped-for progress? Data mining can help you get things back on track.

Decision-makers tend to value data mining most favorably when they can take action based on the results. Typically, in organizations where data mining has really taken root, the first data mining project informed decision-makers on an important topic and had both a short time frame and clear deliverables.

Beginning with the end in mind includes defining measures that can drive improvement, delivering measures on which people are prepared to act, ensuring the measures are easy to communicate and understand, and building measures that really fit the problem.

Focus on the "I," not the "T." Successful data mining project leaders retain a laser-like focus throughout the project. They focus on the people who will use the results of the project to make important decisions, the decision-makers. Starting in the planning phase, keep the emphasis on delivering actionable results, not on data storage or the techniques you'll use to generate the results.

To be successful, both the decision-makers (the line of business or public sector executives) and the IT organization must buy in to the project plan. In most successful data mining projects, the decision-maker is both the champion for and the leader of the project.

It is important to keep the decision-makers involved, even in the portions of the project led by IT. Too often, decision-makers and IT talk past each other and don't discover the disconnect until effort has been wasted. Avoid these disconnects by over-communicating during all parts of the process, particularly on the first few projects. Another important part of focusing on the decision-maker is to set and manage expectations. Remember, the project should have clear, timed deliverables. Build a speedboat, not a battleship.

Unless there's a method, there's madness. Despite the promises of early data mining evangelists, data mining is not a silver bullet for decision making: you can't just push a button and expect useful results to appear. Successful data mining projects typically employ a formal and iterative process which guides the team step-by-step from selecting a critical issue through deployment of results. Fortunately, there is a tested, industry-standard approach to data mining projects. The cross-industry standard process for data mining (CRISP-DM) was developed with the cooperation of more than 100 companies including a mix of industry leaders, small consulting firms and academics. The consortium field-tested the methodology and modified it based on that experience. For more information about CRISP-DM, visit

There's a reason it's called data mining. Gold mining is a process for sifting through lots of ore to find valuable nuggets. Data mining is a process for sifting through lots of data to find information useful for decision making. If there's no gold in a particular mountain or stream, even the best gold miner won't strike it rich. Similarly, the data itself is critically important to data miners.

Once you've picked a problem to solve, you can start thinking about data. The data is so important that three of the 10 points in this article focus on data. Start at the highest level; think about the data that you need to gather from the perspective of the information you want to deliver. You'll want to capture the detail of data needed for everything from strategic analysis to tactical alerts.

Two high-level tips from successful data miners about data are:

  • Most often, the "unit of analysis" in a data mining project is a customer or constituent. In order to do analysis at that level, you probably need a unique ID by customer or constituent everywhere you capture and store data.
  • Make use of meta data (information about your data) wherever you can. Meta data ranges from simple information, such as the source of the data, to more advanced information, such as when the data was last changed or whether the data has been imputed (a missing part of a record has been filled in based on information in other parts of the record).
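
The two tips above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the record layouts, the field names (`customer_id`, `age_imputed`, `age_source`) and the mean-fill rule are all assumptions made for the example, and the mean-fill rule is simpler than imputing from other parts of the same record.

```python
from statistics import mean

# Hypothetical records from two systems, keyed by the same customer ID.
purchases = [
    {"customer_id": 101, "amount": 40.0},
    {"customer_id": 102, "amount": 15.5},
    {"customer_id": 101, "amount": 22.5},
]
profiles = [
    {"customer_id": 101, "age": 34},
    {"customer_id": 102, "age": None},  # missing value
    {"customer_id": 103, "age": 52},
]

def build_customer_view(purchases, profiles):
    """Combine both sources into one row per customer, with meta data attached."""
    known_ages = [p["age"] for p in profiles if p["age"] is not None]
    fallback_age = mean(known_ages)  # simple mean imputation (an assumption)

    view = {}
    for p in profiles:
        imputed = p["age"] is None
        view[p["customer_id"]] = {
            "age": fallback_age if imputed else p["age"],
            "age_imputed": imputed,    # meta data: was the value filled in?
            "age_source": "profiles",  # meta data: where the value came from
            "total_spend": 0.0,
        }
    for row in purchases:
        # The shared ID is what makes customer-level analysis possible.
        if row["customer_id"] in view:
            view[row["customer_id"]]["total_spend"] += row["amount"]
    return view

customer_view = build_customer_view(purchases, profiles)
```

Keeping the `age_imputed` flag next to the value lets a later analysis include or exclude filled-in values deliberately rather than by accident.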

Better data means better results. Better data is only one of a handful of ways you can improve your data mining results. Better data means comprehensive and more accurate analysis. Typically, it's far more valuable to include more variables or columns of data in the analytical process than it is to have more cases or records. However, there's a tradeoff between using many variables and getting useful results quickly. Because data mining is a journey, successful data miners typically work with the data they have, get results, realize some ROI, then add additional data over time to become even more effective.

The best analysis is done using three types of data:

Transaction data. Transaction data is very powerful: it tells you what the customer has actually done. Psychologists have proven past behavior is the best predictor of future behavior. The good news is most organizations have a great deal of transaction data, everything from prior purchases or donations to a record of which Web pages a person visited and how long they lingered on them.

Purchased data. When a customer or constituent interacts with you, they typically only tell you a subset of useful information. An entire industry exists to provide very useful supplemental data about the customer's current situation, including demographic and psychographic data.

Collected data. Collected data offers a great opportunity for leverage in data mining. And, because it takes effort and skill to collect useful data, it offers a unique opportunity for competitive advantage. Collected data adds information about a customer's attitudes and opinions into the analysis phase and results in better decision-making information. You can collect data about customer satisfaction levels, customer preferences, purchase intentions, share-of-wallet information, etc. Or, you can turn to professionals in the market research industry to collect it for you.

While reporting and OLAP are informative about past facts, only data mining can help you predict the future of your business.

OLAP | Data Mining
What was the response rate to our mailing? | What is the profile of people who are likely to respond to future mailings?
How many units of our new product did we sell to our existing customers? | Which existing customers are likely to buy our next new product?
Who were my 10 best customers last year? | Which 10 customers offer me the greatest profit potential?
Which customers didn't renew their policies last month? | Which customers are likely to switch to the competition in the next six months?
Which customers defaulted on their loans? | Is this customer likely to be a good credit risk?
What were sales by region last quarter? | What are expected sales by region next year?
What percentage of the parts we produced yesterday are defective? | What can I do to improve throughput and reduce scrap?
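
The mailing example in the comparison above can be made concrete with a small Python sketch. The mailing history, the age bands and the function names are hypothetical, and a breakdown by segment is only a first step toward a real predictive profile.

```python
from collections import defaultdict

# Hypothetical mailing history: (age_band, responded)
mailing_history = [
    ("18-34", True), ("18-34", False), ("18-34", True), ("18-34", True),
    ("35-54", False), ("35-54", False), ("35-54", True), ("35-54", False),
    ("55+", False), ("55+", False),
]

def overall_response_rate(history):
    """Reporting question: what was the response rate to our mailing?"""
    return sum(responded for _, responded in history) / len(history)

def response_rate_by_segment(history):
    """A step toward prediction: which segments responded most often?"""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, pieces mailed]
    for segment, responded in history:
        counts[segment][0] += responded
        counts[segment][1] += 1
    return {segment: r / n for segment, (r, n) in counts.items()}

report = overall_response_rate(mailing_history)
profile = response_rate_by_segment(mailing_history)
best_segment = max(profile, key=profile.get)  # candidate target for the next mailing
```

The single number in `report` describes the past; the breakdown in `profile` starts to suggest who should receive the next mailing.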


It's still garbage in, garbage out. Some things never change. In the early days of computing, the phrase "garbage in, garbage out" was coined to reflect the reality that computing results are dramatically affected by the quality of data. Getting access to the right data, cleansing it and preparing it for analysis is typically the most time-consuming step of the data mining process. Don't fool yourself: 70 to 80 percent of time invested in data mining projects is typically used for data access, cleansing and preparation. So plan, and set expectations, accordingly.

This work is often time-consuming and many people do not find it particularly exciting. In addition, dramatic increases in usefulness of end results can proceed from more analytically sophisticated approaches to data preparation. As a result, this step is often a good candidate for outsourcing.

Avoid the OLAP trap. Many vendors try to tell you they have "all you need" for data mining or effective online personalization. That's unlikely to be true. Successful data mining requires three families of analytical capabilities: reporting, classification and forecasting.

Why are all three types needed? Each capability delivers a different kind of information. Reports inform management of what has happened in the past. Reports (including OLAP) are popular because they are easy to understand and easy for IT to produce. However, if all you have is reporting, you can only see what has already taken place. You cannot predict what might happen later. It's as if you're trying to drive your car by only looking in the rearview mirror: the faster you need to go, the riskier it is.

Classification and forecasting are different from reporting because they enable you to gain more understanding about why things happen and to make predictions about what is likely to happen under different scenarios. Armed with this information, you can make proactive changes in your organization and realize better results. Together, classification and forecasting are commonly known as predictive modeling. Classification methods put things into groups: for example, customers likely to spend more with your company or constituents likely to vote for a proposition. Typically, the classification process includes two steps: establishing the groups and determining (or predicting) group membership on a case-by-case basis. Forecasting methods deal with data where time is the critical measure. Examples of forecasting are sales by product and/or region over time and population growth over time.
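
The two classification steps can be illustrated with a deliberately simple Python sketch. It uses a nearest-centroid rule, chosen here only because it is short; the article does not name a method, and the customer features, group labels and function names are hypothetical.

```python
from statistics import mean

# Hypothetical training cases: (visits per month, average basket in dollars)
# paired with a group that is already known.
labeled_customers = [
    ((2, 20.0), "low_spend"),
    ((3, 25.0), "low_spend"),
    ((8, 90.0), "high_spend"),
    ((10, 110.0), "high_spend"),
]

def establish_groups(cases):
    """Step 1: describe each group by the average of its members (its centroid)."""
    by_group = {}
    for features, group in cases:
        by_group.setdefault(group, []).append(features)
    return {
        group: tuple(mean(column) for column in zip(*members))
        for group, members in by_group.items()
    }

def predict_group(centroids, features):
    """Step 2: assign a new case to the group whose centroid is closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda g: squared_distance(centroids[g], features))

centroids = establish_groups(labeled_customers)
new_customer_group = predict_group(centroids, (9, 95.0))
```

In practice the features would be put on a common scale first, since the dollar amounts here dominate the distance, and a production model would be validated on cases it has not seen.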

It's not the purpose of this article to delve into the pros and cons of the various methods of predictive modeling. The key point here runs contrary to what is often heard: algorithms do matter. Algorithms matter because, in the end, just any result won't do: to compete and win, you must have the best answer. Getting to the best answer involves powerful and comprehensive solutions which enable you to determine preferences of not only current customers, but also those for whom you have no purchase history. Just as a carpenter uses more than a hammer to build a house, a data miner uses more than one analytical method to get the best results.

Experienced data miners offer three suggestions with regard to data mining analytics:

  • Don't redo your existing reports without adding value. Typically, that means adding one of these three things: drill down, alternative views of the same information and prediction.
  • Always offer ad hoc capabilities. Canned reports are a great starting place, but they are just a starting place. In today's fast-paced world, ad hoc capabilities are a requirement.
  • Your data mining software should offer an easy way for you to incorporate your business knowledge. Software that doesn't let you make use of what you know about your business when building a model is simply not going to give you the best results.

Deployment is the key to data mining ROI. The ultimate goal of data warehouse ROI cannot be achieved without data mining, but truly successful data mining cannot be achieved without deployment. Deployment simply means getting the information, in a usable format, to the place where it is needed.

There are four types of deployment:

To decision-makers. Getting information into the hands of people who can effect change is key. Typically, deployment to decision-makers is done via intranets. Two ways people can "go one better" in this regard are offering "live" deployment (where the decision-maker can interact with the results) and supporting offline use of the information for travelers.

To "virtual" decision-makers. If a customer enters your store, they can receive personalized service from one of your employees. If a person comes to your Web store, you can still offer a personalized experience if you deploy predictive models. For example, based on a combination of what you know about a customer from historical transactions and purchased data and their actions at your Web site today, your virtual decision-maker can instantly offer different Web page content, alternative products, unique discounts, etc.

To operational systems. Another customer touchpoint is your call center. By prompting your call center representatives, deployment can enable the same type of personalization as in the Web scenario previously mentioned. In a manufacturing setting, a deployed model may take information coming from a production line and, based on that data, either send a message to a troubleshooter or even make adjustments without human intervention.

To databases. Interactions with customers today are many-faceted. For example, a retailer may interact with the same customer via a storefront, the Web, a call center and a catalog. In order to keep information about that customer current, savvy organizations store all data in a centralized data warehouse. When a customer's profile is updated, the "score" (which indicates the customer's category) is deployed back to the database for use in future interactions. For example, recent purchase activities or updated demographic information cause a classification model to change the category into which a customer is classified.
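
As a rough sketch of deploying a score back to a database, the Python example below uses an in-memory SQLite table as a stand-in for the data warehouse. The table layout, the `score_customer` rule and its 50.0 cutoff are hypothetical placeholders for a real classification model.

```python
import sqlite3

def score_customer(total_spend):
    """Stand-in for a real classification model (the 50.0 cutoff is hypothetical)."""
    return "high_value" if total_spend >= 50.0 else "standard"

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, total_spend REAL, score TEXT)"
)
conn.executemany(
    "INSERT INTO customers (customer_id, total_spend, score) VALUES (?, ?, ?)",
    [(101, 30.0, "standard"), (102, 12.0, "standard")],
)

def record_purchase(conn, customer_id, amount):
    """Update the profile, then write the refreshed score back to the database."""
    conn.execute(
        "UPDATE customers SET total_spend = total_spend + ? WHERE customer_id = ?",
        (amount, customer_id),
    )
    (total,) = conn.execute(
        "SELECT total_spend FROM customers WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    conn.execute(
        "UPDATE customers SET score = ? WHERE customer_id = ?",
        (score_customer(total), customer_id),
    )
    conn.commit()

record_purchase(conn, 101, 25.0)  # a new purchase moves customer 101 past the cutoff
scores = dict(conn.execute("SELECT customer_id, score FROM customers"))
```

Because the score lives in the same table as the profile, the storefront, Web site, call center and catalog systems can all read the same current category.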

Champions train so they can win the race. Most corporations today have a handful of heavy-duty model builders (data miners), a fair number of knowledge workers (analysts) and lots of information consumers. In addition to the obvious returns from training the IT personnel and data mining software users, you are likely to see greater overall return from your data mining investment if you also educate the people who receive the information and use it to make decisions. While not everyone wants to be an analyst or a model builder, ongoing success in data mining typically requires some education and training for your information consumers.

The type of education and training needed varies by the individual's role in the process. Many people have paid their analytical dues in college by sitting through a statistics course, and volunteers to repeat that are few and far between. However, many benefit from a more practical refresher course that emphasizes analytical thinking, not analytical methods. The content can be delivered in a variety of ways, from a classroom setting to just-in-time computer-based tutorials.

Another key training consideration arises if you outsource your first data mining project to get started quickly. If you do, you probably want to train your IT staff to be ready to manage the system, make updates, etc., when you assume responsibility for it.

Mine Your Data

Today, data mining can make the difference in every industry and organization throughout the world. You can mine your data and use the results not only to determine what your customers want, but also to predict what they will do.
