
Data Mining as a Service: The Prediction is Not in the Box

  • July 01 2007, 1:00am EDT

Why were there so many failed enterprise customer relationship management (CRM) implementations? Everyone from executive management teams to database administrators has a point of view on where the failure occurred: upper management didn't buy in; the software promised to do more than it could; the implementation took too long; the hidden costs were too high; and countless other reasons.

As with everything in life, we try to learn from our past mistakes, and we hope that by doing so, history will not repeat itself. The same is true with the recent surge of investment in data mining (also known as predictive analytics). The good news is that many companies are getting it right this time around - they have learned from their CRM software purchase mistakes and are investing in predictive analytics as a service. Unfortunately, for every company that is getting it right, there are many companies continuing to look for the easy, quick fix by purchasing out-of-the-box software with hopes that it's going to magically save the day.

The key challenge that packaged predictive analytics software has not been able to crack is how to extract knowledge from data quickly and put it into the hands of marketers to make better, more informed decisions. Although significant progress has been made over the last three decades by academic researchers in database technology, statistics and machine learning to improve the techniques for detecting patterns in large data sets, the use of predictive analytics still requires an incredible amount of expertise. This expertise is sorely needed to deliver accurate, understandable and actionable analysis.

Delivering predictive analytics is not a trivial exercise. It requires the ability to map the marketing goals to the appropriate predictive algorithms, perform data hygiene and transformations, build models and test the results. Moreover, implementing predictive analytics requires the combination of three distinct skill sets: database technology, data mining and marketing domain knowledge. As a result, predictive analytic services go well beyond the traditional difficult-to-use, off-the-shelf statistical software packages, which have fallen short on delivering the analysis marketers need for highly effective target marketing, sales and inventory forecasting and retention modeling.

Before we get ahead of ourselves and talk about selecting a service, marketers must first ensure that the right team is in place. A successful predictive modeling project must be driven with the help of executive support to ensure the required data sources can be obtained to drive accurate and effective models. Then, in addition to securing leadership, it is essential to partner with a predictive analytics service provider with data gurus who understand in detail the relevant data sources for the project, statistical modeling experts who can run predictive algorithms, and database and systems integration experts who can design the modeling database as well as automate the process of scoring when new data is received.

Once the team is in place and leadership is on board, the best approach for a predictive analytics implementation follows a six-step process (see Figure 1) to ensure the business requirements are aligned for success, the right data is accessible and the results of the model will drive return on investment.

Figure 1: Steps to Ensure Business Requirements for Success

1. Define project goals and success criteria. Before any data collection or model building, it is essential to start with the basics. First, define the goal of the predictive modeling project, whether it is to boost customer acquisition, increase retention or improve customer satisfaction.

Next, determine how you are going to measure the project's success, which may be directly related to the impact of the predictive model (e.g., increase customer acquisition by 10 percent). Finally, define the outputs of the model (e.g., Excel reports, XML interface to integrate with a CRM system) and determine how frequently the model should be refreshed.

For example, if you are flagging customers likely to churn, it would be beneficial to have the model refreshed daily as customer data is generated.
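A success criterion such as "increase customer acquisition by 10 percent" is easiest to track when it is written down as a calculation. The following is a minimal sketch, assuming the criterion is a relative increase over a baseline period; the function names and the sample figures are hypothetical and not taken from any real project.

```python
def relative_lift(baseline, observed):
    """Return the relative change of observed over baseline as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (observed - baseline) / baseline


def meets_target(baseline, observed, target=0.10):
    """True when the relative lift reaches the agreed target (10 percent by default)."""
    return relative_lift(baseline, observed) >= target


# Illustrative numbers only: new customers acquired before and after the project.
print(meets_target(baseline=2000, observed=2240))  # lift of 240 / 2000
print(meets_target(baseline=2000, observed=2100))  # lift of 100 / 2000
```

Agreeing on the baseline period and the target before modeling begins keeps the later ROI discussion from turning into a debate about definitions.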

2. Identify resources required for success. This phase involves designing and reviewing the predictive modeling approach best suited to the project goals, including the required data sources and potential analytic approaches. Essentially, this phase is focused on mapping the business requirements to the data sources collected from purchases, subscriptions, customer behavior and any additional data necessary to support the goals.

The most important question to ask is: Does the data collected support the predictive model needed to achieve our project's goals and success criteria?

The result of the first two phases should be a clearly defined project roadmap, which includes detailed descriptions of the data sources, timeline for acquiring and integrating data, potential modeling approaches, the automation plan and ROI measurement.

3. Format and integrate data. Once the analytics plan is in place, the next phase is acquisition, analysis and cleansing of the data. The data gurus will need to work closely with the statistical modelers in this phase to audit the data and identify transformations required to support optimal predictive modeling.

Having clean data sources to support data mining is paramount to success. Customer data tends to be collected in disparate silos and frequently contains incomplete geo-demographic data, misspellings and out-of-range values.

There is no magic bullet for data cleansing other than close analysis of the data to ensure it meets the expectations outlined in the requirements. In this phase, statistical modelers can achieve first insight into the data and begin to share initial results with the project leadership.
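The kind of close analysis described above usually starts with a simple audit that counts problems before anyone tries to fix them. The sketch below is one possible shape for such an audit, assuming customer records arrive as dictionaries; the field names, the valid age range and the list of accepted state codes are all hypothetical.

```python
from collections import Counter

# Hypothetical customer records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "zip": "02139", "age": 34, "state": "MA"},
    {"id": 2, "zip": "", "age": 41, "state": "MA"},          # incomplete geo data
    {"id": 3, "zip": "94105", "age": 212, "state": "CA"},    # out-of-range value
    {"id": 4, "zip": "9410", "age": 29, "state": "Calif."},  # malformed and misspelled
]

VALID_STATES = {"MA", "CA"}  # assumed reference list


def audit(records):
    """Count data quality problems per category without altering the data."""
    issues = Counter()
    for row in records:
        zip_code = row.get("zip") or ""
        if not zip_code:
            issues["missing_zip"] += 1
        elif not (len(zip_code) == 5 and zip_code.isdigit()):
            issues["malformed_zip"] += 1
        age = row.get("age")
        if age is None or not 18 <= age <= 110:
            issues["age_out_of_range"] += 1
        if row.get("state") not in VALID_STATES:
            issues["unknown_state"] += 1
    return dict(issues)


print(audit(RECORDS))
```

Reporting counts rather than silently correcting values lets the data gurus and statistical modelers decide together which transformations are justified.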

4. Design and develop models. In this phase, the actual predictive modeling begins. It is strongly recommended to take an iterative approach to building models, regularly sharing results with key project stakeholders to ensure the analyses are meeting project goals. It is also important to choose the right algorithms for the application; knowing how to tune these algorithms for optimal performance will help maximize the accuracy and performance of your data mining models.

Also, consider algorithm run time in this phase: depending on the automation processing requirements, it may be necessary to give up some accuracy in exchange for faster processing. Various model evaluation techniques can help the team measure the accuracy and efficacy of the model, the most common being k-fold cross-validation.
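For readers unfamiliar with the evaluation technique mentioned above, the idea is to split the data into k folds, train on k-1 of them, measure accuracy on the held-out fold, and repeat so that every fold is held out once. The sketch below illustrates this with a deliberately simple stand-in model (a single threshold on one feature) and synthetic data; the model, the data and all names are assumptions made for illustration, not a recommended churn model.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Toy model: the midpoint between the mean feature value of each class."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validate(xs, ys, k=5):
    """Return one accuracy score per fold, each measured on held-out rows."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        threshold = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct = sum((xs[i] > threshold) == (ys[i] == 1) for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Synthetic data: a single feature and a label of 1 for the higher values.
XS = list(range(20))
YS = [0] * 10 + [1] * 10

fold_scores = cross_validate(XS, YS, k=5)
print(fold_scores, sum(fold_scores) / len(fold_scores))
```

Looking at the spread of the per-fold scores, not just their average, gives an early signal of whether the model is stable enough to justify the run time it costs.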

5. Validate and review models. In this phase, identify the best-scoring data mining model(s) and run tests to ensure they remain accurate over a larger set of customer data. It's essential to pretest your models with a subset of your customer base to ensure the models are scoring properly and the system is running in a timely manner. As results and project goals are reviewed, the team may also identify organization-specific business logic to be added on top of the models.

6. Deploy and test analysis. The core value of most predictive analytics applications is automating the process of updating the models continuously with new customer data (e.g., to identify which customers are most likely to purchase). A related benefit is the reusability of the models, or the ability to leverage the same process for future data mining goals.

In this final phase, you are ready to put the predictive model(s) into production and to reap the benefits of rescoring fresh customer and sales data collected.

The automation plan designed in step two can be carried out by the database and system architects. In addition, the output of the models, whether a report or integration with an existing information system, should be finalized, tested and deployed.

The automation of data mining models can be made efficient through the use of technology solutions that integrate data import and load functions, relational databases and predictive algorithms. Rather than a piecemeal approach, an integrated solution offers a consistent platform for the data analysts, statistical modelers and database developers to collaborate.
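To make the rescoring step concrete, the sketch below applies an already fitted model to a fresh batch of customer records and returns the customers to flag. It assumes a logistic-style churn score; the coefficients, field names, threshold and sample batch are all hypothetical, and in practice the batch would come from the modeling database rather than an in-memory list.

```python
import math

# Hypothetical coefficients from a previously fitted churn model (illustrative only).
COEFFICIENTS = {"intercept": -2.0, "days_since_purchase": 0.03, "support_tickets": 0.5}


def churn_score(customer):
    """Logistic score between 0 and 1; higher means more likely to churn."""
    z = COEFFICIENTS["intercept"]
    z += COEFFICIENTS["days_since_purchase"] * customer["days_since_purchase"]
    z += COEFFICIENTS["support_tickets"] * customer["support_tickets"]
    return 1.0 / (1.0 + math.exp(-z))


def rescore(batch, threshold=0.5):
    """Score each new record and return flagged customer IDs, highest score first."""
    scored = [(churn_score(c), c["id"]) for c in batch]
    flagged = [(s, cid) for s, cid in scored if s >= threshold]
    flagged.sort(reverse=True)
    return [cid for _, cid in flagged]


# Assumed shape of a newly received batch of customer data.
NEW_BATCH = [
    {"id": "A", "days_since_purchase": 10, "support_tickets": 0},
    {"id": "B", "days_since_purchase": 90, "support_tickets": 2},
    {"id": "C", "days_since_purchase": 60, "support_tickets": 1},
]

print(rescore(NEW_BATCH))
```

Keeping scoring separate from model fitting means the same function can run on whatever refresh schedule was agreed in step one, while the model itself is rebuilt less often.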

The steps outlined represent tried-and-true methodologies for delivering predictive analytics as a service. Data mining holds great promise for marketers to increase customer acquisition, improve retention and maintain customer satisfaction; however, success is dependent on following this proven process.

Embracing predictive analytics as a service improves a marketer's likelihood of success. Having the right people on board, ensuring clean data is accessible, thoroughly testing the data mining results and automating the analysis process will enable you to start using predictive analytics effectively in your organization.
