The list of enterprises that have taken the leap across the data mining technological chasm continues to grow. Some have fallen into the abyss while others have landed safely on the other side. Our experience has shown that there are common critical success factors that distinguish the fortunate ones from their unlucky counterparts. We would like to share our observations in the hope that they will benefit both those organizations that are considering the jump and those that have failed in the past but are determined to make another attempt.

Data mining, a.k.a. knowledge discovery from databases, remains a novel technology that holds great promise for enabling enterprises to solve their most pressing, persistent and pernicious problems. This promise, in and of itself, is compelling enough to cause many organizations to experiment with data mining. Many of these forays into data mining have taken on the flavor of the "pilot" or "proof of concept" data warehousing projects that were initiated several years ago. Unfortunately, these data mining experiments have often produced the same lukewarm results. This parallel between data mining and early data warehousing efforts reveals itself in several ways.

Data mining is an evolving technology that business organizations will eventually have to implement in order to remain competitive. Data warehousing was perceived as providing a competitive advantage during its dawning. It has now become a competitive imperative. Data mining is still nascent, and yet many of its early adopters are seeking and realizing a competitive advantage. The time it takes for a capability to move from securing an advantage to simply meeting the status quo will vary from industry to industry; but rest assured that if you manufacture a product, sell a product or service, manage relationships or manage risk, someone in your market space is already attempting to use data mining.

The technical aspects of data mining are maturing, but the maturation of the supporting processes and tools required to successfully deploy data mining is lagging. Data warehousing was initially all about the right platform, the best performing database management system and the most featured data access tools. Now it is mostly about efficiently and reliably loading quality data into the warehouse, dealing with data synchronization, deriving meaningful insights from that data and delivering those insights to the end user. In order for it to be successful, data warehousing had to evolve from a technology-driven concept to a solution-driven concept. Data mining must now do the same. To date, information technology consumers have been eager to employ this new technology without much regard to its incumbent processes and disciplines. Meanwhile, the data mining industry, which has been driven by very bright, academically inclined people, has focused on developing and deploying the hottest algorithms without a lot of thought as to how to use them to deliver business value. This technological feeding frenzy appears to be coming to an end as the data points for successful and failed data mining experiences accumulate and the need for data mining methodology becomes apparent.

Tool developers are responding to the process lag by marketing one-button data mining. When data warehousing encountered implementation challenges, information technology providers with the most to lose started packaging data marts in a box. It is obvious, perhaps painfully so, that the only things in the box are the software tools to build a data mart. Similarly, developers can repackage data mining tools, enhance their graphical user interface and automate some of their more esoteric aspects; but, in the end, it falls on the analyst to acquire clean, unbiased data to feed the tool, select appropriate algorithms as the analysis unfolds, and validate and assimilate the results of the data mining runs. Virtually all of the operational complexity, time consumption and potential benefit of data mining lie in performing these steps and performing them well. Doing so requires a methodology.

Against the backdrop of these environmental observations, we would like to share what we believe are the four critical success factors for data mining projects. Our definition of success, as you will read, is somewhat soft. In any given situation, data mining may not arrive at "the answer" for a variety of reasons. However, this does not necessarily imply an unsuccessful data mining project.

Critical Success Factor #1: Have a clearly articulated business problem that needs to be solved and for which data mining is the proper solution technology.

Not every problem is worth solving. Problems that need solving have the attention of the organization because they are either a source of pain for the enterprise or they are seen as an opportunity. Problems that need solving garner executive sponsorship and have a value placed on their solution. Problems that need to be solved are characterized as actionable because the organization has the ability and commitment to effect the necessary changes as indicated by the solution to the problem.

Finally, data mining has plenty of cachet, and it might be tempting to use it to solve every problem, but some problems are inappropriate for data mining. A question such as "Who are my best customers?" is best answered with a query or on-line analytical processing tool. A question such as "What are the characteristics of my best customers?" is a prime candidate for data mining.
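The difference between the two questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the field names (revenue, region, channel) and the simple "share among the best divided by share overall" ratio are all hypothetical, and a real data mining tool would apply far more rigorous techniques to far more data.

```python
from collections import Counter

# Hypothetical customer records; every field name and value is illustrative.
customers = [
    {"id": 1, "revenue": 9200, "region": "west", "channel": "online"},
    {"id": 2, "revenue": 8700, "region": "west", "channel": "online"},
    {"id": 3, "revenue": 1200, "region": "east", "channel": "store"},
    {"id": 4, "revenue": 900, "region": "east", "channel": "online"},
    {"id": 5, "revenue": 7600, "region": "west", "channel": "store"},
    {"id": 6, "revenue": 400, "region": "south", "channel": "store"},
]


def best_customers(records, top_n):
    """Query-style question: who are my best customers? A sort answers it."""
    return sorted(records, key=lambda r: r["revenue"], reverse=True)[:top_n]


def distinguishing_traits(records, best, attributes):
    """Mining-style question: which attribute values are over-represented
    among the best customers compared with the whole customer base?"""
    traits = []
    for attr in attributes:
        overall = Counter(r[attr] for r in records)
        among_best = Counter(r[attr] for r in best)
        for value, count in among_best.items():
            share_best = count / len(best)
            share_all = overall[value] / len(records)
            # A ratio above 1 means the value is more common among the best.
            traits.append((attr, value, round(share_best / share_all, 2)))
    return sorted(traits, key=lambda t: t[2], reverse=True)


top = best_customers(customers, top_n=3)
traits = distinguishing_traits(customers, top, ["region", "channel"])
print([c["id"] for c in top])
print(traits)
```

The first function needs nothing more than the question and a sort order. The second has to search across attributes for patterns nobody specified in advance, which is the kind of work data mining is suited to.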

Critical Success Factor #2: Ensure that the problem being pursued is supported by the right type of data of sufficient quality and in sufficient quantity for data mining.

Know the state your data is in. If you do not know, commission someone to assess it. Know the types of data that are required by the data mining algorithms you intend to use. For example, if you want to use data mining to build a model to detect credit card fraud, you need historical data where fraud was detected and the attributes and transactional history of the card holder were captured.

Enforcing the integrity required by transactional systems or the level of data hygiene required by a data warehouse does not ensure that your data will be of the quality needed to apply data mining. Data cleanliness is driven by the demands of its consumers.
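A first assessment of the state of the data does not have to be elaborate. The sketch below, again in Python, profiles a handful of hypothetical card transactions for missing values and counts how many confirmed fraud cases are available to learn from. The records, the field names and the thresholds (max_missing, min_positives) are assumptions chosen for illustration; appropriate limits depend on the algorithm and the problem.

```python
# Hypothetical transaction history; field names and values are illustrative.
transactions = [
    {"amount": 120.0, "merchant": "grocer", "label": "ok"},
    {"amount": None, "merchant": "fuel", "label": "ok"},
    {"amount": 980.0, "merchant": "", "label": "fraud"},
    {"amount": 45.5, "merchant": None, "label": "ok"},
    {"amount": 60.0, "merchant": "grocer", "label": "ok"},
]


def profile(records, fields, label_field):
    """Return the share of missing values per field and a count per label."""
    total = len(records)
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, "")) / total
        for f in fields
    }
    labels = {}
    for r in records:
        key = r.get(label_field)
        labels[key] = labels.get(key, 0) + 1
    return missing, labels


def readiness_problems(missing, labels, positive_label, max_missing, min_positives):
    """List the blockers found; an empty list means none were detected."""
    problems = [
        f"{field}: {rate:.0%} missing"
        for field, rate in missing.items()
        if rate > max_missing
    ]
    if labels.get(positive_label, 0) < min_positives:
        problems.append(f"too few '{positive_label}' examples")
    return problems


missing, labels = profile(transactions, ["amount", "merchant"], "label")
problems = readiness_problems(
    missing, labels, positive_label="fraud", max_missing=0.25, min_positives=2
)
print(problems)
```

Data like this could satisfy every integrity rule of the transactional system that produced it and still be flagged here, because the mining task places demands of its own on completeness and on the number of known fraud cases.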

Critical Success Factor #3: Recognize that data mining is a process with many components and dependencies and manage them appropriately.

The people doing the analysis will see the data mining process as open-ended and exploratory. The people from your business or IT staff will see a data mining project as any other and expect time lines, milestones and fixed deliverables. The truth is that the data mining process lies somewhere in between the two. The entire project cannot be "managed" in the traditional sense of the word. What you would like to say is, "So it will take two days to extract the data, then we give the data to the analyst and he analyzes it for five days and then gives us the predictive model." In reality, all you can count on after the fifth day is that the data has been "analyzed" for five days.

It is reasonable to expect that the IT portions of the project can be scoped based on prior experience and managed closely. As for the data mining analysis itself, the best approach is to break the task into conceptually manageable pieces of relatively short duration. If the objective is to predict customer attrition, the analysis approach may be to do some preliminary customer segmentations and then use a neural network to do the actual prediction. A time frame can be established to do the initial segmentation, and a checkpoint can be used to assess the progress at that point. The actions at that checkpoint may be to acquire more data, perform more data transformations or to proceed with the prediction.
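The checkpoint described above can be pictured as a small decision rule. The Python sketch below is hypothetical throughout: the two measures (coverage and separation), their thresholds and the idea of reducing the assessment to two numbers are assumptions made for illustration. In practice the checkpoint is a judgment call made by the analyst and the project sponsor together.

```python
from dataclasses import dataclass


@dataclass
class SegmentationResult:
    """Hypothetical summary of the time-boxed segmentation step."""
    coverage: float    # share of customers with enough data to be segmented
    separation: float  # illustrative 0..1 score of how distinct the segments are


def checkpoint(result, min_coverage=0.8, min_separation=0.5):
    """Choose the next action once the segmentation time box has ended."""
    if result.coverage < min_coverage:
        return "acquire more data"
    if result.separation < min_separation:
        return "perform more data transformations"
    return "proceed with the prediction"


print(checkpoint(SegmentationResult(coverage=0.6, separation=0.9)))
print(checkpoint(SegmentationResult(coverage=0.9, separation=0.3)))
print(checkpoint(SegmentationResult(coverage=0.9, separation=0.7)))
```

The value of the rule lies less in the thresholds than in agreeing, before the analysis starts, on what will be examined at the checkpoint and what the possible next steps are.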

Critical Success Factor #4: Plan to learn from the data mining process regardless of the outcome.

The reality of data mining is that it is an immature discipline bordering on art more than science. Furthermore, there is no guarantee that any given data mining project will yield the answer to the question being pursued. Data mining is not magic and cannot produce magical results.

The situation is far from dismal, but accepting the possibility of failure (not obtaining the answer) is very important. A typical IT project has a stated outcome; and, in the end, with few exceptions, that outcome is produced. It may take longer to achieve and it may run a little slower or use more storage than expected; but, in the end, the users have the system or the report they requested. In the case of data mining, failure to get to the answer or insight that we set out to obtain could be due to a number of reasons. There may be insufficient or inadequate data. The underlying process being modeled may be random. The underlying process may be deterministic, yet too complex to model with existing data mining tools. The people doing the mining may lack adequate knowledge, experience or intuition. The target problem may be unsuitable for data mining.

Regardless of the outcome, the process of data mining invariably takes people deeper into their data than they have ever been before. The mining may have called for data that has been collected but never put to any use. Data that was previously deemed clean may show itself to have quality issues. Data relationships unrelated to the data mining project may reveal themselves. These discoveries can be insightful and valuable in their own right, and they are a side benefit of pursuing data mining in the first place. They may cause a review of data cleansing and validation procedures so that future mining efforts will be better supported. They might provide incremental insight into relationships between the organization and its customers or between products or services. They might trigger a wholly different path of analysis.

The promise of knowledge discovery technology is too great to ignore. Waiting for the industry to fully mature before determining if data mining is appropriate for your enterprise may mean that valuable opportunities for the organization go undiscovered. Worse yet, the data demands of this new technology may not be readily apparent and today's unarchived, missing or inaccurate data will not serve well as the basis for tomorrow's data mining. Adopting the critical success factors outlined in this article will improve the confidence that these early leaps into data mining technology can be made successfully and that they are inherently insightful.
