Continue in 2 seconds

Where can I find sources about failed data mining projects and the reason for their failure?

  • Sid Adelman, Chuck Kelley
  • April 02 2003, 1:00am EST


Where can I find sources about failed data mining projects and the reason for their failure? Is there a percentage figure of the data mining projects that failed?


Chuck Kelley’s Answer: There is a number (quite high) that is floating around. I won’t quote it because there is a not very good answer to how this number was determined. Failure is in the eye of the beholder. If you projected 1,000 percent increase in profits because of a data mining project and it only yielded 950 percent, that project could be defined as a failure. Please be leery of such numbers without an explanation behind it.

Sid Adelman’s Answer: Most folks don’t brag about their failures and the vendors sure won’t be coming forward to tell you about theirs. Most of the failure factors associated with data warehousing are also relevant to data mining. The following is an excerpt from Data Warehouse Project Management by Sid Adelman and Larissa Moss.

There is disagreement over the failure rate of data warehouse projects. Rather than contribute to the debate we will detail the types of situations that could be characterized as failures and leave it to the reader to decide if they truly constitute failure.

The Project is Over Budget

Depending on how much the actual expenditures exceeded the budget, the project may be considered a failure. The cause may have been an overly optimistic budget or the inexperience of those calculating the estimate. The inadequate budget might be the result of not wanting to tell management the bitter truth about the costs of a data warehouse.

Unanticipated and expensive consulting help may have been needed. Performance or capacity problems, more users, more queries or more complex queries may have required more hardware or extra effort to resolve the problems. The project scope may have been extended without a change in the budget. Extenuating circumstances such as delays caused by hardware problems, software problems, user unavailability, change in the business or other factors may have resulted in additional expenses.

Slipped Schedule

Most of the factors listed in the preceding section could also have contributed to the schedule not being met, but the major reason for a slipped schedule is the inexperience or optimism of those creating the project plan. In many cases management wanting to "put a stake in the ground" were the ones who set the schedule by choosing an arbitrary date for delivery in the hope of giving project managers something to shoot for. The schedule becomes a deadline without any real reason for a fixed delivery date. In those cases the schedule is usually established without input from those who know how long it takes to actually perform the data warehouse tasks. The deadline is usually set without the benefit of a project plan. Without a project plan that details the tasks, dependencies and resources, it is impossible to develop a realistic date by which the project should be completed.

Functions and Capabilities not Implemented

The project agreement specified certain functions and capabilities. These would have included what data to deliver, the quality of the data, the training given to the users, the number of users, the method of delivery e.g. web based, service level agreements (performance and availability), pre-defined queries, etc. If important functions and capabilities were not realized or were postponed to subsequent implementation phases, these would be indications of failure

Unhappy Users

If the users are unhappy, the project should be considered a failure. Unhappiness is often the result of unrealistic expectations. Users were expecting far more than they got. They may have been promised too much or there may have been a breakdown in communication between IT and the user. IT may not have known enough to correct the users’ false expectations, or may have been afraid to tell them the truth. We often observe situations where the user says jump, and IT is told to say "how high?" Also, the users may have believed the vendors’ promises for grand capabilities and grossly optimistic schedules.

Furthermore, users may be unhappy about the cleanliness of their data, response time, availability, usability of the system, anticipated function and capability, or the quality and availability of support and training.

Unacceptable Performance

Unacceptable performance has often been the reason that data warehouse projects are cancelled. Data warehouse performance should be explored for both the query response time and the extract/transform/load time.

Any characterization of good query response time is relative to what is realistic and whether it is acceptable to the user. If the user was expecting sub second response time for queries that join two multi-million-row tables, the expectation would cause the user to say that performance was unacceptable. In this example, good performance should have been measured in minutes, not fractions of a second. The user needs to understand what to expect. Even though the data warehouse may require executing millions of instructions and may require accessing millions of rows of data, there are limits to what the user should be expected to tolerate. We have seen queries where response time is measured in days. Except for a few exceptions, this is clearly unacceptable.

As data warehouses get larger, the extract/transform/load (ETL) process will take longer, sometimes as long as days. This will impact the availability of the data warehouse to the users. Database design, architecture, and hardware configuration, database tuning and the ETL code – whether an ETL product or hand written code – will significantly impact ETL performance. As the ETL process time increases, all of the factors have to be evaluated and adjusted. In some cases the service level agreement for availability will also have to be adjusted. Without such adjustments, the ETL processes may not complete on time, and the project would be considered a failure.

Poor Availability

Availability is both scheduled availability (the days per week and the number of hours per day) as well as the percentage of time the system is accessible during scheduled hours. Availability failure is usually the result of the data warehouse being treated as a second-class system. Operational systems usually demand availability service level agreements. The performance evaluations and bonus plans of those IT members who work in operations and in systems often depends on reaching high availability percentages. If the same standards are not applied to the data warehouse, problems will go unnoticed and response to problems will be casual, untimely and ineffective.

Inability to Expand

If a robust architecture and design is not part of the data warehouse implementation, any significant increase in the number of users or increase in the number of queries or complexity of queries may exceed the capabilities of the system. If the data warehouse is successful, there will also be a demand for more data, for more detailed data and, perhaps, a demand for more historical data to perform extended trend analysis, e.g., five years of monthly data.

Poor Quality Data/Reports

If the data is not clean and accurate, the queries and reports will be wrong, In which case users will either make the wrong decisions or, if they recognize that the data is wrong, will mistrust the reports and not act on them. Users may spend significant time validating the report figures, which in turn will impact their productivity. This impact on productivity puts the value of the data warehouse in question.

Too Complicated for Users

Some tools are too difficult for the target audience. Just because IT is comfortable with a tool and its interfaces, it does not follow that all the users will be as enthusiastic. If the tool is too complicated, the users will find ways to avoid it, including asking other people in their department or asking IT to run a report for them. This nullifies one of the primary benefits of a data warehouse, to empower the users to develop their own queries and reports.

Project not Cost Justified

Every organization should cost justify their data warehouse projects. Justification includes an evaluation of both the costs and the benefits as described in Chapter 6, Cost Benefit. When the benefits were actually measured after implementation, they may have turned out to be much lower than expected, or the benefits came much later than anticipated. The actual costs may have been much higher than the estimated costs. In fact, the costs may have exceeded both the tangible and intangible benefits.

Management does not Recognize the Benefits

In many cases, organizations do not measure the benefits of the data warehouse or do not properly report those benefits to management. Project managers, and IT as a whole, are often shy in boasting about their accomplishments. Sometimes they may not know how to report on their progress or on the impact the data warehouse is having on the organization. The project managers may believe that everyone in the organization will automatically know how wonderfully IT performed, and that everyone will recognize the data warehouse for the success that it is. They are wrong. In most cases, if management is not properly briefed on the data warehouse, they will not recognize its benefits and will be reluctant to continue funding something they do not appreciate.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access