A critical point to note is that data mining is a business process: a way of finding patterns in your data that provides insight you can use to conduct your business more effectively. Data mining also makes predictions to guide customer interactions and other business decisions. You'll see these points reinforced numerous times in the information that follows.
Myths and Misconceptions about Data Mining
Myth #1: Data mining is all about algorithms.
A business person attending a typical data mining conference or reading its proceedings might form the impression that data mining is all about advanced data analysis algorithms. This misconception might be summarized as follows: All you need for data mining is good algorithms. The better your algorithms, the better your data mining; advancing the effectiveness of data mining therefore means advancing our knowledge of algorithms.
To hold this view is to misunderstand the data mining process. Data mining is a process consisting of many elements, such as formulating business goals; mapping business goals to data mining goals; acquiring, understanding, and preprocessing the data; evaluating and presenting the results of analysis; and deploying these results to achieve business benefits.
This is not to minimize the importance of new or improved data mining algorithms. The problem occurs when data miners focus too much on the algorithms and ignore the other 90-95 percent of the data mining process.
The consequences of this misconception can be disastrous for a data mining project, possibly resulting in a failure to produce any useful results. Experienced data miners recognize the need for a broader view of the data mining process.
Myth #2: Data mining is all about predictive accuracy.
While data mining is not all about data analysis algorithms, there is a part of data mining that is about algorithms. This raises the question: How can you judge the quality of an algorithm?
You might think that the main criterion would be the predictive accuracy of the models it generates. This view, however, misrepresents the role of algorithms in the data mining process.
It is true that a predictive model should have some degree of accuracy, because this demonstrates that it has truly discovered patterns in the data. However, the usefulness of an algorithm or model is also determined by a number of other properties, one of which is whether understanding the resulting model requires deep technical knowledge or is something that can be understood by a typical analyst.
Data miners who believe that predictive accuracy is the primary criterion of algorithm evaluation might use algorithms that can only be used by technology experts. These algorithms will then play only the most limited role, because data mining is a process that is driven by business expertise; it relies on the input and involvement of non-technical business professionals in order to be successful.
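To make the trade-off between accuracy and understandability concrete, here is a minimal sketch in Python. The data, the field names, and the helper functions fit_one_rule and describe are all hypothetical and invented for illustration. The sketch fits a deliberately simple single-threshold rule and reports it in a sentence that a business analyst could read and challenge, alongside its accuracy on the same records.

```python
# Hypothetical labelled records, invented for illustration:
# (months_since_last_purchase, responded_to_offer)
examples = [
    (1, True), (2, True), (3, True), (4, False),
    (6, False), (8, False), (2, False), (5, True),
]

def fit_one_rule(rows):
    """Pick the single threshold that classifies the most rows correctly.

    The rule is: predict a response when the value is at or below the
    threshold. Ties are resolved in favour of the smallest threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in rows}):
        correct = sum((x <= threshold) == y for x, y in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(rows)

def describe(threshold):
    """State the fitted rule in plain business language."""
    return (
        "Predict a response when the last purchase was "
        f"at most {threshold} months ago."
    )

threshold, accuracy = fit_one_rule(examples)
print(describe(threshold))
print(f"Accuracy on these records: {accuracy:.0%}")
```

A more complex model might score higher on the same records, but a rule of this kind can be reviewed directly by the business professionals whose involvement the process depends on.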
Myth #3: Data mining requires a data warehouse.
Business people often think that a data warehouse is a prerequisite for data mining. This is a subtle misconception about the relationship between the two technologies.
It is true that data mining can benefit from warehoused data that is well organized, relatively clean and easy to access. This is particularly true if the warehouse has been constructed with data mining specifically in mind and with knowledge of the requirements of the data mining project. If this has not been the case, however, the warehoused data may be less useful for data mining than the source or operational data. In the worst case, warehoused data may be completely useless (for example, if only summary data is stored).
A more accurate depiction of the relationship between the two would be that data mining benefits from a properly designed data warehouse, and that constructing such a warehouse often benefits from first doing some exploratory data mining.
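The worst case mentioned above, a warehouse that stores only summary data, can be illustrated with a short Python sketch. The records, the field names, the spend threshold, and both helper functions are hypothetical and invented for illustration. The sketch compares what a per-region summary retains with what the record-level data reveals.

```python
from collections import defaultdict

# Hypothetical operational records, invented for illustration:
# (customer_id, region, monthly_spend, churned)
records = [
    ("c1", "north", 20, True),
    ("c2", "north", 180, False),
    ("c3", "south", 25, True),
    ("c4", "south", 175, False),
    ("c5", "north", 30, True),
    ("c6", "south", 170, False),
]

def summarize_by_region(rows):
    """Mimic a warehouse that keeps only per-region totals."""
    totals = defaultdict(lambda: {"customers": 0, "spend": 0, "churned": 0})
    for _, region, spend, churned in rows:
        entry = totals[region]
        entry["customers"] += 1
        entry["spend"] += spend
        entry["churned"] += int(churned)
    return dict(totals)

def churn_rate_by_spend_band(rows, threshold=100):
    """Record-level view: churn rate for low and high spenders."""
    bands = {"low": [0, 0], "high": [0, 0]}  # [churned, total]
    for _, _, spend, churned in rows:
        band = "low" if spend < threshold else "high"
        bands[band][0] += int(churned)
        bands[band][1] += 1
    return {band: c / n for band, (c, n) in bands.items() if n}

summary = summarize_by_region(records)
rates = churn_rate_by_spend_band(records)
print(summary)
print(rates)
```

In this invented data, the regional summary shows only that one region loses more customers than the other. The individual records show that every low spender churned and no high spender did, a pattern that cannot be recovered once the data has been reduced to regional totals.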
Myth #4: Data mining is all about vast quantities of data.
Early explanations of data mining often began with statements such as: We now collect more data than ever, yet how are we to benefit from these vast data stores? Focusing on the size of data stores provided a convenient introduction to the topic of data mining, but subtly misrepresented its nature.
While there are many large datasets that organizations can benefit from mining, it would be a mistake to believe that these should be the sole focus of data mining. Many useful data mining projects are performed on small or medium-sized datasets, some containing only a few hundred or a few thousand records.
Subscribing to the erroneous belief that data mining is only appropriate for vast data stores would lead organizations to choose tools that sacrifice usability for scalability when, in fact, both attributes are essential. To quote a customer of a leading data mining tool, "Other data mining tools optimize machine time, but this tool optimizes my time." Whether the datasets are large or small, organizations should choose a data mining tool that optimizes the user's time.
Myth #5: Data mining should be done by a technology expert.
Data mining uses advanced technology, and its workings, particularly those of modeling techniques, are unlikely to be understood by the wider IT community. Does this mean that only those who understand every nuance of the technology that is involved should conduct data mining?