This is an article from the July 2006 issue of DM Review's Extended Edition.

While data mining has been around for decades, companies still struggle with the challenge of applying the results of data mining in real time in an operational environment. Real-time operational business intelligence (BI) offers improvements on existing solutions and can be found in a variety of applications.

The results of data mining are used every day to detect problems and uncover opportunities. To be truly useful, though, those results should be quickly and easily incorporated back into the organization's processes and systems. An effective BI solution should be able to mine both transactional and historical data in real time and integrate the knowledge gained in an operational environment.

Most data mining solutions focus on wading through mountains of stored data to get to a nugget of knowledge. A more difficult task is to integrate real-time transactional data with stored historical data. Most businesses have data stored in multiple databases in multiple locations, and those databases often have different formats and platforms. As a result, many companies resort to creating a data warehouse to mine the data. Because a data warehouse is loaded in batches, it cannot incorporate transactional data in real time.

To operate on transactional and historical data in real time, the technology must be able to search the source databases remotely, because the batch loading of a data warehouse introduces delay. It must also work with different types, platforms and formats of data, because data cleansing would slow the process further. Finally, the search itself must be extremely fast.

For the best results, the analysis should incorporate multiple analytical tools. Rules engines and predictive analytics, for example, each suit different problems. A system that could apply multiple analytical tools, take the respective results and aggregate them to create a comprehensive analysis would be ideal.
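The aggregation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two scoring functions, the field names (`has_receipt`, `amount`, `returns_past_year`), the thresholds and the weighted-average scheme are all hypothetical and are not taken from any product discussed in this article.

```python
from typing import Callable, Dict, List, Tuple

# A scorer maps one transaction (a plain dict here) to a risk score in [0, 1].
Scorer = Callable[[Dict], float]


def rule_score(txn: Dict) -> float:
    """Hypothetical rules-engine check: a nonreceipted refund over $100."""
    return 1.0 if (not txn["has_receipt"] and txn["amount"] > 100) else 0.0


def model_score(txn: Dict) -> float:
    """Stand-in for a predictive model: risk rises with return frequency,
    capped at 1.0. The divisor 60 is an arbitrary illustrative threshold."""
    return min(txn["returns_past_year"] / 60, 1.0)


def aggregate(txn: Dict, scorers: List[Tuple[Scorer, float]]) -> float:
    """Weighted average of the individual tool scores."""
    total_weight = sum(weight for _, weight in scorers)
    return sum(weight * scorer(txn) for scorer, weight in scorers) / total_weight


txn = {"has_receipt": False, "amount": 150.0, "returns_past_year": 30}
combined = aggregate(txn, [(rule_score, 1.0), (model_score, 1.0)])
print(combined)
```

A weighted average is only one way to combine results; a real system might instead take the maximum score or feed the individual scores into another model.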

Once the ability to operate on data in real time exists, the results of the analysis must still be applied to the operational system automatically and in real time. The more time it takes to incorporate the intelligence captured by the analytics, the more opportunities are lost and the greater the impact on business operations.

Traditional data mining solutions focus only on the back-end, offline analysis and require that the captured models or patterns be put into the organization's operational processes and systems manually. A better solution would be for these patterns and models to be automatically integrated in an operational environment. When a new pattern is discovered, the system automatically sends out an authorization notification to the assigned business decision-maker. This can be done through electronic alert, email, pager or phone, depending on the preference of the decision-maker. Once the change is authorized, it is automatically applied to the operational system. BI results can thereby be processed and changes implemented in a matter of seconds rather than days.
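The discover, authorize and apply sequence described above might look roughly like the following. It is a minimal sketch, not a description of any vendor's implementation: `Pattern`, `OperationalSystem`, `process_discovery` and the `notify` callback are hypothetical names, and the callback stands in for whatever alert, email, pager or phone channel the decision-maker prefers.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pattern:
    """A newly discovered pattern awaiting authorization."""
    description: str
    authorized: bool = False


@dataclass
class OperationalSystem:
    """Stand-in for the live system that receives approved patterns."""
    active_patterns: List[str] = field(default_factory=list)

    def apply(self, pattern: Pattern) -> None:
        self.active_patterns.append(pattern.description)


def process_discovery(
    pattern: Pattern,
    notify: Callable[[Pattern], bool],
    system: OperationalSystem,
) -> bool:
    """Ask the decision-maker for approval, then apply the pattern if approved.

    `notify` sends the request over the preferred channel and returns
    True when the change is authorized.
    """
    pattern.authorized = notify(pattern)
    if pattern.authorized:
        system.apply(pattern)
    return pattern.authorized


system = OperationalSystem()
# An always-approve callback is used here purely for demonstration.
approved = process_discovery(
    Pattern("flag refunds where customer and cashier share an address"),
    notify=lambda p: True,
    system=system,
)
```

Keeping the notification channel behind a callback means the same flow works whichever channel a given decision-maker chooses.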

Traditionally, these types of systems have been complicated custom builds that required extensive operational knowledge and many months to develop. Now, companies are moving to create configurable systems that can be easily adapted to multiple industries and applications, making implementation much quicker. These next-generation operational BI systems will allow a complete implementation of a fully applied BI solution in a few weeks instead of months or years.

One area where this type of technology could have a significant impact is retail shrinkage. According to the 2004 National Retail Security Report from the University of Florida, shrinkage - the loss in inventory due to various types of fraud - is responsible for the loss of 1.54 percent of annual sales. For the retail industry, this equates to a loss of $30 to $50 billion. Currently, retailers primarily use two methods of determining risk of fraud in returns processing - exception-based reporting and returns management solutions.

Exception-based reporting applies data mining to sales and return data to find anomalies or exceptions to normal activity. One limitation of exception-based reporting is that it doesn't catch the anomalies until after the transaction has been processed. Additionally, the software only looks at the sales and returns data. It doesn't look at data in other data sources for patterns in relationships between employees, customers, vendors and places. For example, it can't determine that the customer making a return actually has the same home address as the cashier who is processing the return.
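To make the shared-address example concrete, here is a small sketch of a cross-source check. Everything in it is assumed for illustration: the two dictionaries stand in for separate customer and employee databases, the record layout is invented, and the simple normalization is far cruder than the matching a production system would need.

```python
from typing import Dict


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace so that
    trivially different spellings of one address compare equal."""
    cleaned = "".join(ch for ch in address.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def shares_address(
    customer_id: str,
    cashier_id: str,
    customers: Dict[str, Dict],
    employees: Dict[str, Dict],
) -> bool:
    """True when the returning customer and the cashier have the same home address."""
    customer = customers.get(customer_id)
    cashier = employees.get(cashier_id)
    if customer is None or cashier is None:
        return False
    return normalize_address(customer["address"]) == normalize_address(cashier["address"])


# Two hypothetical, separately maintained data sources.
customers = {"C1": {"address": "12 Oak St., Apt 4"}}
employees = {"E9": {"address": "12 oak st apt 4"}}

print(shares_address("C1", "E9", customers, employees))
```

The check only works because both sources are consulted at the time of the return, which is exactly what a tool limited to sales and returns data cannot do.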

Returns management solutions provide real-time information but only within a narrow focus. Returns management solutions look at the number and type of returns a person makes and apply policies at the point of return to tell the cashier whether or not to accept the return. While this provides real-time information, the information is limited in scope and relevancy. If a retailer bases return acceptance or denial on the number and type of return, it has the potential to punish a good customer while rewarding a criminal.

For example, consider Sally and Susan. Sally is a very good customer of RetailMart. She buys lots of items from the store for herself and her family. Because she shops for other people, she often needs to bring items back. RetailMart uses returns management software that applies a policy limit of five returns per month. Sally often goes above five returns per month and then is not allowed to return other items. She constantly needs to talk to management to get her returns processed, and she ends up frustrated with the level of service she receives. Eventually, Sally becomes a very good customer of ShopMart instead.

Susan is another RetailMart customer who never gets her returns denied because she never does more than five returns per month, but that is because her roommate used to work as a cashier at RetailMart and knows that five is the magic number. Now her roommate works in shipping and receiving at RetailMart, where items often "go missing." Who is the real risk here? Returns management solutions treat Sally as the risk while Susan sails through.
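The contrast between the two customers can be expressed as two small checks. This is a sketch under assumptions: the five-return limit comes from the example above, but the record fields (`shares_address_with_employee`, `linked_employee_area_has_losses`) and the relationship-based rule are hypothetical, chosen only to show how a count-based policy and a relationship-aware check can disagree.

```python
from typing import Dict

RETURN_LIMIT = 5  # monthly limit from the RetailMart example


def count_policy_accepts(returns_this_month: int) -> bool:
    """Count-based policy: accept a return only while the monthly total,
    including this return, stays within the limit."""
    return returns_this_month <= RETURN_LIMIT


def relationship_check_flags(customer: Dict) -> bool:
    """Hypothetical relationship-aware check: flag a customer who is linked
    to an employee working in an area where inventory goes missing."""
    return (
        customer["shares_address_with_employee"]
        and customer["linked_employee_area_has_losses"]
    )


sally = {
    "returns_this_month": 7,
    "shares_address_with_employee": False,
    "linked_employee_area_has_losses": False,
}
susan = {
    "returns_this_month": 5,
    "shares_address_with_employee": True,
    "linked_employee_area_has_losses": True,
}

for name, customer in (("Sally", sally), ("Susan", susan)):
    accepted = count_policy_accepts(customer["returns_this_month"])
    flagged = relationship_check_flags(customer)
    print(f"{name}: count policy accepts={accepted}, relationship check flags={flagged}")
```

Under these assumptions the count-based policy rejects Sally and accepts Susan, while the relationship-aware check flags only Susan.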

A leading retailer recently completed a pilot of a new retail fraud detection solution that incorporates some of the elements of operational BI discussed here. By using a software solution that can access multiple data sources without the need for a data warehouse and can incorporate multiple analytics, the retailer was able to uncover $15.8 million in previously undetected fraud risk in its data, including:

  • $11 million in nonreceipted refunds by customers returning between 60 and 120 times in a year;
  • $3.4 million in nonreceipted refunds by known shoplifters; and
  • $1.4 million in nonreceipted refunds by employees.

Important information related to fraud risk resided in multiple databases that the retailer had access to, including separate databases dedicated to customer, employee, vendor, refunds, products and SKUs, shoplifting and bad check data. The exception-based reporting tools were not able to search and analyze data across these disparate sources. Even when trained analysts applied these tools across limited data sets, the software was not able to score the results by relative levels of risk. Instead, field investigators had to sort through voluminous written reports of potential fraudulent activity to manually identify worthwhile leads. The result was an excessive amount of both false positives (leads that turned out not to be fraud) and false negatives (actual fraud that was missed).

During the pilot, the retailer utilized a software solution that applied a comprehensive approach to detecting retail fraud by:

  • Using sophisticated algorithms and data mining analytics to find hidden relationships that exist between people, places and transactions;
  • Uncovering ways in which people try to hide their identity; and
  • Combining those results with analysis of sales and returns data.

The pilot utilized approximately 89,000 employee records, 2.3 million records of customers with nonreceipted refunds (a nine-month sample of nonreceipted refunds) and 68,000 vendor records. These records were analyzed against each other and compared to incident data (e.g., shoplifting records). Close to 1 trillion record comparisons were made, translating into more than 7 trillion individual field- or attribute-level comparisons. The pilot ran on four PC-based machines with 2.4 GHz Xeon processors, 4 GB of RAM and RAID-5 drives.

Operational BI software could also be applied to address the problems of insurance fraud, identity theft, credit card fraud and national security threats, all of which have a large monetary and societal impact.

Applying real-time BI results to how organizations work creates true operational BI. Organizations are beginning to demand this level of speed in their operations. The technology to achieve this now exists, and we will continue to see BI companies moving in this direction.
