Collaborative filtering software uses the behavior of people with similar preferences to identify products an individual is likely to purchase. It is the best known type of "recommendation engine," which includes systems using many different techniques to decide what to offer in a given situation. Although the classic recommendation application involves product selections such as books or films, recommendation engines can also target advertising such as Web banners or select information such as replies to technical support questions. Alternatives to the group-based approach of collaborative filtering include market basket analysis, pattern analysis, knowledge base analysis as well as standard predictive modeling.
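To make the group-based idea concrete, here is a minimal sketch of user-based collaborative filtering. It is an illustration only, not how any particular commercial package works: the customer names, item names, ratings, and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings per customer; every name and number here is illustrative.
ratings = {
    "ann":  {"book_a": 5, "book_b": 3, "film_x": 4},
    "bob":  {"book_a": 4, "book_b": 3, "film_x": 5, "film_y": 4},
    "cara": {"book_a": 1, "book_c": 5},
}

def cosine(u, v):
    """Cosine similarity between two customers' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(target, ratings, top_n=3):
    """Score items the target has not rated, weighting each other customer's
    rating by how similar that customer is to the target."""
    scores = {}
    for other, items in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], items)
        if sim <= 0:
            continue
        for item, rating in items.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ann", ratings))
```

Because "bob" rates the shared items much as "ann" does, his rating of "film_y" carries far more weight than "cara"'s rating of "book_c", so "film_y" ranks first for "ann".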

Recommendation engines are a relatively new application for most marketers. Although marketing has traditionally identified the most likely buyers for each product, it was usually up to sales to recommend specific products to specific individuals. The growth of interactive systems, both on the Internet and in telephone call centers, has changed this by creating a need for automated systems that make recommendations without expert human intervention. The potential for significant revenue increases with little or no added expense makes this an especially attractive investment, particularly since the incremental sales are so easily measured.

The most prominent feature of collaborative filtering is a high degree of automation. It is typically applied in situations where there are thousands of options available and preferences can shift quickly, conditions that make it impractical to employ handcrafted rule sets or conventional statistical models. A collaborative filtering system automatically performs the critical tasks: recording the behavior of each individual, using that behavior to predict future behavior and adjusting its predictions in response to results. Some other recommendation techniques can achieve similar degrees of automation, although this is not always built into the standard software packages. Instead, these packages often limit themselves to discovering relationships among different behaviors and rely on a human to transform these into business rules.

The automation of recommendation engines is something of a mixed blessing: it lets the systems handle high volumes and behavior shifts, but also limits the output to the tactical question of what a customer is most likely to want. The response can be broadened somewhat to incorporate simple business needs: for example, adding a margin calculation would let the system recommend the offer with the highest expected profit rather than just the highest expected response rate. But deeper strategic considerations, such as whether to make a product offer at all in a given situation, are outside the recommendation engine's scope. As noted last month, these require an interaction management system, which may call on a recommendation engine as part of its larger decision-making process.
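The margin calculation mentioned above can be sketched in a few lines. The offers, response rates and margins below are invented for illustration, and expected profit is assumed to be simply response rate times margin per sale.

```python
# Hypothetical offers: (name, predicted response rate, margin per sale in dollars).
offers = [
    ("paperback", 0.10, 2.00),
    ("hardcover", 0.04, 9.00),
    ("box_set",   0.01, 30.00),
]

def best_by_response(offers):
    """Pick the offer the customer is most likely to accept."""
    return max(offers, key=lambda o: o[1])[0]

def best_by_expected_profit(offers):
    """Pick the offer with the highest response rate x margin."""
    return max(offers, key=lambda o: o[1] * o[2])[0]

print(best_by_response(offers))         # ranks on response rate alone
print(best_by_expected_profit(offers))  # ranks on expected profit per offer made
```

With these made-up numbers the two rankings disagree: the paperback is the likeliest sale, but the hardcover yields more expected profit per offer.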

There are about a half-dozen collaborative filtering software packages on the market today. Although there are substantial technical differences among them, users need to consider a number of general issues when making a selection:

Data types: Some systems require users to provide explicit ratings of sample products. Other systems can look at behaviors and infer preferences without an explicit rating. Many systems can combine both types of data. Systems also differ in whether they are limited to purchase data or can consider other activities, such as viewing a Web page or putting an item in a shopping cart; in how easily they incorporate data from non-Web legacy systems; and in whether they can consider non-behavioral data such as demographics or original source.

Outputs: The standard outputs of these systems are recommendations of products the customer is most likely to want. But some systems can also make cross-sell recommendations. Other recommendations may involve the Web banner ads a viewer is most likely to click on or the documents most likely to answer a question. Each of these applications has somewhat different requirements, so it is important to ensure a particular recommendation engine is suited to your expected use.

Constraints: A simple product recommendation may involve no constraints other than eliminating items the customer has already purchased. Even this can be harder than it seems; for example, a system needs to exclude different editions of the same book. Cross-sell applications might involve more complicated constraints, such as ensuring products come from different categories (a tie matching a shirt, rather than a second shirt). Advertising and sales applications also often require limits on how many times the same offer can be made to an individual and how long to wait before showing the same offer again. Other constraints such as inventory level and profit margin might also come into play. Sophisticated marketers will want to throw in an occasional random selection to help validate system accuracy and to test customers for unknown interests.
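Several of these constraints can be expressed as a filter applied to the engine's ranked output. The sketch below is hypothetical throughout: the `Item` fields (including a `work_id` assumed to be shared by all editions of a title), the `max_shows` cap and the `explore_rate` parameter are invented for illustration. The wait-time, inventory and margin constraints are left out to keep it short.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    sku: str
    work_id: str    # assumed field: identical for every edition of one title
    category: str

def apply_constraints(ranked, owned_work_ids, anchor_category=None,
                      times_shown=None, max_shows=3):
    """Filter a ranked candidate list, preserving its order."""
    times_shown = times_shown or {}
    allowed = []
    for item in ranked:
        if item.work_id in owned_work_ids:
            continue  # already purchased, in any edition
        if anchor_category is not None and item.category == anchor_category:
            continue  # cross-sell: require a different category
        if times_shown.get(item.sku, 0) >= max_shows:
            continue  # offer has reached its frequency cap
        allowed.append(item)
    return allowed

def choose(allowed, explore_rate=0.05, rng=random):
    """Usually return the top-ranked item; occasionally return a random one."""
    if not allowed:
        return None
    if rng.random() < explore_rate:
        return rng.choice(allowed)
    return allowed[0]

ranked = [
    Item("shirt-2", "w1", "shirts"),
    Item("novel-pb", "w2", "books"),
    Item("tie-1", "w3", "ties"),
    Item("belt-1", "w4", "belts"),
]
allowed = apply_constraints(ranked, owned_work_ids={"w2"},
                            anchor_category="shirts",
                            times_shown={"belt-1": 3})
print([item.sku for item in allowed])
```

Here the customer has just bought a shirt and already owns some edition of the novel, and the belt has hit its cap, so only the tie survives. Keeping the random selection in a separate `choose` step makes it easy to log which recommendations were exploratory when measuring accuracy later.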

Integration: Most of these systems are designed to feed recommendations to external touchpoint systems that will execute the interaction itself. Systems built with integration in mind will provide documented APIs, standard objects, multiple database connections and other tools. Even when a system is designed for integration, buyers must ensure the methods chosen by a particular vendor are compatible with whatever technology is deployed elsewhere in their organization. This is particularly true as companies seek to deploy recommendation engines at all touchpoints, not just the Web. Since the different touchpoint systems may themselves use different technologies, the ability to handle several different kinds of integration may be essential.

Scalability: This goes hand-in-hand with integration, since an integration that cannot deliver adequate response time is pretty darn useless. Scalability involves many dimensions, including response time, transactions per minute, customers on file, data per customer, products rated, attributes per product and number of data sources. Systems vary greatly in their resource requirements: some store highly compressed profiles while others keep every piece of detail. Systems that store detail can be very powerful, but may need impractical amounts of disk space or processor cycles when dealing with large volumes. Some products scale well through parallel processing, in-memory databases and other high-end technologies; others have bottlenecks these won't address. As always, careful testing is needed to be sure any system will scale beyond the size of its existing installations.

Accuracy: It seems odd to put the accuracy of a system's recommendations so low on the list, but the practical requirements already described do come first. Of course, quality does matter, and there are significant differences among the various products. But different systems will have different strengths, so there is no real alternative to testing with your own data to find which system works best for you. It's important to test how well each system deals with varying amounts of data: you will probably face a range of situations from new customers and new products with no history, to established customers and products with vast amounts of detail. Since this is one of those areas where averages can be highly deceptive, it's important to measure separately how each system handles each situation.
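A small sketch shows why a single average can mislead. The test log below is fabricated purely for illustration; the segment labels and the "hit" definition (the customer accepted the recommendation) are assumptions.

```python
from collections import defaultdict

# Hypothetical test log: (customer segment, was the recommendation accepted?)
results = (
    [("new", True)] * 1 + [("new", False)] * 19
    + [("established", True)] * 24 + [("established", False)] * 56
)

def overall_hit_rate(results):
    """Share of all recommendations that were accepted."""
    return sum(hit for _, hit in results) / len(results)

def hit_rate_by_segment(results):
    """Acceptance rate computed separately for each segment."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [hits, trials]
    for segment, hit in results:
        totals[segment][0] += int(hit)
        totals[segment][1] += 1
    return {seg: hits / trials for seg, (hits, trials) in totals.items()}

print(overall_hit_rate(results))
print(hit_rate_by_segment(results))
```

In this made-up log the overall rate looks respectable, yet the engine almost never succeeds with new customers. Only the per-segment breakdown reveals that.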

Reporting: These systems provide a wide range of information, including operational statistics such as processing volumes, response times and resource consumption; confidence and probability values returned with individual recommendations; performance data on actual response rates; and analytical reports such as how often each product is recommended and purchased. A good system will display trends as well as snapshots of information, and maybe even issue automated alerts when there is a significant change in recommendation accuracy or demand for particular products. For systems that build models or rule bases that define relationships among behaviors or data elements, ways to visualize and summarize these are also important.
