The other day, I came across an article in the Wall Street Journal noting that movie rental company Netflix had announced the winner of its $1 million contest to improve the accuracy of its film recommendation engine. According to Netflix, "The Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences. The job of the current Netflix recommendation engine, Cinematch, is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. We use those predictions to make personal movie recommendations based on each customer's unique tastes." The contest challenged programmers and analytics experts to take a large body of anonymous rating data and produce code whose predictions exceeded an accuracy bar 10% better than what Cinematch achieves on the same training data set, where accuracy measures how closely predicted movie ratings match subsequent actual ratings. Two entrants, BellKor's Pragmatic Chaos and The Ensemble, exceeded the 10% improvement threshold, with BellKor's Pragmatic Chaos declared the winner for submitting first. Netflix, of course, will benefit by incorporating the winning code into new releases of Cinematch, leading to additional sales. Savvy Netflix CEO Reed Hastings no doubt understands the business potential of open sourcing.
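Netflix scored entries by root mean squared error (RMSE) between predicted and actual ratings, and the 10% bar meant cutting Cinematch's RMSE by 10%. The sketch below shows that arithmetic. The ratings and both sets of predictions are invented for illustration, and the helper names `rmse` and `improvement` are hypothetical, not part of any Netflix code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between paired predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement(baseline_rmse, new_rmse):
    """Fractional reduction in RMSE relative to the baseline engine."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical ratings on a 1-5 star scale, invented for illustration.
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5]    # stand-in for an incumbent engine
challenger = [4.0, 3.0, 4.5, 2.5, 3.5]  # stand-in for a contest entry

b = rmse(baseline, actual)
c = rmse(challenger, actual)
print(f"baseline RMSE {b:.3f}, challenger RMSE {c:.3f}")
print("clears 10% bar:", improvement(b, c) >= 0.10)
```

Because errors are squared before averaging, RMSE punishes a few badly wrong predictions more heavily than many slightly wrong ones.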
A 2008 New York Times article, "If You Like This, You're Sure to Love That," provides insight into the analytics and programming challenges of the Netflix contest. Recommendation engines are generally examples of unsupervised learning, or learning without a teacher, in contrast to supervised learning, where a teacher supplies the correct answer that the student learns to predict. An important flavor of recommendation algorithm is the collaborative filtering we've grown to know and love from Amazon and Netflix: if you like this movie or book, you'll certainly like that other one as well. A major breakthrough in the Netflix competition came when a contestant demonstrated a significant lift in recommendation performance by exploiting the arcane linear algebra of singular value decomposition (SVD). Statisticians and psychologists know the SVD from their work with principal components, a technique often used to reduce complexity in multivariate data sets such as a matrix of individual film ratings. Though Netflix's contest data set is quite large, the SVD mathematics is tractable: given well-behaved data and enough CPU horsepower, computational solutions are available.
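The "if you like this, you'll like that" idea can be illustrated with item-to-item collaborative filtering: treat each movie as the vector of ratings it received and call two movies similar when those vectors point in nearly the same direction (cosine similarity). This is a minimal sketch of one common variant, not Netflix's or Amazon's implementation; the raters, titles, and ratings are hypothetical.

```python
import math

# Hypothetical ratings (rater -> {movie: stars}); missing keys mean "not rated".
ratings = {
    "alice": {"Alien": 5, "Aliens": 4, "Amelie": 1},
    "bob":   {"Alien": 4, "Aliens": 5, "Chocolat": 2},
    "carol": {"Amelie": 5, "Chocolat": 4, "Alien": 1},
    "dave":  {"Amelie": 4, "Chocolat": 5},
}

def item_vector(item):
    """One column of the ratings matrix, stored sparsely as rater -> rating."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(item):
    """The other movie whose rating vector is closest in angle to `item`'s."""
    items = {i for r in ratings.values() for i in r}
    scores = {other: cosine(item_vector(item), item_vector(other))
              for other in items if other != item}
    return max(scores, key=scores.get)

print(most_similar("Alien"))  # -> Aliens
```

Raters who liked "Alien" also liked "Aliens", so a viewer who enjoyed the first would be pointed to the second. SVD-based methods pursue the same goal by first compressing the ratings matrix into a handful of latent "taste" dimensions.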
A curious sort, I generally like to see more than just a high-level description of a statistical or machine learning procedure. While I don't need to understand all the gritty details of an algorithm and its accompanying optimization mathematics, I do like to peer into the black box now and then by examining toy examples that illustrate the computations. My web meanderings for such information on collaborative filtering led me to Igvita, the blog of Ilya Grigorik, computer scientist and self-proclaimed tinkerer, who has written several excellent articles on machine learning in addition to other computer science topics. An aficionado of the agile language Ruby, Grigorik details a simple but complete example of the singular value decomposition method, starting with a matrix that could, for example, hold the ratings of each of N raters on p movies, where N > p. Using a trivial (but nonetheless didactic) data set, Grigorik works through the example in open source Ruby, using a freely available package for matrix computations. After the computations, he illustrates the geometry of similarity that is a tenet of collaborative filtering: once similar raters are found, recommendations to prospects can be made accordingly. The black box is opened.
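Grigorik's own example is in Ruby; what follows is not his code but a Python sketch of the same general workflow under stated assumptions. It takes a hypothetical 6-rater by 4-movie matrix (N > p), keeps the top k = 2 singular directions, folds a new prospect into that reduced space with the standard fold-in formula u = r V_k Σ_k^{-1}, finds the most similar existing rater by cosine similarity, and recommends that rater's favorites the prospect hasn't seen. To stay standard-library only, it computes the SVD by power iteration with deflation on AᵀA, a swapped-in technique rather than what a matrix package would do, and it treats unrated cells as 0, a common toy-example simplification.

```python
import math
import random

MOVIES = ["A", "B", "C", "D"]  # hypothetical titles: A, B sci-fi; C, D romance
RATERS = ["ann", "ben", "cat", "dan", "eve", "fay"]
# Hypothetical raters x movies matrix (N=6 > p=4); 0 is treated as "unrated".
R = [
    [5, 5, 0, 1],
    [5, 4, 1, 0],
    [4, 5, 0, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
    [0, 0, 5, 5],
]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def top_right_singular(A, k, iters=500, seed=0):
    """Top-k right singular vectors and singular values of A, found by power
    iteration with deflation on the symmetric matrix A^T A."""
    p = len(A[0])
    cols = [[row[j] for row in A] for j in range(p)]
    M = [[dot(cols[i], cols[j]) for j in range(p)] for i in range(p)]  # A^T A
    rng = random.Random(seed)
    vecs, sigmas = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [dot(row, v) for row in M]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in M])  # Rayleigh quotient = sigma^2
        vecs.append(v)
        sigmas.append(math.sqrt(lam))
        # Deflate so the next pass converges to the next singular direction.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vecs, sigmas

def embed(ratings, vecs, sigmas):
    """Fold a ratings row into k-dim space: coordinate i = (r . v_i) / sigma_i."""
    return [dot(ratings, v) / s for v, s in zip(vecs, sigmas)]

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

vecs, sigmas = top_right_singular(R, k=2)
coords = {name: embed(row, vecs, sigmas) for name, row in zip(RATERS, R)}

newcomer = [5, 0, 1, 0]  # hypothetical prospect who has rated only A and C
q = embed(newcomer, vecs, sigmas)
neighbor = max(coords, key=lambda name: cosine(q, coords[name]))
liked = R[RATERS.index(neighbor)]
# Recommend movies the neighbor rated 4+ that the newcomer hasn't rated.
recs = [m for m, mine, theirs in zip(MOVIES, newcomer, liked)
        if mine == 0 and theirs >= 4]
print(neighbor, recs)
```

In the reduced space the three sci-fi fans cluster together and the romance fans sit on the other side of the second axis, so the prospect lands near the sci-fi group and is steered toward movie B. The 4-star cutoff for "liked" is an arbitrary illustrative choice.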