SEP 28, 2009 5:44am ET


The Netflix Prize


The other day, I came across an article in the Wall Street Journal noting that movie rental company Netflix had announced a winner of its $1 million contest to improve the accuracy of its film recommendation engine. According to Netflix, “The Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences.” The job of the current Netflix recommendation engine, Cinematch, is to “predict whether someone will enjoy a movie based on how much they liked or disliked other movies. We use those predictions to make personal movie recommendations based on each customer’s unique tastes.” The contest challenged programmers and analytics experts to take “a lot of anonymous rating data” and produce code whose recommendation predictions would exceed an “accuracy bar that is 10% better than what Cinematch can do on the same training data set,” where “accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.” Two teams, BellKor's Pragmatic Chaos and The Ensemble, exceeded the 10% improvement threshold, with BellKor's Pragmatic Chaos declared the winner on the strength of an earlier submission. Netflix, of course, will benefit by incorporating the winning code into new releases of Cinematch, leading to additional sales. Savvy Netflix CEO Reed Hastings no doubt understands the business potential of open sourcing.
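For the record, Netflix scored “accuracy” as root mean squared error (RMSE) between predicted and subsequent actual ratings, so “10% better” meant a 10% lower RMSE than Cinematch achieved on the same data. A minimal sketch of the metric, with made-up ratings purely for illustration:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    assert len(predicted) == len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical ratings on a 1-5 star scale
actual = [4, 3, 5, 2, 4]
predicted = [3.8, 3.4, 4.5, 2.6, 4.1]
print(round(rmse(predicted, actual), 3))  # lower is better; 0 means perfect prediction
```

A contest entry was thus judged on a single number: drive this quantity down 10% relative to the Cinematch baseline and you cleared the bar.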

A 2008 New York Times article, "If You Like This, You're Sure to Love That," provides insight into the analytics and programming challenges of the Netflix contest. Recommendation engines are generally examples of unsupervised learning, or learning without a teacher, in contrast to supervised learning, where a “teacher” supplies the correct answers the model must ultimately learn to predict. An important flavor of recommendation algorithm is the collaborative filtering we've grown to know and love from Amazon and Netflix: “If you like this movie/book, you'll certainly like this other movie/book as well.” A major breakthrough in the Netflix competition came when a contestant demonstrated a significant lift in recommendation performance by exploiting the arcane linear algebra of singular value decomposition (SVD). Statisticians and psychologists are familiar with the SVD from their work with principal components, a technique often used to reduce complexity in multivariate data sets – like a matrix of individuals' ratings of films. Though Netflix's contest data set is quite large, SVD mathematics is tractable: given well-behaved data and enough CPU horsepower, computational solutions are available.
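To make the SVD idea concrete, here is a minimal pure-Python sketch (the ratings matrix is invented, and power iteration on R^T R stands in for a full SVD routine): the leading singular triple yields a rank-1 approximation of the ratings matrix that captures the dominant shared “taste” factor among raters.

```python
import math

# Toy ratings matrix: 4 raters x 3 movies (hypothetical values)
R = [[5, 4, 1],
     [4, 5, 1],
     [1, 2, 5],
     [1, 1, 4]]

def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def dominant_singular(A, iters=100):
    """Power iteration on A^T A to recover the leading singular triple (u, sigma, v)."""
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))      # one step of power iteration
        n = norm(w)
        v = [x / n for x in w]            # renormalize the right singular vector
    sigma = norm(matvec(A, v))            # leading singular value
    u = [x / sigma for x in matvec(A, v)] # leading left singular vector
    return u, sigma, v

u, sigma, v = dominant_singular(R)
# Rank-1 reconstruction sigma * u * v^T: the dominant low-dimensional taste factor
approx = [[sigma * ui * vj for vj in v] for ui in u]
```

A real SVD keeps several such factors, not just one; truncating to the top few is exactly the dimensionality reduction that made the technique so effective on the Netflix data.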

A curious sort, I generally like to see more than just a high-level description of a statistical or machine learning procedure. While I don't need to understand all the gritty details of an algorithm with accompanying optimization mathematics, I do wish to glimpse into the black boxes now and then by examining toy examples that illustrate the computations. My web meanderings for such information on collaborative filtering led me to the Igvita blogging site of Ilya Grigorik, computer scientist and self-proclaimed tinkerer, who's written several excellent articles on machine learning, in addition to other CS topics. An aficionado of the agile Ruby language, Grigorik details a simple but complete example of the singular value decomposition method, starting with a matrix that might, for example, hold the ratings of each of N raters on p movies, where N > p. Using a trivial (but nonetheless didactic) data set, Grigorik works through an example in open source Ruby, using a freely available package supporting complex matrix computations. Post computation, he illustrates the geometry of “similarity” that's a tenet of the collaborative filtering methodology. Once similar raters are found, recommendations to prospects can be made accordingly. The black box is opened.
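Grigorik's Ruby walkthrough is worth reading in full; as a language-neutral taste of the “similarity” geometry he illustrates, here is a minimal Python sketch (raters, titles, and ratings are all invented) that finds the rater most similar to a target via cosine similarity over co-rated movies, then recommends that neighbor's highly rated, unseen titles:

```python
import math

# Hypothetical ratings, rater -> {movie: stars}
ratings = {
    "alice": {"Heat": 5, "Up": 1, "Alien": 4},
    "bob":   {"Heat": 4, "Up": 2, "Alien": 5, "Clue": 4},
    "carol": {"Heat": 1, "Up": 5},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two raters' vectors over co-rated movies."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    na = math.sqrt(sum(a[m] ** 2 for m in common))
    nb = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (na * nb)

def recommend(target, ratings):
    """Suggest movies the most similar rater liked that the target hasn't seen."""
    others = {name: r for name, r in ratings.items() if name != target}
    best = max(others, key=lambda n: cosine_similarity(ratings[target], others[n]))
    seen = set(ratings[target])
    return [m for m, score in ratings[best].items() if m not in seen and score >= 4]

print(recommend("alice", ratings))  # alice's nearest neighbor is bob
```

Geometrically, raters whose rating vectors point in nearly the same direction are “similar,” and their tastes are presumed transferable; that is the nearest-neighbor intuition behind collaborative filtering.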

