The other day, I came across an article in the Wall Street Journal noting that movie rental company Netflix had announced a winner of its $1 million contest to improve the accuracy of its film recommendation engine. According to Netflix, “The Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences.” The job of the current Netflix recommendation engine, Cinematch, is to “predict whether someone will enjoy a movie based on how much they liked or disliked other movies. We use those predictions to make personal movie recommendations based on each customer’s unique tastes.” The contest challenge was for programmers and analytics experts to take “a lot of anonymous rating data” and produce code to make recommendation predictions that would exceed an “accuracy bar that is 10% better than what Cinematch can do on the same training data set,” where “accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.” Two teams, BellKor’s Pragmatic Chaos and The Ensemble, exceeded the 10% improvement threshold, with BellKor’s Pragmatic Chaos declared the winner on the strength of an earlier submission. Netflix, of course, will benefit by incorporating the winning code into new releases of Cinematch, leading to additional sales. Savvy Netflix CEO Reed Hastings no doubt understands the business potential of crowdsourcing.
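For readers who like to see the arithmetic, the contest’s accuracy yardstick was root mean squared error (RMSE) between predicted and subsequent actual ratings. The little Python sketch below, with made-up ratings and a hypothetical baseline score of my own choosing, shows how such a score and a 10% improvement target would be computed.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Toy example: five held-out ratings (1-5 stars) and a model's predictions.
actual    = [4, 3, 5, 2, 4]
predicted = [3.8, 3.4, 4.1, 2.6, 4.3]
score = rmse(predicted, actual)

# A 10% improvement means beating the baseline RMSE by at least 10%.
baseline = 0.95                 # hypothetical Cinematch-style baseline, not the real number
target   = 0.90 * baseline
print(f"model RMSE = {score:.4f}, 10%-improvement target = {target:.4f}")
```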
A 2008 New York Times article, "If You Like This, You're Sure to Love That," provides insight into the analytics and programming challenges of the Netflix contest. Recommendation engines are generally examples of unsupervised learning, or learning without a teacher, in contrast to supervised learning, where a “teacher” supplies the answers, that is, the labels the model is trained to predict. An important flavor of recommendation algorithm is the collaborative filtering we've grown to know and love from Amazon and Netflix: “If you like this movie/book, you'll certainly like this other movie/book as well.” A major breakthrough in the Netflix competition came when a contestant demonstrated a significant lift in recommendation performance by exploiting the linear algebra of singular value decomposition (SVD). Statisticians and psychologists are familiar with the SVD from their work with principal components, a technique often used to reduce complexity in multivariate data sets, such as a matrix of individuals' ratings of films. Though Netflix's contest data set is quite large, the SVD mathematics is tractable: given well-behaved data and enough CPU horsepower, computational solutions are available.
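To make the SVD-as-principal-components idea concrete, here is a minimal Python sketch of my own (not contest or Cinematch code, and the toy ratings matrix is invented) that factors a small raters-by-movies matrix and keeps just the two strongest latent dimensions.

```python
import numpy as np

# Toy raters-by-movies matrix: rows are raters, columns are movies, entries are 1-5 ratings.
R = np.array([
    [5, 5, 4, 1, 1],
    [4, 5, 5, 1, 2],
    [1, 1, 2, 5, 4],
    [2, 1, 1, 4, 5],
    [5, 4, 5, 2, 1],
], dtype=float)

# Singular value decomposition: R = U * diag(s) * Vt.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the k strongest latent dimensions (the "principal components" flavor of the idea).
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("singular values:", np.round(s, 2))
print("rank-2 reconstruction:\n", np.round(R_k, 2))
```

The point of the truncation is that a handful of latent dimensions captures most of the structure in the ratings, which is what makes the huge contest matrix manageable.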
A curious sort, I generally like to see more than just a high-level description of a statistical or machine learning procedure. While I don't need to understand all the gritty details of an algorithm and its accompanying optimization mathematics, I do like to glimpse into the black boxes now and then by examining toy examples that illustrate the computations. My web meanderings for such information on collaborative filtering led me to Igvita, the blog of Ilya Grigorik, computer scientist and self-proclaimed tinkerer, who's written several excellent articles on machine learning in addition to other CS topics. An aficionado of the agile Ruby language, Grigorik details a simple but complete example of the singular value decomposition method, starting with a matrix that would, for example, hold the ratings of each of N raters on p movies, where N > p. Using a trivial (but nonetheless didactic) data set, Grigorik works through the example in open source Ruby, using a freely available package that supports the necessary matrix computations. After the computations, he illustrates the geometry of “similarity” that's a tenet of the collaborative filtering methodology. Once similar raters are found, recommendations to prospects can be made accordingly. The black box is opened.
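In the spirit of Grigorik's Ruby walkthrough, though what follows is my own Python sketch rather than his code, the snippet below projects a new rater into the reduced SVD space, finds the most similar existing rater by cosine similarity, and recommends that neighbor's well-rated movies that the newcomer hasn't seen. The fold-in step and the toy data are illustrative assumptions.

```python
import numpy as np

# Same toy raters-by-movies matrix as in the earlier sketch; 0 marks "not yet rated".
R = np.array([
    [5, 5, 4, 1, 1],
    [4, 5, 5, 1, 2],
    [1, 1, 2, 5, 4],
    [2, 1, 1, 4, 5],
    [5, 4, 5, 2, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Fold a new rater's sparse ratings into the k-dimensional space: u_new = r * Vk^T * Sk^-1.
new_ratings = np.array([5, 4, 0, 0, 0], dtype=float)
u_new = new_ratings @ Vk.T @ np.linalg.inv(Sk)

def cosine(a, b):
    """Cosine similarity between two vectors in the latent space."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Existing raters' coordinates are the rows of Uk; find the nearest neighbor.
sims = [cosine(u_new, row) for row in Uk]
best = int(np.argmax(sims))

# Recommend the neighbor's well-rated movies the new rater hasn't seen.
unseen = new_ratings == 0
recommendations = [j for j in np.nonzero(unseen)[0] if R[best, j] >= 4]
print("most similar rater:", best, "recommended movie indices:", recommendations)
```

The cosine angle between rater vectors in the reduced space is exactly the geometry of “similarity” Grigorik illustrates: raters who point in roughly the same latent direction have kindred tastes, so one's favorites become the other's recommendations.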
