I’d just started my annual review of free online computer science and statistics courseware when a colleague referred me to a NYTimes.com article on a new company formed by Stanford computer science professors Andrew Ng and Daphne Koller.

Two and a half years ago, I mentioned a free Stanford machine learning course taught by Ng in an OpenCourseWare update blog post. I was very impressed with the curriculum, noting at the time: “All the pieces – lectures, notes, handouts, review documents, assignments and tests – are in place for enterprising ‘open’ BI candidates to learn the same ML material as advanced Stanford students, without leaving their notebook computers.”

It seems Ng’s remained very busy. Last year, he and Koller taught free, Web-based courses through Stanford to more than 100,000 students. And an AI course led by two other instructors attracted 160,000 participants from 190 countries at the outset. These curricula are distributed free under a Creative Commons license, much like the open source licensing that’s near and dear to our hearts.

Now Ng and Koller have started a new venture, Coursera, “as a Web portal to distribute a broad array of interactive courses in the humanities, social sciences, physical sciences and engineering.” Financial backers are unfazed by the free courseware products. One VC opined: “From a community of millions of learners some should ‘opt in’ for valuable, premium services. Those revenues should fund investment in tools, technology and royalties to faculty and universities.”

Coursera breaks from academia by making its learning components more byte-sized than the traditional 50- or 75-minute university lecture. The system divides “lectures” into segments as short as 10 minutes and embeds quick online quizzes in most of them. A feature that lets students get support from the student community appears to work well: in a dry run, questions were typically answered within 22 minutes.

The Coursera approach appears similar to that of the “flipped classroom,” popularized by the highly successful Khan Academy, in which students absorb lectures at home and then work on problem-solving or “homework” in the classroom, either one-on-one with the teacher or in small groups.

A Coursera-style machine learning course taught by Dr. Ng debuted Monday. I’ve already gone through the first few weeks of pre-recorded video lectures and like much of what I’ve seen. Ng’s an outstanding teacher, and the short lectures are perfect for an attention-challenged learner like me. The periodic quizzes keep me on my toes. And I love the approach of using an open source, high-level mathematical language like Octave, itself the topic of several lectures, to program the learning algorithms for class assignments. I find that if I can program a model in Octave, I have a pretty good understanding of how it works.
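To give a flavor of what “programming the model” means, here is a minimal sketch of batch gradient descent for single-variable linear regression, the kind of algorithm an introductory ML course typically has students implement early on. It is written in plain Python as a stand-in for Octave and is not the actual course assignment; the function names, learning rate and toy data are illustrative assumptions.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent on squared error."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors for the current parameters
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1 / 2m) * sum(error^2)
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


def cost(xs, ys, theta0, theta1):
    """Mean squared error cost J(theta0, theta1), halved by convention."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


if __name__ == "__main__":
    # Hypothetical toy data lying exactly on y = 1 + 2x
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]
    t0, t1 = gradient_descent(xs, ys, alpha=0.1, iterations=5000)
    print(f"theta0={t0:.3f}, theta1={t1:.3f}, cost={cost(xs, ys, t0, t1):.6f}")
```

With this step size the parameters should settle very close to 1 and 2. Writing out the gradient and the simultaneous update by hand is exactly the exercise that forces an understanding of how the model is fitted.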

Not to be outdone by its academic peer, the esteemed California Institute of Technology is currently delivering a free, online machine learning course of its own. Learning From Data is unabashedly promoted as “A Real CalTech course, not a watered-down version”.

LFD consists of 18 lectures given by award-winning Caltech CS professor Yaser S. Abu-Mostafa. Of the six already delivered and available on the Web, I’ve watched four and have been most impressed with both the course organization and Abu-Mostafa’s communication skills. The material is organized into blocks of conceptual, mathematical and practical lectures; those from the first month are primarily conceptual and mathematical. Much as I’ve enjoyed the lectures so far, I look forward to the practical material starting next week.

The recommended text for LFD, also called “Learning From Data,” is an inexpensive book written by Abu-Mostafa and two collaborators. Since the complete course lectures won’t be available online until the end of May, I purchased and reviewed the book, learning a lot from its 190 pages. Indeed, the wisdom in the Three Learning Principles articulated in Chapter 5 is alone worth the price of the book: 1) the continued relevance of Occam’s Razor, 2) the perils of Sampling Bias, and 3) the pernicious impact of Data Snooping.
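One simple way to see why data snooping is so pernicious is to let the data choose the model. The sketch below is an illustrative assumption, not an example from the book: it builds purely random binary features and labels that are independent by construction, keeps whichever feature agrees best with the labels, and then checks that same feature on fresh data drawn the same way. The function names and sizes are hypothetical.

```python
import random


def agreement(feature_col, labels):
    """Fraction of points where a binary feature matches the binary label."""
    return sum(f == y for f, y in zip(feature_col, labels)) / len(labels)


def make_noise_data(rng, n_points, n_features):
    """Random binary features and labels, independent of each other by construction."""
    features = [[rng.randint(0, 1) for _ in range(n_points)] for _ in range(n_features)]
    labels = [rng.randint(0, 1) for _ in range(n_points)]
    return features, labels


def snooping_demo(seed=0, n_points=100, n_features=1000):
    rng = random.Random(seed)
    # "Look" at one dataset and keep whichever feature happens to fit it best
    features, labels = make_noise_data(rng, n_points, n_features)
    scores = [agreement(col, labels) for col in features]
    best = max(range(n_features), key=lambda j: scores[j])
    in_sample = scores[best]
    # Evaluate the chosen feature index on a fresh sample from the same process
    new_features, new_labels = make_noise_data(rng, n_points, n_features)
    out_of_sample = agreement(new_features[best], new_labels)
    return best, in_sample, out_of_sample


if __name__ == "__main__":
    best, ins, outs = snooping_demo()
    print(f"feature {best}: in-sample {ins:.2f}, fresh data {outs:.2f}")
```

Because the winner was picked from a thousand candidates using the same data it is scored on, its in-sample agreement typically lands well above 0.5, while on fresh data it falls back to roughly coin-flip accuracy. The apparent skill was manufactured by the selection itself.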

I’d encourage nascent data scientists to take a look at both of these free courses. Machine learning is certainly central to the DS discipline, and I love the way these curricula combine the best of theoretical and practical treatments. That both of these rigorous courses are taught by world-class experts and are available for free makes investigation a no-brainer.