I recently attended The Data Warehousing Institute (TDWI) spring conference in Boston. It was a great trip. Oh, the meeting was fine - I learned some new tricks and met some old friends. An unexpected highlight for me, however, was the fantastic weather and a city alive with spring - three straight days of 75-80 degrees and bright sun in Boston in the middle of May. With the beautiful weather and some time on my hands, I was able to take advantage of Boston's well-deserved reputation as a great walking city. Several times, I made the trip up Massachusetts Avenue across the Harvard Bridge to Cambridge - to the heart of the MIT campus. With joggers, bicyclists and students everywhere, I almost (but not quite) wished I were back in school.
MIT, of course, is among the leading universities in the world, a nonpareil science, engineering and research institution. Flush with excitement from spring and a vibrant college community, I used some of my extra time to browse the MIT Web site (http://mit.edu/), intent on becoming acquainted with some of MIT's latest scientific and technological research. Instead, I discovered OpenCourseWare from the MIT home page and have been figuratively back in school - without the pressure of grades - ever since.
OpenCourseWare (OCW) is described on the MIT Web site (http://ocw.mit.edu/index.html) as "a free and open educational resource for educators, students and self-learners around the world." Long a noteworthy open source leader, innovator and contributor, MIT has raised the open stakes by publishing MIT course materials, making them generally available to the public. And though other U.S. universities such as Harvard, Johns Hopkins and Notre Dame belong to the OCW Consortium, none has made the comprehensive commitment of MIT.
At its core, MIT OCW is an electronic publishing model for educational materials enabled by Internet technologies. The idea behind MIT OCW is to make the course materials used in most undergraduate and graduate subjects taught at MIT available on the Web, free of charge, to any user, anywhere. There is no registration, there are no degrees or certificates, and there is no access to MIT faculty. At this point there are more than 1,550 courses available in MIT OCW, spanning all university departments.
MIT OCW courses are organized by department and by undergraduate/graduate status. Materials available for specific courses vary, but typically include syllabi, reading assignments, written assignments, study materials and exams. The more comprehensive include lecture notes, generally in the form of slide decks organized in PDFs. For stats/analytics courses, data sets are often available. Components may be downloaded individually, while the entire "course" is available as a zip file.
OpenCourseWare for BI
Many of the courses highlighted in the MIT OCW offer significant value for BI analysts. Not surprisingly, courses offered under the Sloan School of Management are particularly relevant. A few are noted here, but many more pertaining to technology, strategy, systems and process optimization are available to support disparate BI uses. My experience is that those with lecture notes offer the most immediate value. What follows is a small sampling of courses that roused my interest.
Communicating with Data (http://ocw.mit.edu/OcwWeb/Sloan-School-of-Management/15-063Communicating-With-DataSummer2003/CourseHome/index.htm) is a conceptual graduate-level survey concerned with "quantitative techniques as a way of thinking, not just a way of calculating, in order to enhance decision-making skills." (See Figure 1.) The point of departure for this course is decision analysis, which uses decision trees surrounding GOOP (goals, options, outcomes, probabilities) as a framework for structuring decisions. Unlike a pure stats course, Communicating with Data touches many quantitative and behavioral decision disciplines including probability, portfolio and risk management, utility theory, simulation and Monte Carlo analysis, and individual value - and thus serves both as a broad introduction to BI and analytic thinking and as a foundation for more advanced analytics work.
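To make the GOOP framing concrete, here is a minimal Python sketch of how a one-level decision tree can be rolled back to an expected value per option. It is not taken from the course materials: the option names, probabilities and payoffs are invented for illustration, and the helper names (Outcome, expected_value, best_option) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One branch of a chance node: what happens, how likely, and its payoff."""
    label: str
    probability: float
    payoff: float


def expected_value(outcomes):
    """Probability-weighted payoff of a chance node.

    Raises ValueError if the probabilities do not sum to 1.
    """
    total = sum(o.probability for o in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1.0")
    return sum(o.probability * o.payoff for o in outcomes)


def best_option(options):
    """Return the option with the highest expected value, plus all values."""
    values = {name: expected_value(outs) for name, outs in options.items()}
    best = max(values, key=values.get)
    return best, values


# Hypothetical example (all numbers are made up):
# goal = maximize payoff; options = launch or hold.
options = {
    "launch": [
        Outcome("strong demand", 0.6, 500.0),
        Outcome("weak demand", 0.4, -200.0),
    ],
    "hold": [
        Outcome("no change", 1.0, 50.0),
    ],
}

choice, values = best_option(options)
print(choice, values)
```

With these made-up numbers, launching has the higher expected payoff despite the chance of a loss. A fuller treatment would bring in the risk attitudes and utility theory that the course also covers.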
Applied Statistics (http://ocw.mit.edu/OcwWeb/Sloan-School-of-Management/15-075Applied-StatisticsSpring2003/CourseHome/index.htm) is an elementary/intermediate statistics course for undergraduates (see Figure 2). The lecture notes for this course are nothing short of fabulous. BI analysts can learn much from just the first two lectures on collecting data and summarizing/exploring data. Indeed, the summarizing/exploring data notes should be required reading for aspiring BI practitioners. This lecture pays homage to John Tukey and the Exploratory Data Analysis (EDA) movement in statistics that emphasizes data distribution and visualization in contrast to "the mathematization of statistics." Basic summarization graphs, such as stem and leaf, box and whiskers, scatter plot matrices and quantile plots, are highlighted as simple but invaluable aids to understanding data. All statistical/graphical illustrations as well as homework assignments are completed using the R/S-Plus family of statistical packages.
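For readers who want a feel for the EDA summaries described above, the following is a small sketch of the five-number summary and the outlier fences behind a box-and-whiskers plot. The course itself uses R/S-Plus; Python's standard library is used here only as a stand-in, the data values are invented, and the quartile convention (the "inclusive" method) is one of several in common use.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def tukey_outliers(data, k=1.5):
    """Flag points beyond k * IQR from the quartiles (the box plot fences)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Made-up sample with one suspiciously large value.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 30]

summary = five_number_summary(sample)
outliers = tukey_outliers(sample)
print("five-number summary:", summary)
print("outliers:", outliers)
```

Even this much shows the EDA point: the summary and the fences expose the odd value before any model is fit.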
Data Mining (http://ocw.mit.edu/OcwWeb/Sloan-School-of-Management/15-062Data-MiningSpring2003/CourseHome/index.htm) is a personal favorite. The lecture notes, assignments and data sets for this undergraduate/graduate course are all quite valuable, even for those who opt not to use the Excel add-in recommended as the mining tool of choice for assignments. The focus is on applications of data mining technology to solving business problems rather than underlying mathematical and computer science theory. With data mining defined as "statistics at scale, speed and simplicity," the lecture notes cover all approaches, including statistics, machine learning, database retrieval and hybrid. I particularly like the discussions of classification trees, regression trees, logistic regression, principal components, discriminant analysis, neural nets and k-means clustering - all techniques that I've used in practice. The business applications discussed with the methods bring the techniques to life. The lecture on association rules (market basket analysis) references the work of Jiawei Han at the University of Illinois. (The OpenBI Forum will present a series of columns on data mining with Professor Han later in the summer.) Finally, the assignments and accompanying data sets are very valuable for investigating different mining approaches. I was readily able to access the data sets and complete the assignments with R and S-Plus. My one quibble with this course is the absence of notes for the last two lectures on collaborative filtering and the practice of data mining, which I'm sure are quite pertinent for practitioners.
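As a taste of the association rules (market basket) material mentioned above, here is a minimal Python sketch of the three standard measures for a rule "A implies B": support, confidence and lift. It is an illustration only, not the course's Excel-based approach; the transactions are invented and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency.

    A value above 1 suggests the items occur together more often than
    they would if they were independent.
    """
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Made-up market baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
rule_lift = lift(baskets, {"bread"}, {"butter"})
print(rule_support, rule_confidence, rule_lift)
```

Real market basket work adds a search step (such as the Apriori family of algorithms) to find candidate item sets efficiently. The measures above are what that search ranks.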