Interleague Play and Data Analysis
Major League Baseball just concluded its 15th season of interleague play (ILP), in which teams from the American League play teams from the National League in six crossover series each year. If game attendance is any guide, ILP gives fans an annual treat for several weeks every May and June.
Most baseball fans have allegiance to either the AL or the NL, so crossover series performance provides bragging rights to the league with the most annual victories. For the last seven years, it's been the American League that has been able to crow, putting up dominating numbers against the senior NL. There's no shortage of explanations for the AL's success. The one NL fans offer most often is the AL's advantage in using its designated hitter for home games. AL aficionados counter that their league is now simply superior.
The interleague prowess of the AL's Chicago White Sox has received special attention this year. Before losing to the NL's Washington Nationals a few weeks back, the Sox had won 17 consecutive series against NL teams over a three-plus-season span; that is, they had won at least 2 of the 3 games in each of 17 straight series against NL opponents.
The Sox's IL performance was grist for baseball's announcers and analysts, who proposed many explanations for the team's success. Mercurial manager Ozzie Guillen lamented not being able to play even more games against inferior NL teams.
Ever the stats geek, I first asked: could the White Sox streak plausibly be nothing more than luck, a random blip over the 15 years of play? So I set out to “test” by the numbers whether a reasonable analyst could conclude that the Sox streak reflected skill, that is, whether a 17-series winning streak by any team over that period was too unlikely to be explained by chance.
My method was a computer simulation written in R and run 10,000 times. The “null” hypothesis was that all teams in the AL and NL are equal, so each team had a 50-50 chance to win any game, and hence a 50-50 chance to win each 3-game series. The “approximation” to history was that, on average, 28 teams each played 6 crossover 3-game series per year, a total of 90 series per team over the fifteen-year duration. With the winner of each game decided by a coin toss, I generated for each team a sequence of 90 series wins (W) and losses (L) representing its IL series performance over the 15 years. With 28 teams and 10,000 simulation repetitions, I thus observed a total of 280,000 sequences of 90 W's and L's.
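The setup just described can be sketched in a few lines. The original simulation was written in R and its code is not reproduced here, so the Python below is only an illustration under the stated assumptions: 28 teams, 90 three-game series per team, every game a fair coin toss, and each team's sequence generated independently of the others. The function names, the seed, and the reduced default repetition count are hypothetical choices made for the example.

```python
import random

def simulate_series(rng, games=3):
    """Return 'W' if the team wins a majority of `games` fair coin-toss games."""
    wins = sum(rng.random() < 0.5 for _ in range(games))
    return "W" if wins * 2 > games else "L"

def simulate_team_history(rng, n_series=90, games=3):
    """One team's interleague record: a string of n_series 'W'/'L' outcomes."""
    return "".join(simulate_series(rng, games) for _ in range(n_series))

def simulate_histories(n_reps=100, n_teams=28, n_series=90, seed=42):
    """Generate n_reps * n_teams independent team histories.

    Assumption: teams are simulated independently, with no opponent
    bookkeeping, which is how the text describes the per-team sequences.
    """
    rng = random.Random(seed)
    return [simulate_team_history(rng, n_series)
            for _ in range(n_reps * n_teams)]

histories = simulate_histories()
print(len(histories), "histories of", len(histories[0]), "series each")
print(histories[0])
```

Setting `n_reps=10000` would match the 280,000 sequences described above, at the cost of a longer run time in pure Python.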
Once the data were generated, I looked for streaks of consecutive W's and L's, with lengths from 5 up to 25. A streak of 5 means five or more consecutive series wins (or losses); a streak of 17 means seventeen or more straight wins/losses. Table 1 below details the results.
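The streak search can be illustrated with a short Python sketch. It assumes each team history is stored as a string of W's and L's; the helper names `longest_streak` and `streak_counts` are hypothetical and are not taken from the original R code. A history counts toward "a streak of k" when its longest run of the chosen symbol is at least k, which matches the "k or more" definition above.

```python
from itertools import groupby

def longest_streak(history, symbol="W"):
    """Length of the longest run of `symbol` in a string such as 'WWLWL'."""
    return max(
        (sum(1 for _ in run) for key, run in groupby(history) if key == symbol),
        default=0,
    )

def streak_counts(histories, symbol="W", lengths=range(5, 26)):
    """For each k, count the histories whose longest `symbol` streak is >= k."""
    longest = [longest_streak(h, symbol) for h in histories]
    return {k: sum(1 for s in longest if s >= k) for k in lengths}

# Tiny hand-made example, not simulation output.
example = ["WWWWWLWL", "LLLLLLWW", "WLWLWLWL"]
print(streak_counts(example, "W", range(5, 8)))  # {5: 1, 6: 0, 7: 0}
print(streak_counts(example, "L", range(5, 8)))  # {5: 1, 6: 1, 7: 0}
```

Dividing each count by the number of histories gives the share of simulated teams with a streak of that length or longer.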
With the generated data, only 76 of the 280,000 simulated sequences, 0.03 percent, yield a winning streak of 17 series or more. Similarly, 79, also about 0.03 percent, contain 17 or more consecutive series losses. Using this methodology, we must therefore reject the hypothesis that a 17-series winning streak by any team could reasonably be due to chance alone. Perhaps the experts and Ozzie Guillen are on to something: the Sox do very well against NL teams. My take? It helps that they play the woeful Cubs in two of their six annual IL series!
Even as the hypothesis that the Sox have no special skill must be rejected, the data do show that seeming patterns can emerge from pure randomness. Over 75 percent of the simulated teams have a winning streak of 5 or more consecutive series somewhere in their 90 series. And 50 percent would win 6 series in a row over the fifteen years just by chance. Indeed, it would take a 10-series streak for us to get truly suspicious and conclude that we're looking at something other than luck.
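Under the same coin-toss assumption, the simulated shares quoted above can be cross-checked without any random numbers. The sketch below is not part of the original R analysis. It uses a small dynamic program (the function name is hypothetical) that tracks the length of the current winning run and accumulates the probability that a sequence of 90 fair series ever reaches a streak of k. The last lines also estimate the chance that at least one of 28 teams shows a 17-series streak, assuming the teams' sequences are independent, which is a simplification.

```python
def prob_streak_at_least(k, n_series=90, p_win=0.5):
    """Exact probability of at least one winning streak of length >= k."""
    # state[j]: probability that no k-streak has occurred yet and the
    # sequence currently ends in exactly j consecutive wins.
    state = [1.0] + [0.0] * (k - 1)
    hit = 0.0  # probability that a k-streak has already occurred
    for _ in range(n_series):
        nxt = [0.0] * k
        for run, mass in enumerate(state):
            nxt[0] += mass * (1 - p_win)      # a loss resets the run
            if run + 1 == k:
                hit += mass * p_win           # this win completes a k-streak
            else:
                nxt[run + 1] += mass * p_win  # this win extends the run
        state = nxt
    return hit

for k in (5, 6, 10, 17):
    p = prob_streak_at_least(k)
    print(f"streak >= {k:2d}: {100 * p:.3f} percent of histories")

# Assumption: the 28 teams' sequences are independent of one another.
p17 = prob_streak_at_least(17)
p_any_team = 1 - (1 - p17) ** 28
print(f"at least one of 28 teams with a 17-series streak: {100 * p_any_team:.2f} percent")
```

For k = 17 the exact value works out to roughly 0.03 percent, in line with the 76 and 79 occurrences found among the 280,000 simulated sequences.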
The lessons here are similar to the takeaways from last week's drunkard's walk blog. Analysts must guard against over-interpreting their data. What looks like signal is often simply randomness that resembles a pattern over the short run. In our rush to find profitable early signals in our data, we must be wary of attributing patterns to sequences that are really just noise. Finally, taking the time to program quick simulations that test specific patterns against randomness can provide invaluable insight into analytic observations.