My daughter and her boyfriend are both Division I athletes at Wake Forest University in North Carolina. Megan plays volleyball, while Andrew competes in soccer.
A few weeks ago, my family had the pleasure of visiting Winston-Salem, where we took in a volleyball game on Friday evening and a soccer match on Saturday. Both Wake teams won, but the soccer victory was especially noteworthy, coming against Atlantic Coast Conference (ACC) rival Syracuse.
At dinner after the soccer match, we discussed prospects for the ranked Demon Deacons, and I asked Andrew how good the ACC was this year. “Pretty good,” he said.
Data geek that I am, I just had to get some stats to elucidate the soccer prowess of Wake and the ACC. So I surfed the NCAA DI soccer website and downloaded the RPI statistics as of September 28, 2015.
Without going into detail, the RPI (Rating Percentage Index) is a performance proxy that “is a measure of strength of schedule and how a team does against that schedule.” So better teams generally have a higher RPI score and a lower-numbered (better) RPI rank. And whereas coaches' or media polls might rank just the top 25 or so teams, the RPI, a statistical construct, is computed for every participant. Having rankings for everyone is essential for comparing conference performance. The RPI is also ubiquitous in college sports, assigned to each team in most Division I programs.
I used the popular R statistical platform to analyze the scraped data, with the data.table and ggplot2 packages driving the computations and the final graphic. The visual is a stripplot that displays each team's RPI ranking by conference, with the points “jittered” so that all are clearly visible. Superimposed on each conference's raw data are segments and a “notch” that mark the conference's quantile rankings: the top (0th percentile) ranking, the 25th percentile, the median (50th percentile), the 75th percentile, and the laggard (100th percentile). The conferences are sorted by performance from left to right on the sum of their 0th, 25th, 50th and 75th percentile rankings, with lower sums indicating stronger conferences.
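The per-conference quantile summary and the left-to-right ordering described above can be sketched in a few lines. The original analysis was done in R with data.table and ggplot2. What follows is only a minimal Python re-sketch using the standard library, and it leaves out the plotting. It assumes that a lower RPI rank is better and computes percentiles by linear interpolation (the rule R's `quantile()` uses by default). The function names and the three conferences' rank lists are hypothetical and are not the 2015 data.

```python
import math

# Percentiles reported per conference: 0th (best rank) through 100th (worst rank).
PROBS = (0.0, 0.25, 0.50, 0.75, 1.0)


def quantile(sorted_ranks, p):
    """Linear-interpolation quantile of an ascending list (R's default rule)."""
    h = (len(sorted_ranks) - 1) * p
    lo, hi = math.floor(h), math.ceil(h)
    return sorted_ranks[lo] + (h - lo) * (sorted_ranks[hi] - sorted_ranks[lo])


def summarize(ranks_by_conf):
    """Map each conference to its (q0, q25, q50, q75, q100) RPI ranks."""
    summary = {}
    for conf, ranks in ranks_by_conf.items():
        ordered = sorted(ranks)
        summary[conf] = tuple(quantile(ordered, p) for p in PROBS)
    return summary


def sort_conferences(summary):
    """Order conferences best-first by q0 + q25 + q50 + q75; lower sums are better."""
    return sorted(summary, key=lambda conf: sum(summary[conf][:4]))


# Hypothetical RPI ranks for three made-up conferences (not the 2015 data).
hypothetical_ranks = {
    "Conf A": [1, 2, 3, 4, 10],
    "Conf B": [5, 20, 40, 60, 80],
    "Conf C": [6, 15, 30, 50, 100],
}

summary = summarize(hypothetical_ranks)
print(sort_conferences(summary))  # ['Conf A', 'Conf C', 'Conf B']
```

The sort key deliberately omits the 100th percentile, which matches the ordering rule described above. In this toy data, Conf C's worst team (rank 100) does not affect its placement ahead of Conf B.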
It turns out that Andrew's assessment of the ACC's soccer quality was understated, to say the least. Indeed, at first I did a double take, certain my computations had to be in error. They weren't.
Take a look at this graphic. The leftmost Atlantic Coast Conference stands alone in performance, off the charts, actually. With one-third of the season completed, the 100th percentile, or “worst,” RPI ranking in the ACC was superior to the 50th percentile, or median, ranking for every other conference. And the top four RPI teams were all from the ACC. I've never seen this level of separation in conference RPI rankings over the years, not even with the SEC in football.
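Each of the two separation claims above reduces to a simple check on the rank lists. The first is that the ACC's worst rank beats every other conference's median. The second is that the top four teams all come from one conference. The following is a minimal Python sketch of both checks. It assumes lower ranks are better. The function names and the rank lists are hypothetical and stand in for the real data.

```python
import statistics


def worst_beats_all_medians(ranks_by_conf, conf):
    """True if conf's worst (largest) rank is better than every other conference's median rank."""
    worst = max(ranks_by_conf[conf])
    return all(
        worst < statistics.median(ranks)
        for other, ranks in ranks_by_conf.items()
        if other != conf
    )


def conferences_in_top_k(ranks_by_conf, k):
    """Set of conferences holding the k best (smallest) RPI ranks overall."""
    pairs = sorted(
        (rank, conf) for conf, ranks in ranks_by_conf.items() for rank in ranks
    )
    return {conf for _, conf in pairs[:k]}


# Hypothetical ranks, purely for illustration (not the 2015 data).
hypothetical_ranks = {
    "Conf A": [1, 2, 3, 4, 10],
    "Conf B": [5, 20, 40, 60, 80],
    "Conf C": [6, 15, 30, 50, 100],
}

print(worst_beats_all_medians(hypothetical_ranks, "Conf A"))  # True
print(conferences_in_top_k(hypothetical_ranks, 4))  # {'Conf A'}
```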
From an analytics perspective, I find the visual quite informative, even if it's busy and not very attractive; my daughter snarked that the graph appeared to have acne. But I like graphs that depict both raw data and statistical summaries, and that use left-to-right and top-to-bottom sorting to tell their story.
I'm a big fan of both ggplot2 and lattice for statistical plotting in R. Yet, I also love Tableau for easy exploration, interactive access and visual elegance. Each has an important role in the data analysis tool chest.
All Information Management content is archived after seven days.