My daughter's high school girls' volleyball team recently capped a great season with a run in the state tournament that unfortunately ended in a championship-game loss. The girls were disappointed but will remember their trip “downstate” for a long time. Even with an excellent group of underclassmen returning, the team understands there's no guarantee next year's stars will align for an encore.

Over the course of the two-week tournament, a few thoughts occurred to me that are pertinent to measurement and business intelligence. One relates to how schools are divided into competition categories based on size and other factors. The other has to do with the structure of the tournament itself: who plays whom, when and where.

I like to think of a statewide tournament as an “experiment” to determine a champion for a given sport. A challenge for those who “design” this experiment is to minimize “bias” so the public can have confidence that tournament winners indeed represent the “best” at their crafts. Unfortunately, because a truly “randomized” tournament is both impractical and unattractive, officials must grapple with quasi-experimental “methods” and “designs” that attempt to control bias, all within a limited budget. Among the factors under quasi-experimental control are school size, team seedings and brackets.

There are four school-size classifications for girls' volleyball in our state, each with its own tournament of “like” schools. Enrollment is the primary factor, ensuring that schools with 2,000+ students don't enjoy an unfair advantage over those with 200. For public schools that enroll students of both sexes from a defined catchment area, the size classifier works reasonably well. But our state athletic association also makes accommodations for single-sex schools and for private schools, which have no attendance boundaries and are free to recruit promising athletes living miles away. To this end, the association computes adjusted enrollment figures: for single-sex schools, adjusted enrollment is simply double the actual enrollment, while for private schools it is actual enrollment multiplied by an “adjustment factor”, currently set at 1.65.
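The adjustment rule boils down to simple arithmetic, sketched below in Python. The function and parameter names are hypothetical. The post doesn't say how a school that is both single-sex and private is treated, so the sketch assumes the two factors simply multiply.

```python
SINGLE_SEX_FACTOR = 2.0   # single-sex schools: enrollment is doubled
PRIVATE_FACTOR = 1.65     # private schools: the association's current multiplier


def adjusted_enrollment(actual: int, single_sex: bool = False,
                        private: bool = False,
                        private_factor: float = PRIVATE_FACTOR) -> float:
    """Return the enrollment figure used to place a school in a class.

    ASSUMPTION: a school that is both single-sex and private gets both
    adjustments, applied multiplicatively; the source does not specify this.
    """
    if actual < 0:
        raise ValueError("enrollment cannot be negative")
    adjusted = float(actual)
    if single_sex:
        adjusted *= SINGLE_SEX_FACTOR
    if private:
        adjusted *= private_factor
    return adjusted


if __name__ == "__main__":
    print(adjusted_enrollment(1200))                  # coed public school
    print(adjusted_enrollment(600, single_sex=True))  # single-sex public school
    print(adjusted_enrollment(1000, private=True))    # coed private school
```

Under these assumptions, a coed private school of 1,000 students is classified as if it had about 1,650. Passing a larger `private_factor` models the association raising the multiplier.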

Mixing private and public schools in state tournaments is not without challenges, even with the multiplier adjustment to private school enrollment. Indeed, some states simply skirt the problem by holding separate public and private school tournaments. In our state, private schools win an inordinate number of competitions, in large part because they're so good at “selecting” student athletes. The athletic association counters by raising the multiplier, making smaller private schools compete against ever-larger public schools. Incidentally, the top two volleyball teams overall in one newspaper's final poll this season were both mid-sized private schools, each with several college scholarship-level players. Not only did these “privates” rise to the top of their classification, they were also rated higher than the best large schools. And so it is that selection problems, along with the accommodations made to counter their influence, are central to high school sports tournaments, just as they are to BI investigations.

For the two largest school classifications of girls' volleyball, the state tournament is made up of eight sectionals, each comprising 16 or more teams that first play through four one-and-done regionals. Teams are bracketed and seeded at the sectional level, with the sectionals themselves determined by geography. A #2 seed in sectional A is not necessarily equivalent to a #2 seed in sectional B, however, and a given sectional might hoard four or five of the top-rated teams in the state through geographic happenstance alone. The eight sectional winners go on to four super-sectionals, whose winners advance “downstate” to the Final Four.

This setup contrasts with today's granddaddy of all national sports tournaments, March Madness: the NCAA Division I men's basketball tournament. The 65-team field is divided into four regional brackets, each with sub-regionals. The regionals, though, are now only nominally geography-based. The tournament committee tries to make all brackets “equal” in overall quality and ships teams around the country to that end. North Carolina and Duke, just 10 miles apart, might be ranked #1 and #2 in the country, respectively. North Carolina could get a #1 seed in the East and Duke perhaps a #1 in the West. In the current NCAA format, a #3 seed in the Mideast is comparable to a #3 in the Midwest. The regions determine only where teams play, not where the schools are located. As #1 seeds in different regions, North Carolina and Duke could meet only in the Final Four.

Contrast that scenario with one that occurred in our state's second-largest volleyball classification this year, where the teams rated #1 and #2 overall were geographic neighbors and met in a sectional final, which is only the Sweet 16. After what many considered the “real” championship match, the winner still had three matches remaining. Across the best-of-three-set matches of its playoff run, the eventual champion gave up the following point totals, in order: 14, 22, 31, 71, 26, 37, 43. Guess which match involved the consensus second-best team.

To use a statistical analogy, high school tournament brackets are stratified, whereas NCAA basketball brackets are cluster-based. In the stratified case, the grouping is natural geography: schools are bracketed in the region where they're located, with seedings that are local and not comparable across regions. The strength of different brackets can thus vary considerably. With cluster bracketing, on the other hand, the grouping might again be nominally geographic, but schools are assigned to regions so that the regions are “equal” in overall team quality, making seedings comparable across brackets. Where a school ends up competing has little to do with its location. Cluster-based brackets can guarantee that the highest-rated teams meet only in the late rounds; stratified brackets cannot.
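A toy sketch may make the contrast concrete. The Python below builds brackets both ways for a made-up field; all team names, ranks and home regions are hypothetical. The stratified version simply groups teams by home region. For the cluster version, it uses a serpentine (“snake”) deal of teams ordered by rating. This is a simplified stand-in chosen for illustration, not the NCAA committee's actual procedure, which the post does not describe.

```python
from collections import defaultdict

# Hypothetical field: (team, overall rank, home region). Illustrative data only.
TEAMS = [
    ("A", 1, "North"), ("B", 2, "North"), ("C", 3, "South"), ("D", 4, "East"),
    ("E", 5, "West"), ("F", 6, "North"), ("G", 7, "South"), ("H", 8, "East"),
]
REGIONS = ["North", "South", "East", "West"]


def stratified_brackets(teams):
    """Bracket each team in its home region; seeds are local to the bracket."""
    brackets = defaultdict(list)
    for name, _rank, home in sorted(teams, key=lambda t: t[1]):
        brackets[home].append(name)  # list order = local seed order
    return dict(brackets)


def cluster_brackets(teams, regions):
    """Deal teams best-first into regions in serpentine ("snake") order.

    A simplified stand-in for a committee balancing brackets, not the NCAA's
    actual procedure. Home region is ignored entirely.
    """
    brackets = {r: [] for r in regions}
    n = len(regions)
    for i, (name, _rank, _home) in enumerate(sorted(teams, key=lambda t: t[1])):
        lap, pos = divmod(i, n)
        # Even laps deal left to right, odd laps right to left.
        idx = pos if lap % 2 == 0 else n - 1 - pos
        brackets[regions[idx]].append(name)
    return brackets


def same_bracket(brackets, a, b):
    """True if a and b share a bracket, i.e. could meet before the Final Four."""
    return any(a in names and b in names for names in brackets.values())


if __name__ == "__main__":
    strat = stratified_brackets(TEAMS)
    clus = cluster_brackets(TEAMS, REGIONS)
    print("stratified:", strat, "1 vs 2 early?", same_bracket(strat, "A", "B"))
    print("cluster:   ", clus, "1 vs 2 early?", same_bracket(clus, "A", "B"))
```

With this made-up field, geography puts the top two teams (both from North) in the same bracket, much like the volleyball sectional described above. The snake deal instead gives North A and H, South B and G, East C and F, and West D and E, so each top-four team leads its own bracket and the bracket rank totals all come to 9.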

NCAA basketball tournament brackets weren't always cluster-based. In 1964, the first year of UCLA's incredible reign, the Bruins advanced to the Final Four from the West, beating San Francisco in the regional final after handling Seattle in the regional semifinal. UCLA then nipped Kansas State in the national semifinals before thumping Duke in the championship game. At that time, the tournament was stratified on geography, with western teams playing in the West, east central teams in the Mideast, and so on, and the regional brackets weren't necessarily “equal”. Indeed, some easterners argued that UCLA enjoyed a tailwind in its early championship years, coming out of a “weaker” West and facing the Midwest regional winner in the national semifinals.

The cluster-based design seems a better choice than the stratified one for controlling bias, but at a cost. The NCAA can afford the travel expenses incurred by shipping teams all over the country, while such a “design” is impractical for state high school tournaments. Together with advances in seeding methodology, cluster-based brackets appear to be working well for NCAA basketball, where the goal is to give higher-rated teams an easier road to the finals. The last three Final Fours have included eight #1 seeds, three #2s and one #3.

Steve Miller also blogs at miller.openbi.com.