Last Friday, the NCAA held a summit of statistical minds, and it's a meeting that I think will change the men's basketball tournament as we know it forever. Why? The NCAA's selection process has slowly but surely changed over the years, but not to the point where it's made a grand alteration in the data it uses to choose and seed teams.
For most of the past 35 years, the NCAA has relied on RPI, a system that's been proven time and again to be improvident, flawed and inferior to other newer, more meticulous rankings and ratings systems. Now the NCAA appears to be ready to introduce other metrics, and in doing so, drastically change its exhaustively referenced team sheets.
But what exactly happened in that meeting in Indianapolis on Friday? Well, I contacted everyone who was invited and got a response from all but one. (ESPN's Ben Alamar has yet to respond to my emails. If he does reply, I will update this post with his thoughts.) I've edited and condensed some responses for clarity and brevity. I wanted to do this in an effort to give basketball fans, coaches and media an enhanced look at why these changes are taking place -- and what disagreements surfaced during the meeting.
For me, the most interesting ongoing dilemma: There's a debate regarding how the NCAA would introduce a composite. It might not be six or seven accepted metrics tossed into a blender and accepted as one universal 1-351. There could be two well-defined metric composites to draw from. The first would use only predictive metrics (KenPom, Sagarin, BPI), the other only results-oriented ones (RPI, KPI, ESPN's Strength of Record). I'm also told the door is not closed on still looking at one or two other metrics (LRMC? Massey? Team Rankings?) to include in a composite.
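To make the two-composite idea concrete, here is a minimal sketch of what a rank-averaged composite could look like. All rankings below are invented for illustration; the real systems named above (KenPom, Sagarin, BPI, RPI, KPI, Strength of Record) each use their own proprietary formulas, and nothing here reflects how the NCAA would actually combine them.

```python
# Hypothetical sketch: average each team's rank across several systems,
# then re-rank. The team names and ranks below are made up.

def composite(rankings_by_system):
    """Average each team's rank across systems, then re-rank best-first."""
    teams = next(iter(rankings_by_system.values())).keys()
    avg = {t: sum(r[t] for r in rankings_by_system.values()) / len(rankings_by_system)
           for t in teams}
    return sorted(avg, key=avg.get)  # lowest average rank = best

predictive = {
    "KenPom":  {"Villanova": 1, "Gonzaga": 2, "Duke": 3},
    "Sagarin": {"Villanova": 2, "Gonzaga": 1, "Duke": 3},
    "BPI":     {"Villanova": 1, "Gonzaga": 3, "Duke": 2},
}
results_based = {
    "RPI": {"Villanova": 1, "Gonzaga": 2, "Duke": 3},
    "KPI": {"Villanova": 2, "Gonzaga": 1, "Duke": 3},
    "SOR": {"Villanova": 1, "Gonzaga": 2, "Duke": 3},
}

print(composite(predictive))     # ['Villanova', 'Gonzaga', 'Duke']
print(composite(results_based))  # ['Villanova', 'Gonzaga', 'Duke']
```

Note that, as Kevin Pauga points out below, simply averaging ranks is not necessarily mathematically sound -- a rank average throws away how far apart teams are within each system -- so this is only the crudest version of the idea.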
No matter, it seems likely that we will have a new protocol in place for tournament selection and seeding by next season. (Yay!)
And God bless you, Jeff Sagarin. (I saved his response, my favorite, for last.)
Jerry Palm, Bracketology expert for CBS Sports
The NCAA representatives there (Dan Gavitt, Jim Schaus) are committed to replacing the RPI with something else. The suggestion from the coaches committee was a composite of several rating systems (which I would guess includes the RPI). The idea is to use the new composite ranking the same way the RPI is used -- as an aggregator and not a decision tool.
At times, the discussion lost focus on that and drifted toward the best way to measure teams. That's not necessarily a problem, though. The idea is to make the new composite a more accurate representation of the groupings of the teams. I think that in the end, it will just be hairsplitting to some degree. I expressed a concern that the more sophisticated the tool, the more people will expect the committee to use it to make decisions, even though that is not the goal.
There were some pretty smart people in the room who believe strongly in what they do, but the discussion and debate was friendly for the most part, with very little ego on display. I did a lot more listening than talking. My favorite part was when Jeff Sagarin was giving us algebra lessons.
In the end, there is really no right answer. Whatever they come up with will be imperfect, which is why you need a committee to subjectively evaluate the data. The first thing the committee needs to decide before creating some new metric is what data do they want to measure, specifically, and how. They can't really begin this process until they answer that question for themselves.
Ken Pomeroy, proprietor of KenPom.com
It was encouraging to see the NCAA move forward on improving the selection process. There is a lot of inertia built into the dependence on the RPI and the NCAA is a large bureaucracy, so there are some factors working against change. So credit to Dan Gavitt and David Worlock of the NCAA, and especially the coaches who are pushing to explore better methods. I don't believe there is a similar movement going on in any other NCAA sport.
The main issue at our meeting was trying to figure out whether the committee wanted to choose the best teams or the most accomplished teams for the tournament. I'm still not sure the folks in the room fully understand the difference, but that's probably true for most fans and media members, too.
I think that everyone in attendance left the meeting understanding the importance of the issue of best or most deserving. It sounded like it was something the ad hoc coaches committee grappled with in the offseason so it wasn't the first time it came up. Once there is a consensus on that issue, I think things will progress rather quickly toward a new tool for the committee to use since there seems to be broad support at all levels for a change. There were NCAA representatives from other sports in the room so this could serve as a model for change in other sports that are RPI-dependent as well.
Mike DeCourcy, Sporting News columnist
I felt like I had a particular point of view, but I was there more to observe. I probably talked less in that setting than in any similar gathering of basketball people that I've entered. But I felt like I needed to say less and observe more. The biggest talking point would have been whether performance-based metrics are an essential, or should be an essential, part of the process. I think that's the part of this that they're going to have to come to a decision on, and I think there are certainly people, both on the committee and in the game, that would like to see that. And I would imagine there are probably a lot on the opposite side as well.
They've been doing it for a very long time without worrying about how many points you win by, and if they're going to go in that direction, they have to clear a few hurdles in the sense of, does it create issues with sportsmanship? That may have been the one point I raised: if you make this a factor for coaches, they're not going to ignore it. It's going to be part of their thinking. There were some differing opinions on that. Some thought that if you're trying to lay it on a team, you get into the risk/reward of keeping your starters in.
There's no question in my mind that the [NCAA's primary] metric will be different. But they're not changing the process. So anyone who thinks that's what this is about is misguided. It's about changing the foundation of what the metric is that you're playing toward. As much as I've defended the RPI as not being as bad as people think, there's no reason why "not as bad as you think" should be passable. It's not that hard to do better. To do best I think is a challenge, and that involves a lot of questions.
What I did learn on Friday, the most intriguing thing, was Kevin Pauga's idea to redefine the team sheets around what true accomplishment is. I'm a believer in accomplishment more than performance. Wins and losses mean more than margin of victory. But Kevin's position was that performance isn't being accurately reflected on the team sheets.
Kevin Pauga, proprietor of the KPI
Friday's meeting was a step forward. We did not leave the meeting with a final solution, but discussed different viewpoints at the table, focusing on the objective of the NABC ad hoc committee to create a composite metric to de-emphasize the RPI. A tiered system accounting for both game location and opponent quality can improve on the current Top 50/100/200 win columns (i.e. Tier I could be Top 25 home record, Top 50 neutral record, and Top 75 road record). We evaluated margin of victory and how models are more accurate when it is factored in. All data points are "baked" into each metric, meaning if you focus on a strong or weak point of a resume (i.e. non-conference strength of schedule, a single bad loss, etc.), you risk doubling down on that information. Results-based metrics are critical to both selection and seeding, as teams earn geographic and competitive advantages by ultimately winning games against strong competition.
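The tiered idea above can be sketched in a few lines of code. The Tier I cutoffs (top 25 home / top 50 neutral / top 75 road) come from Pauga's own example; the Tier II cutoffs below are invented placeholders, since he doesn't specify them.

```python
# Sketch of a location-adjusted tier system for quality wins.
# Tier I cutoffs follow Pauga's example; Tier II is hypothetical.
TIER_CUTOFFS = {
    # tier: (home cutoff, neutral cutoff, road cutoff)
    1: (25, 50, 75),
    2: (75, 135, 200),   # hypothetical placeholder values
}

def game_tier(opponent_rank, location):
    """Return the tier (1, 2, ...) a game counts toward, or None."""
    idx = {"home": 0, "neutral": 1, "road": 2}[location]
    for tier, cutoffs in sorted(TIER_CUTOFFS.items()):
        if opponent_rank <= cutoffs[idx]:
            return tier
    return None

# A road win over the No. 60 team counts as Tier I;
# the same win at home only counts as Tier II.
print(game_tier(60, "road"))  # 1
print(game_tier(60, "home"))  # 2
```

The design point is that a single win column now answers "how hard was this game to win, given where it was played," rather than treating a home win and a road win over the No. 60 team identically.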
While a composite metric has value, including minimizing outliers and providing context, how that metric is computed needs refinement. Results-based and predictive metrics are both powerful tools. The idea of creating separate results-based and predictive aggregates was introduced. Averaging them is not necessarily mathematically sound, but both art and science may be needed here. Any composite needs to be vetted over several years of past data and scrutinized on a team-by-team basis to understand outcomes and avoid unintended consequences.
Nobody is advocating that a composite metric be used to replace the selection process or the human element. This is a refinement, not an overhaul. Any solution needs to be easy for everyone to understand. The right people are involved in the conversation, including the NCAA staff, representatives from the NABC and its coaches, the men's basketball committee and those of us involved in developing analytics.
Andy Glockner, author and analytics expert
I think it's a good -- and very long overdue -- idea for the NCAA basketball officials/tournament committee to explore a better way to measure team quality. The RPI formula is decades old and mathematically flawed, and we now have so many better ways to determine how good you are. I don't think the NCAA's task is a simple one. They are getting feedback from the Division I coaches, a majority of whom (per disclosure in the meeting) would prefer one new metric, rather than two or more different formulas that measure different aspects of a team's accomplishments (e.g. how good are your wins and losses, which doesn't necessarily require a margin-of-victory component, versus how good is your team, which absolutely does require MOV).
They need to get buy-in, ostensibly, from a variety of media partners who will use the new rating in the context of discussing NCAA basketball, and the tournament, etc. Generally speaking, the NCAA annually does a reasonably good job with team selection, and maybe less so with seeding/bracketing, and those are two very different discussions. Do you want the teams with the best resume or that you think are the best? Does that impact selection more than seeding or vice versa? And, more crucially, how are those resumes built?
Will the new process more accurately weight home versus road wins/losses, or the actual difficulty in winning games versus teams ranked outside the top 100, rather than assuming they should be wins and don't mean very much in terms of tournament consideration? Can we forecast more accurately how other bubble teams would have done against mid-major league schedules? So the first question the NCAA folks need to answer after yesterday's meeting is, "What do we want this metric to do/tell us?"
I left Friday feeling encouraged by the willingness of Dan Gavitt, David Worlock and the rest of the folks there to listen openly to a range of new solutions to an age-old problem. I also think it's going to be a lot harder than many think to get all of the constituents on the same page.
David Worlock, NCAA director of media and statistics
While there's still plenty to consider and ultimately do, we finished Friday's meeting in a better place than we started the day. We anticipated this would be a great use of time going into the meeting, and sitting down with this group certainly proved to be invaluable. The committee has used each of these metrics for several years, but we realize there are ways to improve this part of the selection and seeding process. And it's important to emphasize just that -- the use of metrics is just one component of the committee's season-long evaluation of teams.
The committee and staff will spend the next several weeks communicating with the NABC's ad hoc committee to figure out next steps, with the idea of coming to a reasonable consensus on the makeup of the composite metric. Then we need to figure out details such as determining the quality of home, away and neutral-court wins, and how we present the data so that it's easily consumed by everyone. Working with the group we hosted Friday and with the NABC, the committee hopes to have something in place by the 2017-18 season.
Jeff Sagarin, proprietor of the Sagarin rankings
Some possibilities tossed around were making up two different composites: one purely results-based and one purely predictive.
Then using the results-based composite to strongly influence who gets in the tournament, while allowing the predictive one a small influence on the seeding itself.
The problems are severalfold: most systems, including mine, are proprietary and somewhat complicated. But the NCAA needs to have a system that is mathematically reproducible by anyone who has the data and is a decent mathematician/computer programmer. That way every university would have many people on campus capable of reproducing the results, so no one would be in the dark.
So I initially recommend a very simple but mathematically sound system that isn't even mine. I first read of it back in 1976 ... it's a simple offshoot of the pure least-squares method. Its only flaw, as it was written, was that it ignored game locations. But that's easily incorporated, and of course anyone doing it now would. Pure least squares simply takes the scores and locations of the games and solves for the ratings of all the teams, along with the overall home edge if the latter is put into the program. The simple offshoot adds a bonus of, say, 50 or 100 points to the margin in favor of the winning team (unless of course the game was a tie, in which case you just leave the margin as it is -- zero).
Thus a one-point win gets turned into a 51-point win (or 101 points) while a 40-point win gets turned into a 90-point win. Note that the ratio has been changed from 40/1 to the much smaller 90/51 = 1.7647 to 1. If the bonus had been 100, the numbers would be 101 and 140, with an even smaller resulting ratio of 140/101 = 1.3861.
The bonus numbers of 50 and 100 are completely arbitrary. The NCAA could experiment with all sorts of "bonus" numbers and see which results seemed the "best."
The one part of this that I could take credit for is that the program would solve for the overall home advantage as part and parcel of how it runs. The original article I read in 1976 ignored the game location, which is very wrong. Anyone who follows the sports world knows, either from a mathematical or simple fan viewpoint, that playing at home is a very important advantage. Thus if two teams had precisely the same results against the exact same schedule (other than game locations), the team that played more games on the road would have a better rating.
But the key idea is adding the bonus to the margin (unless of course the game was a tie).
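The system Sagarin describes can be sketched in a few dozen lines. This is a minimal interpretation, assuming NumPy for the least-squares solve; the teams, scores, and the bonus of 50 are illustrative, and the mean-zero constraint is one standard way to pin down the otherwise underdetermined ratings (not something Sagarin specifies).

```python
# Sketch of "pure least squares plus a winner's bonus," solving for
# team ratings and the overall home edge simultaneously.
import numpy as np

def rate(games, teams, bonus=50):
    """games: list of (home_team, away_team, home_margin)."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    A, b = [], []
    for home, away, margin in games:
        row = np.zeros(n + 1)          # last column is the home edge
        row[idx[home]] = 1.0
        row[idx[away]] = -1.0
        row[n] = 1.0                   # home team gets the home edge
        A.append(row)
        # add the bonus to the winner's margin; ties stay at zero
        adj = margin if margin == 0 else margin + bonus * np.sign(margin)
        b.append(adj)
    # pin the mean rating to zero so the solution is unique
    A.append(np.concatenate([np.ones(n), [0.0]]))
    b.append(0.0)
    sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return dict(zip(teams, sol[:n])), sol[n]

teams = ["A", "B", "C"]
games = [("A", "B", 1), ("B", "C", 10), ("C", "A", -3)]
ratings, home_edge = rate(games, teams)
print(ratings, home_edge)
```

With a bonus of 50, A's one-point home win over B is fed in as a 51-point margin, exactly as described above, so close wins and blowouts end up looking far more alike than raw scores would suggest.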
Here's a fun "gedanken" (thought experiment). Imagine a team that played a 28-game schedule, all on the road against the top 28 teams in a rating system (leaving the new team out). And imagine they lost each game by exactly one point, on the road. Given a rough home edge of, say, four points, that would mean that on a point basis, this hypothetical team would be three points better than the average rating of the top 28 teams! And yet they'd be 0-28, while their point rating would probably place them somewhere between sixth and 10th in the country. It's a tough philosophical and practical question: Everyone would realize that inherently, this team was truly very, very good, and yet how could they be given a tournament bid at 0-28? I'd give them one, since now they'd get to play games on a neutral court!