Tuesday, November 27, 2012

College Football Conundrums

A continual problem in ranking college football teams is dealing with cyclical win relationships. No matter the ranking, there are always a number of ranking violations, where Team A is ranked ahead of Team B even though Team A lost to Team B. For example, we see Alabama ranked ahead of Texas A&M in almost every ranking. The BCS, AP Poll, and all computer rankings are filled with these ranking violations, presumably because the loss was viewed as an anomaly or because circular relationships (X beats Y beats Z beats X) make a violation unavoidable. While rankings do exist that seek to minimize the number of these violations (see Coleman's MinV), most computer rankings do not treat violations as a serious problem, on the grounds that each matchup is probabilistic. For example, the team panel for Alabama on cfbtn shows Alabama is likely, but not certain, to win the SEC Championship, and a Georgia win would not disprove the validity of the approach. The ranking merely suggests that if the two teams were to play one another over and over, Alabama would most often come out on top. Last year's pair of games between LSU and Alabama illustrates that we cannot make deterministic predictions about game outcomes given the amount of noise (weather, injuries, emotions, referees, etc.) in college football.
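To put a rough number on "likely, but not certain," here is a minimal sketch of the arithmetic. It assumes each game is an independent coin flip with a fixed win probability for the favorite. The function name `upset_probability` and the 0.7 figure are hypothetical, chosen only for illustration, and are not taken from any actual ranking.

```python
def upset_probability(p_favorite: float, games: int = 1) -> float:
    """Chance the underdog wins at least once in `games` independent meetings.

    p_favorite is an assumed per-game win probability for the favorite.
    """
    if not 0.0 <= p_favorite <= 1.0:
        raise ValueError("p_favorite must be between 0 and 1")
    if games < 1:
        raise ValueError("games must be at least 1")
    # The favorite sweeps all games with probability p_favorite ** games;
    # anything else means at least one upset.
    return 1.0 - p_favorite ** games


# Hypothetical 70% favorite: one meeting, then a pair of meetings.
print(round(upset_probability(0.7, 1), 2))
print(round(upset_probability(0.7, 2), 2))
```

Under these assumptions, even a clear favorite drops at least one game of a two-game series about half the time, so a single loss is weak evidence against the ranking.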

The ranking violations, however, are born of an interesting potential problem: cyclical interrelationships between teams. In some situations, there is literally no way to avoid a ranking violation because X defeated Y, who defeated Z, who defeated X, creating a loop of wins. The network of interrelationships within the SEC demonstrates the complexity this problem poses for our ranking schemes. We see that LSU defeated Texas A&M, who defeated Alabama, who defeated LSU. To add to the problem, Florida defeated both LSU and Texas A&M but lost to Georgia, who did not play any of the three teams in the cyclical triangle.
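The claim that a loop makes a violation unavoidable can be checked by brute force. The sketch below uses only the six results stated in this post, tries every possible ordering of the five teams, and counts violations for each. The function names are my own for this illustration. This is not how MinV or any published ranking is computed, and it would not scale beyond a handful of teams.

```python
from itertools import permutations

# Results stated in the post, as (winner, loser) pairs.
GAMES = [
    ("LSU", "Texas A&M"),
    ("Texas A&M", "Alabama"),
    ("Alabama", "LSU"),
    ("Florida", "LSU"),
    ("Florida", "Texas A&M"),
    ("Georgia", "Florida"),
]


def count_violations(ranking, games):
    """Count games in which the loser is ranked ahead of the winner."""
    position = {team: i for i, team in enumerate(ranking)}  # 0 = top
    return sum(1 for winner, loser in games
               if position[loser] < position[winner])


def best_rankings(games):
    """Try every ordering; return the fewest violations and who achieves it."""
    teams = sorted({team for game in games for team in game})
    scored = [(count_violations(order, games), order)
              for order in permutations(teams)]
    fewest = min(score for score, _ in scored)
    return fewest, [order for score, order in scored if score == fewest]


fewest, rankings = best_rankings(GAMES)
print("fewest violations:", fewest)
for order in rankings:
    print(" > ".join(order))
```

No ordering gets to zero: the LSU, Texas A&M, Alabama triangle always costs at least one violation, and several different orderings tie for the minimum. Wins and losses alone cannot choose among them.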

Unraveling this web of wins is difficult. Pollsters use their impressions of performance. Most computer rankings would use statistics from the individual games (e.g., total yards) to sort it all out. The Network Rankings I discussed on cfbtn last week would more or less be stuck. By looking only at wins and losses, we essentially have a three-way tie between LSU, Texas A&M, and Alabama.

We can solve this problem, regardless of our approach, by incorporating more information, and this is one reason rigorous methodological approaches are superior to polls that merely gauge impressions. Rigorous computational approaches allow the incorporation of more detailed and extensive information, providing us with a tool to unravel these ranking conundrums more effectively than Kirk Herbstreit's "look test." To illustrate, I'll use my own Network Ranking as an example. To unravel the conundrum, I reproduce the above illustration, but this time include all first-order relationships, that is, the teams that each of the five SEC teams has played.


Quickly, we begin to see why Florida is ranked highest in the Network Ranking and a number of other rigorous approaches.  They seem to have the strongest set of inter-relationships - meaning more wins and more proof of their goodness.  We also notice a few unique ties relevant to individual SEC teams from their non-conference (and non-Sun Belt) opponents.  Florida has Florida State, Georgia has Georgia Tech, LSU has Washington, and Alabama has Michigan.  These important ties might provide us with further information to unravel the conundrum, so by taking another step and including second-order relationships, or teams that opponents of the five SEC teams have also defeated, we may be able to unravel the ranking problem further.
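The first- and second-order relationships described above can be sketched as simple set lookups. The example below is only an illustration, not the Network Ranking calculation itself. It is limited to the six SEC results stated in this post (the non-conference results are not given here, so they are left out), and the helper names are my own. Dropping a team's direct opponents from its second-order set is also my assumption.

```python
from collections import defaultdict

# Results stated in the post, as (winner, loser) pairs.
GAMES = [
    ("LSU", "Texas A&M"),
    ("Texas A&M", "Alabama"),
    ("Alabama", "LSU"),
    ("Florida", "LSU"),
    ("Florida", "Texas A&M"),
    ("Georgia", "Florida"),
]


def build_maps(games):
    """Return (opponents, defeated): who each team played and who it beat."""
    opponents = defaultdict(set)
    defeated = defaultdict(set)
    for winner, loser in games:
        opponents[winner].add(loser)
        opponents[loser].add(winner)
        defeated[winner].add(loser)
    return opponents, defeated


def first_order(team, opponents):
    """Teams this team has played."""
    return set(opponents[team])


def second_order(team, opponents, defeated):
    """Teams beaten by this team's opponents.

    Assumption: the team itself and its direct opponents are excluded,
    so only the new links remain.
    """
    reached = set()
    for opponent in opponents[team]:
        reached |= defeated[opponent]
    return reached - {team} - opponents[team]


opponents, defeated = build_maps(GAMES)
for team in ("Georgia", "Florida"):
    print(team, sorted(first_order(team, opponents)),
          sorted(second_order(team, opponents, defeated)))
```

Even on this tiny graph, the second step links Georgia to LSU and Texas A&M, two teams it never played, through Florida. That is the kind of extra evidence the larger diagrams bring in.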



This second-order diagram gets very messy, very quickly, but it illustrates the extra-conference linkages of the SEC, with Alabama connected to the Big Ten through Michigan, Florida and Georgia connected to the ACC (Florida more strongly due to Florida State's superiority over Georgia Tech in the conference), and LSU connected to the Pac-12 through Washington. In the Network Rankings, calculating these linkages through wins and losses provides their relative rank. In most other computer rankings, the underlying concept is the same - comparing relative schedules to weight statistical observations of total yards, defense, etc. Regardless of the approach, we require some form of mathematical rigor to, at minimum, organize the information and compare the evidence in favor of each team. No matter how sophisticated the expert doing the looking, the "look test" can only compare relative performance using the limited amount of information that expert is capable of considering.

Ranking violations are not an inherent problem for any ranking because we have more than just that one game where Texas A&M defeated Alabama to generate a rank.  We have observations from other games that allow us to determine whether or not Alabama has demonstrated that it should probably be lower ranked than A&M or if that one loss is somewhat anomalous.  Ranking violations are unavoidable, particularly within conferences where teams have similar schedules, but careful use of evidence allows for sorting out the college football conundrums.



