Nolan - Dominance Matrices Project
Professor Hummel
Math 210
Statistical rating systems are useful in any competition. These systems often use complex
and intricate mathematical calculations. The use of mathematical rating systems to rank
competitors against each other became widely popular with the use of the Harkness system in
professional chess tournaments back in the 1950s. In 1960 Arpad Elo [1], Professor of Physics
at Marquette University and master-level chess player, introduced his own system, which was
soon adopted by the United States Chess Federation and by FIDE, which governs chess
worldwide [6]. The rating system is still used in chess tournament play today and has more
recently been popularized further in competitive online multiplayer games such as DOTA 2,
League of Legends, and World of Warcraft [6]. The ELO system ranks each player based on
their performance against other players by a number on a normal distribution from 0-3000 [6].
On a basic level, wins against an opponent with a higher ELO rating have a greater impact on
overall player ranking than wins against a player with an ELO rating of equal or lesser value.
The ELO and Harkness rating systems are useful in chess and video games because these games are
considered “zero-sum games.” The win condition of these games is based on an objective that
has no score directly tied to it, such as destroying the enemy base in popular online games or
putting the opponent in checkmate in the case of chess. There are no scores tied directly to
The rating systems needed for sporting events are much more complex, as almost all sports
are not considered “zero-sum games.” Very simply, the games are ultimately decided by one
team scoring more points than another. A basic understanding of statistics will tell you that a
competition that presents a higher total score between two teams could, in theory, yield a more
stable statistical rating of a team. That is, in soccer a bad team could score a “fluke” goal in the
final minute to win a game that they statistically shouldn’t have, while in basketball a bad team
must score consistently to keep up with the opposition in a game where one team alone can score
upwards of 100 points. Even so, statisticians, fans, sports bettors, and even oblivious spectators
have been continuously stumped by the statistical powerhouse of the NCAA March Madness
Tournament. In the tournament 68 teams from across the country are ranked and pitted against
each other in a single elimination tournament. And each year people across the world attempt to
correctly predict the outcome of every game by filling out their own bracket before the
tournament begins. Even setting aside the four play-in games, the 63 games of the main bracket
allow 2^63, or about 9.2 quintillion (9.2 * 10^18), bracket combinations, so it’s
obvious why no one has yet perfectly predicted the outcome of every game (a feat that has a
While a perfect bracket may never be found, it could be useful to employ a rating system
to try to predict as many games as possible. In 2003 Charles Redmond of Mercyhurst University
published a paper in Mathematics Magazine that presented a system for rating teams in a round
robin tournament by way of dominance matrices and eigenvalues [2]. This system is about as
simple as the ELO rating system while being more applicable to sporting events, as it also
takes into consideration the margin of victory in each game. Using Redmond’s system we will
attempt to rank each team and fill out a bracket for the 2019 NCAA Men’s March Madness
Bracket.
Mathematical Background: In his paper Redmond uses a simple example tournament between
four teams to explain his rating system. I will give an overview of that example here [2]. He
denotes four games between teams A, B, C, and D with each team playing two games.
Opponents Score
A vs. B 5-10
A vs. D 57-45
B vs. C 10-7
C vs. D 3-10
We see that team B has a record of 2-0, teams A and D are 1-1, and C is 0-2. Thus, team B
has a winning percentage of 1.00, A and D both have 0.50, and C has 0.00. Redmond revises
this rating for mathematical simplicity by assigning a score of 1 for each win, 0 for a tie, and -1
for a loss, and then dividing a team’s total score by the number of games played to find their
rating. The new ratings are 1 for B, 0 for A, 0 for D, and -1 for C. The rankings of each team have not
changed. We still see B in first, A and D tied for second, and C in last place. Using this
same scoring system we can define a team’s “dominance” over another individual team. B has a
dominance of 1 over A and conversely A has a dominance of -1 over B. Redmond then makes a
point to note the major flaw in taking this zero-sum approach to games with a score tied to them.
In this example we see team B beating both teams A and C. Team A loses by 5 points and C by 3
points. Even so, A is rated above C even though team C looks to have fared better when
compared through team B. Conversely, B has a dominance of 1 over both teams even though the
scores would tell us that B is more dominant over A than C. Redmond states that this approach
“reflects imperfectly what has really happened.” He makes another change to the system by
redefining a team’s dominance over another by their net score against that team. Therefore B has
a dominance of 5 over A and A a dominance of -5 over B. We can then define a team’s average
dominance as their net score in all games divided by the number of games played. The average
dominance of each team is then:
A 3.5
B 4
C -5
D -2.5
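The two rating schemes described so far can be checked with a few lines of Python. This is a minimal sketch rather than Redmond's own code: the `games` list simply transcribes the four example scores, and the helper names `win_loss_rating` and `average_dominance` are assumptions made for illustration.

```python
# Example tournament from the text: (team1, team2, score1, score2).
games = [
    ("A", "B", 5, 10),
    ("A", "D", 57, 45),
    ("B", "C", 10, 7),
    ("C", "D", 3, 10),
]
teams = ["A", "B", "C", "D"]


def sign(x):
    """Return 1 for a win, 0 for a tie, -1 for a loss."""
    return (x > 0) - (x < 0)


def win_loss_rating(team):
    """Average of the +1/0/-1 results over the games the team played."""
    results = []
    for t1, t2, s1, s2 in games:
        if team == t1:
            results.append(sign(s1 - s2))
        elif team == t2:
            results.append(sign(s2 - s1))
    return sum(results) / len(results)


def average_dominance(team):
    """Average net score (margin of victory) over the games played."""
    margins = []
    for t1, t2, s1, s2 in games:
        if team == t1:
            margins.append(s1 - s2)
        elif team == t2:
            margins.append(s2 - s1)
    return sum(margins) / len(margins)


win_loss = {t: win_loss_rating(t) for t in teams}
dominance = {t: average_dominance(t) for t in teams}
print(win_loss)
print(dominance)
```

Under the win-loss scheme A and D are tied, while the margin-based scheme separates them, which is the refinement the text describes.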
Redmond then brings up an important factor of any rating system: strength of schedule. A
bad team could win every game they play because they are consistently playing against even
worse teams while a better team could lose constantly due to their schedule of games
consistently being against better teams. This factor is addressed in the ELO system by the
intricacy described above where a player is given more rating points for beating a stronger
opponent. To accomplish this in his own system, Redmond adds the revision that each team plays
itself in a hypothetical game that ends in a tie. This seems inconsequential at the moment but
will be important to note later. We then see the ratings revised by dividing the total dominance
by one additional game.
Team Rating
A 7/3 = 2.33
B 8/3 = 2.67
C -10/3 = -3.33
D -5/3 = -1.67
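As a quick check of the revised ratings, the sketch below adds one tied game against itself to each team's schedule, which leaves the net score unchanged and raises the game count from 2 to 3. The dictionary `net_score` is transcribed from the example; the function name `rating_with_self_game` is an assumption for illustration.

```python
from fractions import Fraction

# Net score of each team over its two real games (from the example).
net_score = {"A": 7, "B": 8, "C": -10, "D": -5}
real_games = 2


def rating_with_self_game(team):
    """Net score divided by the real games plus one tied self-game."""
    return Fraction(net_score[team], real_games + 1)


ratings = {t: rating_with_self_game(t) for t in net_score}
for team, r in ratings.items():
    # Print the exact fraction and its two-decimal approximation.
    print(team, r, round(float(r), 2))
```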
We are now able to get to the crux of this system. We are able to compare two teams that have not
played each other directly. This is easy to do at a small scale. Redmond asks us to consider A
and C once again, who have never played each other, but note that both teams have played D. We
can then create a path from A to C through team D. A beats D by 12 points and D beats C by 7
points. We can then imagine a game between A and C in which A beats C by 12 + 7 = 19 points.
Redmond calls team C a “second-generation opponent” of team A. Using this system we can
imagine each team playing 9 games instead of the previous 3. Redmond gives a diagram to show
the pathing of each of A’s 9 games. Keep in mind that we are also considering team A as playing
We retain the first-generation scoring aspect of A’s game with B through the pathing of
A→B→B and also by A→A→B, where the score is calculated by -5 + 0 = -5 and 0 + (-5) = -5.
This is the importance of having a team play a game against itself, so that these scores are
retained. It is also important to realize that the first-generation game is considered twice in the
second-generation rating. This makes sense, as an actual game between two teams should
carry more weight in their rating than an imaginary second-generation game. Also note
that a team plays itself three times in the second generation.
provides more data points for us to consider. Additionally, from a sports standpoint, this creates a
more accurate rating because it accounts for strength of schedule. Team A is rewarded 31 points
for beating team D; 12 from beating them directly and 19 additional points due to D’s strong
performance against C.
A 3.44
B 3.22
C -4.11
D -2.56
It is interesting to see that A has now moved into first place by considering strength of schedule
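The second-generation ratings above can be reproduced by listing every two-step path explicitly. The following is a sketch under the assumptions stated in the text (each team also plays itself to a tie); the dictionary `d` and the function name `second_generation_total` are illustrative and not taken from Redmond's paper.

```python
# First-generation dominance d[x][y]: net score of x over y.
# Each team also "plays" itself to a tie (dominance 0).
d = {
    "A": {"A": 0, "B": -5, "D": 12},
    "B": {"B": 0, "A": 5, "C": 3},
    "C": {"C": 0, "B": -3, "D": -7},
    "D": {"D": 0, "A": -12, "C": 7},
}


def second_generation_total(team):
    """Sum the dominance over every two-step path team -> x -> y.

    Returns the total and the number of paths (imagined games).
    """
    path_scores = [
        d[team][x] + d[x][y]
        for x in d[team]
        for y in d[x]
    ]
    return sum(path_scores), len(path_scores)


second_gen = {}
for team in d:
    total, n_games = second_generation_total(team)
    second_gen[team] = round(total / n_games, 2)
    print(team, total, n_games, second_gen[team])
```

Each team ends up with nine imagined games, and the path A→D→C contributes the 19 points discussed earlier.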
Redmond goes on to explain that computing additional generations should yield increasingly
accurate results. This suggests asking whether the ratings approach a limit as the number of
generations grows, and he states that it is important to ask this question at this point. If a limit
exists, the ratings at that limit would provide the most accurate description of each
team. To find this limit and to expand this system to a larger tournament, like the NCAA
tournament, we must apply principles of linear algebra. Redmond continues his explanation with
the same tournament of the four teams. He illustrates the structure of the tournament as follows.
We can see each team is connected to each team it played (including itself). He then
creates a 4x4 matrix in which a row and a column are assigned to each team. A 1 or 0 is entered
depending on whether or not a team has played another. The first-generation matrix (M) is
Each entry of M^2 is the number of unique two-step paths between two teams. He asks the reader to think this
through by looking at the illustration above and the diagram of A’s second-generation games. We
notice that A is a second-generation opponent of itself three times, which is reflected in M^2, just
as the other teams are second-generation opponents twice. M^2 can be understood as the
second-generation matrix of the tournament structure. It is interesting to see that simply squaring
the matrix results in the second generation. This makes sense looking back, however, as the
teams all played 3 games in their first generation and 3^2 = 9 games in the second generation.
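The path-counting claim can be verified directly. In the sketch below the matrix `M` is rebuilt from the example's list of games (A played B and D, B played A and C, C played B and D, D played A and C, and every team played itself), which is an assumption about the layout of Redmond's matrix; `matmul` is a small helper written for this illustration.

```python
teams = ["A", "B", "C", "D"]

# M[i][j] = 1 if team i played team j (each team also plays itself), else 0.
M = [
    [1, 1, 0, 1],  # A played A, B, D
    [1, 1, 1, 0],  # B played A, B, C
    [0, 1, 1, 1],  # C played B, C, D
    [1, 0, 1, 1],  # D played A, C, D
]


def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


M2 = matmul(M, M)
for name, row in zip(teams, M2):
    # Each row sums to the number of second-generation games.
    print(name, row, "games:", sum(row))
```

Every diagonal entry of `M2` is 3 and every off-diagonal entry is 2, so each row sums to 9.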
where the coordinates are the net points scored by each team. He explains how to find the
coordinates of the vector for the first generation ratings of our teams.
From here we can move on to see how to go about computing second-generation ratings.
To do this we must add the first-generation dominances in three times. This will be given by 3M^0 *
S. We also consider the second-generation dominances, but only add them in once. This will
be given by M^1 * S. Redmond then reveals the coordinates of the second-generation ratings as
being calculated by
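Following that description, and with nine second-generation games per team, the second-generation rating vector would take the form below. This is a reconstruction from the surrounding text rather than a quotation from Redmond's paper, and the symbol r_2 is introduced here only for convenience.

```latex
\[
r_2 = \frac{1}{9}\left(3M^{0}S + M^{1}S\right) = \frac{1}{3}S + \frac{1}{9}MS
\]
```

For the example, S = (7, 8, -10, -5) and MS = (10, 5, -7, -8), so r_2 = (31, 29, -37, -23)/9, or roughly (3.44, 3.22, -4.11, -2.56), which agrees with the second-generation ratings listed earlier.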
We now come to our final equation for finding the ratings for the n-th generation.
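Extending the same pattern, and consistent with the 22-game version of the formula used later in this paper, the n-th generation ratings for the three-game example can plausibly be written as follows. The symbol r_n is an assumed name, not necessarily Redmond's notation.

```latex
\[
r_n = \sum_{j=1}^{n} \frac{1}{3}\left(\frac{M}{3}\right)^{j-1} S
\]
```

Setting n = 1 gives S/3 and n = 2 gives S/3 + MS/9, matching the first- and second-generation ratings.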
Redmond continues on to explain how to apply a limit to infinity for this summation
through eigenvalue decomposition. This is a useful tool when trying to obtain the most precise
rating for a team, but it is unnecessary for our application. Additionally, taking a limit of this
particular summation to infinity for a 70x70 data set will require a large amount of
computational power. For the sake of my computer we will simply take the summation to 100.
While this may not give the most accurate rating, the ranking of each team relative to each other
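To illustrate what truncating the summation looks like, here is a minimal pure-Python sketch applied to the four-team example. It is not the MATLAB code used for this project; the function names `mat_vec` and `nth_generation_ratings` and the argument `games_per_team` are assumptions made for this illustration.

```python
def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def nth_generation_ratings(M, S, games_per_team, n):
    """Sum (1/g) * (M/g)^(j-1) * S for j = 1..n, where g = games_per_team."""
    term = [s / games_per_team for s in S]  # j = 1 term: S / g
    total = list(term)
    for _ in range(n - 1):
        # The next term is the previous one multiplied by M / g.
        term = [x / games_per_team for x in mat_vec(M, term)]
        total = [t + x for t, x in zip(total, term)]
    return total


# Four-team example: rows and columns ordered A, B, C, D.
M = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
]
S = [7, 8, -10, -5]  # net points of A, B, C, D

for n in (1, 2, 100):
    print(n, [round(r, 3) for r in nth_generation_ratings(M, S, 3, n)])
```

For this small example the ratings at n = 100 are numerically indistinguishable from (3.875, 3.625, -4.625, -2.875), the values that satisfy r = S/3 + (M/3)r with entries summing to zero.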
The Application: With the necessary equation at our disposal, rating the teams in the
2019 March Madness tournament should be straightforward (but nevertheless tedious). So long
as every team in the bracket can be tied to at least one other team by a regular season matchup
the calculation will scale from a 4 team tournament to any size. A compilation of the scores of
There are a few things to note in this spreadsheet. Firstly, it should be explained that only
68 teams are entered into the tournament each year: the teams that won their conference
tournament plus the teams that are selected by the NCAA because of their overall performance
during the regular season [4]. There are 32 conferences in the NCAA, so that leaves 36 teams
who are granted an “at large” bid. These are the teams hand-picked by a selection committee. In
the scores matrix there are 70 teams. This is because two teams from smaller conferences,
Bradley and Fairleigh Dickinson University (FDU), had not played anyone else in the tournament. By
adding UIndy’s neighbor IUPUI we can link Bradley to the other teams in the bracket, as IUPUI
played both Bradley and Northern Kentucky. Similarly, adding Princeton ties FDU to Duke,
Iona, Yale, Arizona State, and St. John’s. The addition of these teams will have a negligible
impact on our results and their results will be meaningless in the application of filling out our
bracket.
From this scores spreadsheet we can make matrix (M) from our equation by replacing the
scores in each box with the number of games played between any two teams. We must also
include at least one imaginary game that each team plays against itself. One complication that
arises in this application is that the teams have not all played the same number of games, as they do in the
example Redmond provides. Redmond explains that to combat this, we can simply add as many
imaginary games as we need to equalize the total number of games [2]. Kansas played more games
than any other team with 21 total regular season games against other tournament teams. Once we
add the one imaginary game they play against themselves we have them down as playing 22 total
games. For each team we will add as many imaginary games as necessary to get its total to
22 as well. We notice that this matrix is symmetric across the diagonal. This makes sense as, for
example, if Duke played North Carolina 3 times then that should be reflected in the (Duke, North
Carolina) cell as well as the (North Carolina, Duke) cell. Matrix M can be found here.
The only remaining variable in our equation is (S). We can find this by taking the net
score (no pun intended) of each team across their row and compiling those scores into a 70x1
vector. We notice that the elements of this vector sum to zero. This makes sense
given how the vector was constructed: every point of margin gained by one team in a game is
lost by its opponent. Vector S can be found here.
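The construction of M and S from a list of scores can be sketched as follows. This is an illustrative outline, not the spreadsheet workflow used for this project: the function name `build_m_and_s`, the tuple layout of `games`, and the three-team schedule are all hypothetical, and the sketch assumes that the equalizing imaginary games are tied self-games placed on the diagonal.

```python
def build_m_and_s(teams, games):
    """Build the games matrix M and net-score vector S.

    games: list of (team1, team2, score1, score2) tuples.
    Returns (M, S, g), where g is the common number of games per team
    after padding with tied self-games.
    """
    index = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    M = [[0] * n for _ in range(n)]
    S = [0] * n
    for t1, t2, s1, s2 in games:
        i, j = index[t1], index[t2]
        M[i][j] += 1          # one more game between the two teams
        M[j][i] += 1          # keep M symmetric
        S[i] += s1 - s2       # margin credited to team1
        S[j] += s2 - s1       # and debited from team2
    # Everyone gets at least one self-game; teams with fewer real games
    # get extra self-games so that every row sums to the same total g.
    g = max(sum(row) for row in M) + 1
    for i in range(n):
        M[i][i] = g - sum(M[i])
    return M, S, g


# Hypothetical three-team schedule with unequal numbers of games.
teams = ["X", "Y", "Z"]
games = [("X", "Y", 70, 60), ("X", "Z", 80, 75), ("X", "Y", 65, 68)]
M, S, g = build_m_and_s(teams, games)
print(M, S, g)
```

In this toy schedule X plays three real games, so every row is padded to a total of four, mirroring the way Kansas's 21 games set the total of 22 in the actual data.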
The last edit we must make before making our calculations is a slight alteration to our
equation. Since teams play 22 games in our tournament we will replace the 3 in Redmond’s
equation with 22:

\[
\text{Results} = \lim_{n\to\infty} \left( \sum_{j=1}^{n} \frac{1}{22} \left( \frac{M}{22} \right)^{j-1} S \right)
\]
Using this altered equation, M, S, and some help from MATLAB we can calculate the final
rankings of each team (found here). We can then use these rankings to fill out a bracket. The
The Results: So how well did Redmond’s ranking system work? By comparing the calculated
bracket to the results of the 2019 tournament (found here) we see a 60% success rate at correctly
picking games [4]. But usually brackets are scored by rewarding more points for correctly
First 1
Second 2
Third 4
Fourth 8
Fifth 16
Sixth 32
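The scoring scheme in the table above can be sketched in a few lines. The per-round pick counts in `example` are made up purely for illustration and are not the results of the calculated bracket; `bracket_score` is likewise a hypothetical helper name.

```python
# Points per correct pick in each round, as in the table above.
POINTS = [1, 2, 4, 8, 16, 32]
# Games per round in a 64-team bracket (play-in games excluded).
GAMES = [32, 16, 8, 4, 2, 1]


def bracket_score(correct_per_round):
    """Total points given the number of correct picks in each round."""
    return sum(p * c for p, c in zip(POINTS, correct_per_round))


max_score = bracket_score(GAMES)
# Hypothetical bracket: correct picks per round, invented for this example.
example = bracket_score([22, 9, 4, 2, 1, 0])
print(max_score, example)
```

Because the points double as the number of games halves, every round is worth 32 points, which is where the maximum of 192 comes from.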
Scored in this way, the calculated bracket earns 89 of 192 possible points. We can also
compare our calculated bracket to the ESPN People’s Bracket (found here), a representation of
the most common picks made in the millions of brackets submitted to ESPN’s Bracket Challenge.
The People’s Bracket scored only 75 points.
Therefore, by this standard we can say that applying Redmond’s method to selecting teams in the
2019 March Madness tournament has provided a more accurate prediction than the average
Discussion: Our results still leave room for improvement. There are some inherent flaws in
applying this system to the NCAA tournament. Firstly, it should be noted that there is so much
more that goes into evaluating an individual team than their net score. Teams match up
differently against other teams based on a variety of factors. For example, UCF’s team in 2019
featured 7’5” center Tacko Fall. As you can imagine, any team that is unable to match Tacko’s
size will struggle to score against him or stop him from scoring himself. This means that UCF
matches up well against smaller teams that aren’t great at shooting because Tacko can shut down
their inside scoring. But as soon as UCF ran into Duke in the tournament, Tacko was matched up
against 6’6”, 284 lb Zion Williamson. While Zion is almost a foot shorter than Tacko, he is
incredibly strong and was known as one of the best players in the tournament in 2019. Zion’s
strength and sheer skill were able to shut down Tacko in UCF and Duke's second round matchup.
Another flaw in this application is the opportunity for outliers. For some teams we have a
very small data set to go by. Old Dominion only played two games against tournament teams in
their 2019 regular season and racked up a net score of 16 points. This is why we see them ranked
9th after our final calculation, only 1 spot over their first round opponent Purdue. This is just like
the soccer team scoring a “fluke goal” as discussed earlier. In our calculated bracket we see Old
Dominion make what would be a historic run to the fourth round. In actuality Purdue beat them
by 13 points in the first round and made a run of their own into the fourth round. Upsets are a
major part of March Madness, but this application allows for some lapses in judgement when it
comes to picking teams. While any one team theoretically has a chance to beat any other team on
any given gameday, I would imagine no professional sports analyst is projecting Old Dominion
to go to the fourth round with any level of confidence. It should be noted that our results could be
made more accurate by taking into account all games played in the 2018-2019 regular season,
even games between non-tournament teams. We could then use the calculated rankings of only
the teams chosen to participate in the tournament to make our bracket. This would require an
enormous amount of data entry and processing power that I did not have at my disposal.
The application is not all bad though. It is excellent at identifying teams whose good
records come against a weak schedule. Wofford came into the tournament with an impressive 29-4
record, but all four losses were against tournament teams. Wofford’s calculated ranking was
57th, a much more fitting rank than one based simply on win percentage.
So while I doubt this system is winning anyone any prize money, it is without a doubt
interesting in the way it simplifies ranking. It has already been shown to predict games at a better rate
than the typical ESPN bracket entry. This system could be used in some capacity to aid in an
individual’s bracket selection or even revised to calculate ratings based on other important statistics like
rebounds or assists. And as previously stated, it could surely be made more accurate by taking
into account the statistics of each team in the NCAA and by taking the limit to infinity. Whatever
the case, we should see the equation’s successes as an example of the ability of mathematics and
statistics to predict in the real world, and the equation’s failures as a celebration of the randomness
that comes with sports and human competition. Everyone loves an underdog and so long as
underdogs are defying the odds in the March Madness Tournament, no one equation will ever be
References
http://glicko.net/research/chance.pdf
[2] Redmond, C. (2003, April). A Natural Generalization of the Win-Loss Rating System.
[3] Sports Reference CBB (2020). College Basketball Stats and History.
https://www.sports-reference.com/cbb/
[4] Staats, W. NCAA (March 21st, 2019). NCAA bids 2019: Bracket for March Madness.
https://www.ncaa.com/news/basketball-men/2019-03-21/ncaa-bids-2019-bracket-march-madness
[5] Wile, R. Business Insider. (Jan 21st, 2014). Warren Buffett will Give you $1 Billion if you
https://www.businessinsider.com/warren-buffett-billion-dollar-bracket-2014-1
https://worldchesshof.org/hof-inductee/arpad-emrick-elo
Document Links
Scores Matrix
Results
Calculated Bracket
Actual Bracket