25
$\begingroup$

The Elo rating system is used to rank players in games such as chess. I can find plenty of explanations online of how to compute someone's Elo rating, how to actually crunch the numbers in practice, but I can't find a single clear conceptual explanation of what the rating is supposed to mean and why.

The only information I can find is that apparently the Elo rating of two players allows you to calculate the odds that one player will win against the other. But every page I've been able to find that talks about this just drops the formula for how to calculate these odds on you and says "there you go, that gives the probability of winning", without explaining why. Wikipedia mentions something about the assumption that "chess performance is normally distributed", but doesn't go any further.

What is the underlying probabilistic model for two-player games that the Elo system is based on? What are its basic assumptions, and what is the proof, from those assumptions, that the Elo system does indeed allow you to calculate win probabilities?

$\endgroup$
3
  • $\begingroup$ I don't know much, but I do know that the expected average score over the course of many matches between the same two opponents (according to Elo) depends only on the difference between their ratings, not on the concrete rating values. Also, I've heard that the estimate favours the higher-ranked player more than reality does, especially for large differences, which is why high-ranked players usually don't participate in tournaments with lower-ranked players. $\endgroup$
    – Arthur
    Commented Apr 7, 2016 at 14:10
  • $\begingroup$ I find the best way to think about it is that the Elo updates move the parameters (player ratings) in the direction that maximises the log-likelihood of the observed result. This helps to shed light on why it intuitively works, and it probably allows one to prove a convergence result: for many players whose true Elo ratings do not change, if they play each other many times, then some average of the estimated Elo ratings converges to the true Elo ratings. $\endgroup$
    – rwolst
    Commented Jul 3, 2017 at 15:14
  • $\begingroup$ The Elo system is well explained by Elo in Chess Life, August, 1967, p. 242-244, The Proposed USCF Rating System, Its Development, Theory, and Applications, (uscf1-nyc1.aodhosting.com/CL-AND-CR-ALL/CL-ALL/1967/1967_08.pdf). $\endgroup$
    – clp
    Commented Mar 21, 2023 at 11:50

5 Answers

17
$\begingroup$

The key point about the Elo rating is that it is related to the log-odds of players winning games.

It assumes that there is a consistent relationship across players, so that (ignoring the possibility of draws) if Player B is $10$ times as likely to beat Player A as Player A is to beat Player B, and Player C is $10$ times as likely to beat Player B as Player B is to beat Player C, then Player C is $100$ times as likely to beat Player A as Player A is to beat Player C.

The Elo rating is scaled so that (ignoring the possibility of draws) if Player B is $10$ times as likely to beat Player A as Player A is to beat Player B then the Elo rating of Player B should be $400$ higher than the Elo rating of Player A. Combining this with the earlier assumption has the result that, if Player C is $100$ times as likely to beat Player A as Player A is to beat Player C, then the Elo rating of Player C should be $800$ higher than the Elo rating of Player A: each linear increase in the difference of Elo ratings of $400$ multiplies the odds of the better player winning by a factor of $10$, so this is a logarithmic relationship.

Putting these together means that the prediction based on Elo ratings $R_A$ and $R_B$ gives $$400 \log_{10}(\text{Odds}(\text{B beats A})) = {R_B-R_A} $$ and that implies $$\text{Odds}(\text{B beats A}) = \dfrac{\Pr(\text{B beats A})}{\Pr(\text{A beats B})} = 10^{(R_B-R_A)/400} $$ and combining these with ${\Pr(\text{B beats A})}+{\Pr(\text{A beats B})}=1$ would give a probability prediction $$\Pr(\text{B beats A}) = \dfrac{10^{(R_B-R_A)/400}}{10^{(R_B-R_A)/400}+1} =\dfrac{1}{1+10^{(R_A-R_B)/400}}$$ and a predicted expected net result for Player B of $$\Pr(\text{B beats A}) - \Pr(\text{A beats B}) = \dfrac{10^{(R_B-R_A)/400}-1}{10^{(R_B-R_A)/400}+1} =\dfrac{1-10^{(R_A-R_B)/400}}{1+10^{(R_A-R_B)/400}}$$
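
As a quick worked example: if $R_B - R_A = 200$ then $10^{200/400} \approx 3.16$, so the predicted probability that B beats A is about $3.16/4.16 \approx 0.76$, and the predicted expected net result for Player B is about $2.16/4.16 \approx 0.52$.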

The Elo score then has two further useful features: first a mechanism for adjusting scores when results are not as expected (and a $K$ factor which attempts to balance the desire that incorrect scores should adjust as quickly as possible against a desire not to have too much volatility in scores); and second a method to address competitions which are not just win-lose, by focussing on expected net results from a contest rather than just the odds and probabilities of wins and losses.
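
For concreteness, here is a minimal Python sketch of these two ingredients, the predicted score and the post-game update, assuming a single fixed $K$ (the value 32 is only an illustrative choice):

def expected_score(r_a, r_b):
    # Predicted score of player A against player B (1 = win, 0.5 = draw, 0 = loss).
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    # Return both players' new ratings after A scores score_a against B.
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

# Example: a 1600-rated player beats an 1800-rated player.
print(expected_score(1600, 1800))   # about 0.24
print(elo_update(1600, 1800, 1.0))  # about (1624.3, 1775.7)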

$\endgroup$
7
  • 3
    $\begingroup$ Great, thanks. So we basically model chess players with a weighted directed graph (the weight on the edge A -> B being the odds that A beats B), with the transitivity assumption you described. That seems like quite a non-trivial assumption. Do you know a reference with more discussion on the details and the validity of the model? $\endgroup$
    – Jack M
    Commented Apr 8, 2016 at 9:57
  • 4
    $\begingroup$ The model is engineering not mathematics: in the past it seems to give satisfactory results except when it does not (examples include rating inflation and deflation, or excess volatility associated with newer players, or ratings in one high school bearing little relationship to ratings in another high school until after players from different schools meet), so many minor fixes have been introduced. Tournaments often have ranking requirements, which prevents the extremes of the model from being tested coherently. $\endgroup$
    – Henry
    Commented Apr 8, 2016 at 10:03
  • 1
    $\begingroup$ @Henry Great answer! Can we call 400 a scale factor? Was this number used by Arpad Elo himself? $\endgroup$
    – Kortchnoi
    Commented Apr 5, 2020 at 14:17
  • 2
    $\begingroup$ @Kortchnoi - it is some sort of measure of scale in the system (for example you could multiply everybody's rating by $5$ and then use $2000$ where $400$ is currently used), though I think not a scale factor in the same sense as that used in probability distributions. I have not seen Elo's original formulation, but it seems likely that he originated the $400$ as representing an odds ratio of $10$. $\endgroup$
    – Henry
    Commented Apr 5, 2020 at 15:39
  • $\begingroup$ "if Player B is 10 times as likely to beat Player A as Player A is to beat Player B, and Player C is 10 times as likely to beat Player B as Player B is to beat Player C, then Player C is 100 times as likely to beat Player A as Player A is to beat Player C." Why is that assumption justified? $\endgroup$ Commented Oct 4, 2023 at 17:30
5
$\begingroup$

Here are two very interesting articles from Mr. Mark Glickman, who is a statistics professor at Harvard University. I think they answer your questions:

http://glicko.net/research/chance.pdf

http://www.glicko.net/research/acjpaper.pdf

$\endgroup$
2
  • $\begingroup$ I haven't read the papers in full yet, but the one which gets closest to answering the question still seems to miss the mark. It helpfully describes a model in which we think of chess players as generating normally distributed random numbers, and the winner of a game as being the one that generates the larger number, but then when it gets to the formula for the probability of one player winning the game (page 10) it just calls it an assumption of the model. I find it hard to swallow that such an arbitrary formula is literally just assumed with no justification. $\endgroup$
    – Jack M
    Commented Apr 7, 2016 at 19:13
  • 3
    $\begingroup$ This is a link-only answer. Can you please quote the important points? $\endgroup$
    – Calmarius
    Commented Feb 19, 2021 at 9:10
2
$\begingroup$

Not enough reputation to comment, and 3 years late to the discussion, but ... I give an answer to this question in this post.

Yes, the win probabilities come from a somewhat unusual $400\log_{10}$ log-odds scale, and you could further argue that the ratings themselves are the weights of this logistic regression, which we update in a streaming/online fashion using game outcomes (like a stochastic gradient descent step).
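
To sketch that logistic-regression view under the usual Elo scaling: the predicted score of player A is $$E_A = \frac{1}{1+10^{-(R_A-R_B)/400}} = \sigma\big(c\,(R_A-R_B)\big), \qquad c = \frac{\ln 10}{400},$$ where $\sigma$ is the logistic function. For an observed score $S_A \in \{0,1\}$, the log-loss $-S_A\ln E_A - (1-S_A)\ln(1-E_A)$ has derivative $c\,(E_A - S_A)$ with respect to $R_A$, so a single gradient-descent step with learning rate $K/c$ is exactly the familiar Elo update $R_A \leftarrow R_A + K\,(S_A - E_A)$.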

You could instead interpret Elo as an AR(1) autoregressive model (like this talk (opens PDF)), which would probably help explain 538's "autocorrelation" term; that term is really there to maintain stationarity.

$\endgroup$
1
$\begingroup$

A couple of minor points augmenting some of the above answers.

... and what is the proof, from those assumptions, that the Elo system does indeed allow you to calculate win probabilities?

What is the proof that a $\chi$, Lorentzian, or Gaussian distribution allows you to calculate favorable probabilities?

There is no proof. You assume that your probabilities are given by the stated distribution (the one in Henry's answer), and then, after your distribution model has been "sufficiently validated" by actual results, you just run the same simple test you would in Statistics 101.

What "sufficiently validated" means: That you first choose your model to be some common distribution like the Normal one and then check this by recording the distribution of the play results, using the actual ELO points of the players.

If a "skewness" (or bias) results between the distribution of the actual results and the assumed Normal, this very bias will show you how to adjust your old distribution to get one that will be closer to experimental results (Your comment).

So you iterate and refine. Each time you iterate, the new bias points you towards a better distribution to use in the next iteration.

The current Elo distribution, then, is nothing more than a finite-step approximant of the distribution you get from actual trial runs with players holding actual Elo ratings, accurate up to some small bias $\epsilon>0$.
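
A rough sketch of such a check, assuming the games are recorded as (rating of white, rating of black, white's score) triples; all names here are illustrative:

from collections import defaultdict

def expected(r_diff):
    # Elo-predicted score for the player who is r_diff points stronger.
    return 1.0 / (1.0 + 10 ** (-r_diff / 400.0))

def calibration(games, bucket_width=50.0):
    # games: iterable of (rating_white, rating_black, score_white), score in {0, 0.5, 1}.
    # Returns, per rating-difference bucket, the observed and predicted average scores.
    observed = defaultdict(float)
    predicted = defaultdict(float)
    counts = defaultdict(int)
    for r_w, r_b, s_w in games:
        bucket = round((r_w - r_b) / bucket_width) * bucket_width
        observed[bucket] += s_w
        predicted[bucket] += expected(r_w - r_b)
        counts[bucket] += 1
    # A systematic gap between the two averages in a bucket is the "bias"
    # referred to above: it shows where the assumed curve needs adjusting.
    return {b: (observed[b] / counts[b], predicted[b] / counts[b]) for b in counts}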

Now, taking whatever Elo approximant you have, your test is as simple as putting a vertical bar on the graph of the distribution. For example, in Maple:

restart;
with(plots):
AR := 2500;                               # rating of the fixed reference player
AEF := x -> 1/(1 + 10^((x - AR)/400));    # predicted probability that an x-rated player loses to the AR-rated player
plot(AEF(x), x = 1000 .. 3500, color = red);

The curve below is the prediction for an x-rated Elo player losing to a 2500-rated Elo player; it shows your chances of losing as a function of your own rating.

Some of the advantages of the Elo model for chess

The good thing about this distribution (as with any distribution that models some phenomenon sufficiently accurately) is that it tells you many interesting things about the game it models just by looking at it:

First of all, you notice that it is a very "dense" distribution, like the Fermi distribution of degenerate electrons in the core of a neutron star. It has an abrupt fall-off, and its appearance shows a fairly large region (a wide plateau) over which the curve remains almost constant.

If you interpret some of the above features correctly (looking at the corresponding quantities such as the expectation, the moments, etc.), you can then make some fairly solid probabilistic statements, some of which may prove to be true.

For example, the abrupt fall-off tells you that much more effort is required to cover the distance between 2000 and 2700 than to cover the distance between 1000 and 2000 (thinking of integrals from $a$ to $b$ as work, etc.).

The fairly large plateau (1000-2000), on the other hand, reveals something which is, surprisingly, true: the really good players are relatively rare :*)
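
Plugging a few ratings into AEF above makes this concrete: $\text{AEF}(1000) \approx 0.9998$ and $\text{AEF}(2000) \approx 0.95$, so across the whole 1000-2000 range the losing probability barely moves, while $\text{AEF}(2700) \approx 0.24$, so most of the change is concentrated in the steep region between roughly 2000 and 2700.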

[Plot of AEF(x) for x = 1000 .. 3500: the predicted losing probability against a 2500-rated player.]

$\endgroup$
0
$\begingroup$

I didn't see anyone mention (though it is possible Glickman mentions it in the linked articles) the Bradley-Terry model, from which the formula for the expected score based on ratings is derived: https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model . The idea is that if two players have ratings (strengths relative to one another) $A$ and $B$, the expected score of player A is $A/(A + B)$ and the expected score of player B is $B/(A + B)$. This just uses a ratio scale instead of the logarithmic scale adopted by Elo. If you set $E_A$ equal to $A/(A + B)$, you get a formula on the ratio scale. Using the logarithmic scale with base 10, with the ratings chosen so that a 400-point difference corresponds to a $1/11$ expected score for the weaker player (one player being 10 times stronger than the other), you end up with the familiar formula.
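
Making that last step explicit: set $A = 10^{R_A/400}$ and $B = 10^{R_B/400}$. Then $$E_A = \frac{A}{A+B} = \frac{1}{1 + 10^{(R_B - R_A)/400}},$$ which is exactly the expected-score formula in the accepted answer, and a 400-point deficit gives $E_A = 1/(1+10) = 1/11$.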

$\endgroup$
