Who is the master?

photo de garde

Une version francaise de cet article est disponible ici .

The ranking of players in general, and especially of chess players, has been studied for almost 80 years. There were many different systems until 1970 such as the Ingo system (1948) designed by Anton Hoesslinger and used by the German federation, the Harkness system (1956) designed by Kenneth Harkness and used by the USCF federation, and the English system designed by Richard Clarke. All these systems, which were mostly ``rule of thumb'' systems, were replaced in almost every chess federation by the ELO system around 1970. The ELO system, the first to have a sound statistical basis, was designed by Arpad Elo from the assumption that the performance of a player in a game is a normally distributed random variable. Later on, different systems trying to refine the ELO system were proposed, such as the chessmetrics system designed by Jeff Sonas, or the Glicko system designed by Mark Glickman, which is used on many online playing sites. All these systems share, however, a similar goal: to infer a ranking from the results of the games played and not from the moves played.
Thus it is possible to win points and enhance a ranking in one game even if you don't play well, as long as your opponent is making more mistakes. Statistically, this bias should disappear with a large number of games. However, the ELO system (and all related systems) are only efficient to compare players playing at the same period, because points are won or lost in head to head matches. But it is extremely difficult to compare the ELO rankings of 2017 to the ELO ranking of 1970, a problem known as "drifting through years". Thus there is a large literature (such as Raymond Keene and Nathan Divinsky: Warriors of the Mind, A Quest for the Supreme Genius of the Chess Board), which discusses which player was the best ever, but all their conclusions remain highly debatable.

In 2006, Guid and Bratko (Computer analysis of World Chess Champions, ICGA journal, 29-2, 2006) did remarkable and pioneering work, advocating for the idea of ranking players by analyzing with a computer program the moves made and by trying to assess the quality of these moves. However, their work was criticized on different grounds; they used a chess program that in 2006 had an ELO rating of only 2700 and moreover, they lacked computing power and used this chess program at a limited depth, and the sample analyzed was small. But the main criticism is more fundamental: how can you say which is the best player between a player who is playing the exact right move most of the time, but sometimes makes serious blunders and a player who makes good moves (but not the best moves) all the time and almost never blunders?

In 2012 Diogo Ferreira (Determining the strength of chess players based on actual play, ICGA journal, 35-1, 2012) refined the idea. He computed the difference between the evaluation of the move played and the evaluation of the best move found by the computer and interpreted this difference as a distribution function. By computing the convolution of the distribution of two different players, he was able to compute an expected value of the result of the game between the two players. However, his work suffered also from a lack of computing power, and presented probably a small methodological inaccuracy regarding the interpretation of the result of the convolution of distributions. But there remained a main central difficulty, which is the problem of context: a small error has almost no significance when made in a position that is already seriously unbalanced in favor of one of the two players, while it might be a killing move in a tight game; this was a problem that Ferreira's method could not address.

The article (available below), published in the ICGA journal (ICGA Journal, 39-1, 2017) extensively reviews the ranking methods, explains their strengths and their weaknesses, and evaluates them on a very large corpus of games: 26000 games (all games played by World Champions from Wilhelm Steinitz to Marcus Carlsen), evaluated by the best available program (Stockfish that, with the setting used, has an ELO strength of around 3100 or 3200 ELO points), at regular tournament time controls (62000 CPU hours were needed on the OSIRIM cluster of the Institut de Recherche en Informatique de Toulouse).
It also demonstrates that, by still using a computer program to evaluate moves, all the problems mentioned above can be solved by interpreting chess as a Markovian process. It is thus possible using this last method and some linear algebra, to have a ranking that is more reliable and can compare players through the years (there are also some other interesting points that arise from the statistical analysis performed on a large database of chess games. For example, it appears that chess players are performing better when playing with white pieces than when playing with black pieces, probably for psychological reason; playing with black probably encourages more risk taking, and thus more mistakes, as it is usually assumed that black has a small disadvantage in chess).

The question usually asked at this point by most people is: "Then, who is/was the best?" Well, as for simplest questions, there is no trivial answer. Distribution or Markovian methods do not provide rankings, they just provide a way to rank a pair of players. However, a simple (and partial) answer is provided in the following table, which is extracted from the article. Each cell is the percentage of the expected result of a game between the two corresponding players, taken in their best year(Carlsen: 2013, Kramnik: 1999, Fischer: 1971, Kasparov: 2001, Anand: 2008, Khalifman: 2010, Smyslov: 1983, Petrosian: 1962, Karpov: 1988, Kasimdzhanov: 2011, Botvinnik: 1945, Ponomariov: 2011, Lasker: 1907, Spassky: 1970, Topalov: 2008, Capablanca: 1928, Euwe: 1941, Tal: 1981, Alekhine: 1922, Steinitz: 1894.). The table is not symmetric as it is not the same thing to play first or to play second. The left column is more or less a ranking of the 20 World Chess Champions. However, to understand all the ins and outs of the method, you should read the complete article.

Carlsen 52545457585758566061596061616466697082
Kramnik49 525255565657555960586060606365687083
Fischer4749 5153575657565960606161626468707385
Kasparov474950 53545454535758565658586062666882
Anand44464848 545253535756575759596264697186
Khalifman4345444747 5051525354555556566062646779
Smyslov434545474951 50515355545454555963646882
Petrosian43444547495051 525354545555565963636780
Karpov4446454848495049 5152525252525658606376
Kasimdzhanov414342454548484850 52525254535660626580
Botvinnik40414144454846484949 505452525660606480
Ponomariov4243414544474747494951 5152525558596277
Lasker414140454446474649494850 51505458596378
Spassky40414043424547464847494950 515358576175
Topalov4041394442454645494849495051 5457576175
Capablanca373837413942424245454547474847 53545976
Tal35363439373939384341414343434448 495472
Euwe3233323632373738413941424344444752 5675
Alekhine313129343035333538363739384040434745 69
Table 9: Head to head match result predictions between different World Champions in their best year

This method can be used for any two players game as long as an "oracle" (i.e. a computer program strong enough to evaluate moves reliably) is available. This includes Checkers, Reversi, Backgammon, and probably soon Go.

The complete draft of the article is available here in pdf. It can also be read online here and there is also an epub format a mobi format and an azw3 format for reading on various devices. The pdf draft is almost completely identical to the final article published in the ICGA journal, except for the page layout and some very minor modifications. Other formats might be less readable because of the conversions of mathematical formulas, but the text is also identical to the original paper.
I want to thank again Jaap Van Den Herik, who was the main editor of this article, and is now the honorary editor of the journal, not only for his (numerous) corrections but especially for publishing the full article, without any cuts or reduction, despite its length.
I also want to thank all the referees who worked on the article. They greatly helped in enhancing the paper, with a process of corrections/modifications that lasted for almost a year. They have chosen to remain anonymous, but I owe them. The original article can be consulted and ordered from IOS Press website.

There was also a press release, an article in the CNRS journal and an article on the chessbase website about this work. In french there were also articles in various mainstream media: l'Express, 20 minutes, la Dépêche, le Figaro.

This article, as any scientific article, needs to be read, commented, criticized and verified, even if it has been published in a peer reviewed journal. The full PGN database of evaluated games can be downloaded here. Thus, anybody can download it, and check all the results presented in the paper.

The exact reference of the article is:

	author = 	 {Jean-Marc Alliot},
	title = 	 {Who is the master?},
	journal = 	 {ICGA Journal},
	year = 	 {2017},
	volume =	 {39},
	number =	 {1},
	OPTpages =	 {},
	OPTmonth = 	 {},
	note = 	 {DOI 10.3233/ICG-160012}

Photo by Bundesarchiv, Bild 183-76052-0335 / Kohls, Ulrich / CC-BY-SA 3.0, CC BY-SA 3.0 de, https://commons.wikimedia.org/w/index.php?curid=5665206

Back to my homepage

The download and use of documents or photographies from this site is allowed only if their provenance is explicitly stated , and if they are only used for non profit, educational or research activities.
All rights reserved.

Last modification: 10:20, 12/16/2020