Some thoughts on anti-cheating systems

Discuss anything you like about chess related matters in this forum.
Roger Lancaster
Posts: 1915
Joined: Tue Mar 17, 2015 2:44 pm

Some thoughts on anti-cheating systems

Post by Roger Lancaster » Fri May 29, 2020 1:39 pm

I had, perhaps rather naively, assumed that everyone realised that - despite occasional claims to the contrary - the calculations which underpin the various anti-cheating mechanisms result in probabilities and not certainties. It's a long time since I studied probability and statistics, so I'm far from sure that I'm the best-equipped person to do this, but here's my take on the subject as a quick run-through. I should add that I was unable to attend Tuesday [26 May] evening's webinar and so haven't had the benefit of hearing from the speakers there.

Anti-cheating mechanisms are said to use a variety of methods to detect online computer cheats, including 'window switching' and comparing the lengths of time taken to make obvious, as opposed to not-so-obvious, moves, where it might be expected that an unassisted human would routinely make the former faster than the latter. However, what follows will disregard these and attempt to crystallise my thinking [and I ask readers to forgive any errors or omissions] on the mathematical aspects.

The mathematical aspects involve stipulating a hypothesis - possibly, "This player never uses computer assistance" - and testing it against data, in this case the moves played by the player in question, to determine how probable the observed data would be if the hypothesis were true.

This type of test is convenient for binary outcomes where the probability of occurrence is constant throughout. If one wishes to test the fairness of a coin [hypothesis: this coin is equally likely to land heads or tails] by tossing it 20 times, the 'expected' result is 10 heads and 10 tails. In practice this exact result is unlikely but, if the outcome were 12 heads and 8 tails, it would probably not concern us overmuch. If the result were 18 heads and 2 tails, however, even a non-mathematician would conclude the hypothesis was unlikely to be true.

In fact, it's not hard to see that the total number of possible outcomes is 2 to the power of 20, a little more than a million. Just one of those outcomes contains no tails at all, 20 contain exactly one tail, and 190 [that is, 20 choose 2] exactly two tails. So the probability of as few as 2 tails from 20 tosses is roughly 200 in 1,000,000, or about 0.02 per cent.
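For anyone who wants to check the arithmetic, here is a minimal sketch using only Python's standard library:

```python
from math import comb

N = 20                 # number of coin tosses
total = 2 ** N         # 1,048,576 equally likely sequences

# Sequences with exactly k tails number C(20, k): 1 + 20 + 190 = 211
at_most_two_tails = sum(comb(N, k) for k in range(3))

print(at_most_two_tails / total)   # ~0.0002, i.e. about 0.02 per cent
```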

Those familiar with high school statistics will know that the standard deviation of a distribution is the positive square root of the variance which, in turn, is given by σ² = Σ(x − μ)² / N, where x represents the actual values, μ the expected value and N the number of events. In coin-toss examples such as that above, the distribution of outcomes closely approximates a 'normal distribution', which has certain well-known properties.
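To put numbers on the coin example: the mean is N x p = 10 heads and the standard deviation is the square root of N x p x (1 − p), i.e. √5 ≈ 2.24, so 18 heads sits about 3.6 standard deviations above expectation. A quick sketch:

```python
from math import sqrt

N, p = 20, 0.5
mean = N * p                  # expected number of heads = 10
sd = sqrt(N * p * (1 - p))    # binomial standard deviation ~ 2.236

z = (18 - mean) / sd          # 18 heads is ~3.58 deviations out
print(mean, sd, z)
```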

In the case of a normal distribution, using the symbol S to denote the value of the standard deviation, some 95.45% of events are located between -2S and +2S while about 99.73% lie between -3S and +3S. In other words, in the more stringent test, something like 0.135% of events are outliers with values higher than +3S while a similar number have values lower than -3S. Therefore, if one takes 3S as the critical test, it is implicit that nearly three instances in every thousand [only one tail of which is relevant in the present context] will be wrongly judged.
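The exact coverage figures follow from the error function; a short sketch [the 5S line anticipates a threshold mentioned later in this thread]:

```python
from math import erf, sqrt

def within(s):
    """Fraction of a normal distribution lying between -s and +s."""
    return erf(s / sqrt(2))

for s in (2, 3, 5):
    outside = 1 - within(s)
    print(f"{s}S: {within(s):.4%} inside, {outside:.2e} outside (two-sided)")

# 2S: ~95.45% inside; 3S: ~99.73% inside [~2.7 per thousand outside];
# 5S: ~5.7e-07 outside, i.e. roughly 1 in 1.7 million
```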

I will spare readers more detailed calculations, but the coin-toss experiment relies on two assumptions: first, that the probability of each event - in this case, the toss of a coin - is constant and, second, that one can rationally derive an 'expected' result. Given those two characteristics, the distribution of outcomes should approximate the 'normal distribution'. Where the distribution is skewed, the rules get a lot more complicated.

Before I move on to the processing of statistics, I'd like first to deal with how data is obtained, because there appears to be a separate set of issues surrounding data selection, notably confirmation bias. Suppose someone plays in ten 7-round tournaments and, in one of these, his performance is significant at the 3S level. If it can be shown that, over the entire sequence of 70 games, his performance is insignificant even at the 1S level, what then? I've heard it argued that "he doesn't normally cheat but he cheated on that particular occasion because ..." followed by the speaker's theory. The problem here is that the mathematics of the 7 games and of the 70 point in opposite directions, and there is no imperative to believe the former and discount the latter. In a criminal trial, it's entirely legitimate to highlight a defendant's previous unblemished character. The prosecution may argue that's neither here nor there, but it's a factor a judge would normally ask a jury to bear in mind.
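The selection effect is easy to quantify. If each 7-round tournament independently had a 1-in-740 chance [the one-sided tail beyond +3S] of looking 'significant' purely by luck, then across ten tournaments the chance that at least one of them does is already over 1%. A rough sketch, assuming the tournaments are independent:

```python
p_single = 0.00135                         # one-sided tail beyond +3S
p_at_least_one = 1 - (1 - p_single) ** 10  # across ten tournaments

print(p_at_least_one)                      # ~0.0134, i.e. about 1.3%
```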

That's not a merely theoretical consideration. One can envisage a situation where an organisation such as the English Chess Federation submits, in good faith, an online tournament file to Ken Regan's software. This determines that, based on the 7 games he played there, it is highly probable that player X [about whom it has no separate information except, perhaps, his rating] had external assistance. The software has no access to the other 63 games which, had it been able to take them into account, might have led it to a different conclusion.

Reverting to the central issue, when it comes to detecting chess cheats, we need to start with a hypothesis. This needs to be unambiguous so "This player never uses computer assistance" meets the test. To test this hypothesis we then need to examine the moves played and, if the quality of those moves over a sufficiently long period is sufficiently in excess of their expected quality, one has to search for a reason. It's important to bear in mind that the reason may be that this is a statistical outlier - in the coin-toss example earlier, even an outcome of 18 or more heads would occur with a fair coin about once in every 5,000 such experiments.

The cheating problem is more complicated than the coin-toss example for a number of reasons, including [and this list is not exhaustive]:

1] Even this apparently clear hypothesis presents problems. Suppose two friends agree to a series of games where each of them will have computer assistance. Neither is cheating the other and, even if the games are rated, it's far from clear that they are cheating anyone else, because there should be a zero-sum effect on ratings. I don't suggest this is particularly sensible, but it's something which two innocents, with no thought of cheating or anti-cheating mechanisms, might conceivably do. One can think of other examples - say, a young child who doesn't understand that this is cheating until an adult tells him or her.

2] There will be some positions where there is disagreement over the evaluation of moves - for example, in a given position, Stockfish's first choice might not coincide with that of AlphaZero.

3] In any case, it's no longer a binary issue between 'right' and 'wrong' moves. Suppose White has just three legal moves and the selected computer program evaluates these at +2.0, +1.0 and -2.0 respectively; it should be clear that the second move is not 'wrong' in the same sense as the third. Instead of the usual p=1, q=0 approach, the example just given presumably requires something along the lines of credits of a=1.0, b=0.75, c=0 [see the sketch after this list].

4] A judgment has to be made over what constitutes 'quality'. The first inclination would be to take as comparator the quality of play of someone's peers - in other words, in the case of a 2300-rated player, that of other 2300-rated players. A likely problem here is that different players have different styles: at key moments, some favour objectively accurate moves while others favour moves which, although objectively less accurate, are thought to create greater problems for human opponents.

5] Point 4 isn't necessarily a problem with the 'expected quality' against which the player's moves are being assessed, because widely differing results will simply produce a greater variance about the mean. However, if the subsequent calculations take a normal distribution for granted, that assumption appears to rely on the number of games played by the 'objectively accurate' group being comparable to that played by the second group, which is more reliant on psychological factors.

6] It appears to be well established that the quality of play of most strong players drops off by around 200 rating points if they find themselves in serious time trouble, most commonly but not exclusively just before the time control at move 40. Not unexpectedly, the probability of someone finding the 'right' move, however defined, is correlated with the amount of time at his or her disposal.

7] There are special problems in the openings, where a player might be playing moves which were the result of [entirely legal] computer-assisted research before the game, and in many endings. There are some endings [clear examples might be queen versus rook, rook and bishop versus rook, and even the relatively trivial bishop-and-knight mate] where someone is more likely to play accurate moves if he or she has had previous experience of that ending and is therefore familiar with the best techniques.
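To illustrate point 3, here is one hypothetical way of converting engine evaluations into partial 'credit' for each legal move. The linear scaling is purely for illustration - real systems, Regan's as I understand it, model the probability that a player of a given rating chooses each move, which is considerably more sophisticated:

```python
def move_credit(evals):
    """Map engine evaluations (in pawns) to credits in [0, 1].

    The best move earns 1.0 and the worst 0.0, with the rest scaled
    linearly in between. Illustrative only, not any system's actual method.
    """
    best, worst = max(evals), min(evals)
    if best == worst:
        return [1.0] * len(evals)          # all moves equally good
    return [(e - worst) / (best - worst) for e in evals]

print(move_credit([2.0, 1.0, -2.0]))       # [1.0, 0.75, 0.0]
```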

I do not suggest that these difficulties are insuperable, but they do mean it is no longer self-evident that a normal distribution applies - and that's a key point. Once there's a departure from the normal distribution, the relationships mentioned above [for example, that some 99.73% of events occur in the range from -3S to +3S] no longer hold good. Of course, it's entirely possible that any departure is small [for example, the factors listed might tend to cancel one another out] and can be disregarded for practical purposes, but it's equally possible that it isn't, in which case those relationships definitely won't apply. My impression is that Ken Regan has investigated points such as these and attempted to make compensating adjustments. I don't detect any evidence of similar rigour - although I may yet be proved wrong - from the likes of Lichess or Chess.com.

All this means that it's no longer analogous to a coin-toss situation and in some respects [although I wouldn't want to take the analogy too far] it's more akin to the practices of entirely respectable polling organisations when deciding how to select a random sample from the general population. Personally, I don't for one moment doubt the integrity of Professor Regan, nor do I doubt that he has taken a scholarly approach to this. Whether, in 10 years' time, his method will still be regarded as the Holy Grail strikes me as rather less certain.

When it comes to the issue of transparency, Ken Regan has clearly been more open than Lichess or Chess.com. But, and I hope he will forgive me for pointing this out, this is because he can afford to be transparent up to a certain point without undermining his system through enabling cheats to find their way around it. Few people know, and I'd guess that only a handful understand, the nature of the adjustments he has had to make to compensate for the types of factor listed here. In effect, the transparency is only superficial and, since most of us aren't mathematical geniuses, that seems unavoidable. In short, Ken Regan can lift the first veil for the simple reason that there's a second veil underneath.

Lichess and Chess.com have each claimed that they can't be transparent because that would undermine their anti-cheating systems. It is, of course, not hard to think of other possible reasons. However - taking the platforms' explanation at face value - one is still left to ask: if Ken Regan can be transparent, why can't they? The most likely answer is that their systems are more basic and lack the checks and balances [the presence of which creates the 'second veil'] needed to reach a comparable level of reliability.

And, to be fair, there is no reason why they should. Professor Regan's work on chess cheating has been commissioned by FIDE over many years and there has been every incentive to perfect it. That's doubly the case if the chess algorithm is, in effect, a prototype for use [as I think is his intention] in other activities. By comparison, although online chess has been around for a long time, few people took it really seriously until the recent pandemic meant it was the only game in town. For non-serious chess, something which worked well for most people most of the time was probably adequate. Once online chess became serious, 'adequate' took on a new meaning - and, since there was no advance notice of the pandemic, the platforms haven't had enough time to up their game.

All this assumes, of course, that the various systems are fit for purpose.   A platform could have a very simple anti-cheating policy that someone was barred if they were the subject of three or more complaints within a 14-day period.    This would, no doubt, be effective in catching many cheats but the downside would be obvious.    I give this as an extreme example of an "unfit for purpose" policy and do not suggest that it resembles any actual policy.     There's little objective basis for judging fitness for purpose here but, in view of  the opaqueness associated with the Lichess and Chess.com systems, I would need considerable persuasion  to have anything approaching 100% confidence in them.    

Roger de Coverly
Posts: 21312
Joined: Tue Apr 15, 2008 2:51 pm

Re: Some thoughts on anti-cheating systems

Post by Roger de Coverly » Fri May 29, 2020 1:56 pm

Roger Lancaster wrote:
Fri May 29, 2020 1:39 pm

Reverting to the central issue, when it comes to detecting chess cheats, we need to start with a hypothesis.    This needs to be unambiguous so "This player never uses computer assistance" meets the test.  
That isn't true for almost any contemporary player, something you allude to in point 7. If you analyse with an engine after the game, or read a book where the supporting analysis is computer verified, you are using an engine. Over the board rules don't permit this during a game and never have, but "Correspondence" rules do.

The major problem with online chess is that, without continuous sight of the opponent and the general surrounding environment, it's impossible to know whether they are seeking external assistance or not. That's fine if you can rely on their good faith, or perhaps if it's just casual Blitz or Bullet at lower rating levels.

MartinCarpenter
Posts: 3048
Joined: Tue May 24, 2011 10:58 am

Re: Some thoughts on anti-cheating systems

Post by MartinCarpenter » Fri May 29, 2020 2:05 pm

Roger Lancaster wrote:
Fri May 29, 2020 1:39 pm
When it comes to the issue of transparency, Ken Regan has clearly been more open than Lichess or Chess.com. But, and I hope he will forgive me for pointing this out, this is because he can afford to be transparent up to a certain point without undermining his system through enabling cheats to find their way around it.
A lot of sensible points.

This one, though, I do think is broadly nonsense. I simply cannot see how a remotely casual cheat could dodge the move-statistic imprints that Regan et al look for.

They're accepting suggestions from an engine. That's what generates the detectable anomalies in their play.

The only real countermeasure would be to stop doing so, i.e. not cheating.

Roger Lancaster
Posts: 1915
Joined: Tue Mar 17, 2015 2:44 pm

Re: Some thoughts on anti-cheating systems

Post by Roger Lancaster » Fri May 29, 2020 2:08 pm

Roger de Coverly wrote:
Fri May 29, 2020 1:56 pm
Roger Lancaster wrote:
Fri May 29, 2020 1:39 pm

Reverting to the central issue, when it comes to detecting chess cheats, we need to start with a hypothesis.    This needs to be unambiguous so "This player never uses computer assistance" meets the test.  
That isn't true for almost any contemporary player, something you allude to in point 7. If you analyse with an engine after the game, or read a book where the supporting analysis is computer verified, you are using an engine. Over the board rules don't permit this during a game and never have, but "Correspondence" rules do.

The major problem with online chess is that, without continuous sight of the opponent and the general surrounding environment, it's impossible to know whether they are seeking external assistance or not. That's fine if you can rely on their good faith, or perhaps if it's just casual Blitz or Bullet at lower rating levels.
At point 1 also, I say "Even this apparently clear hypothesis presents problems" and, inasmuch as these statistics entail testing a hypothesis against data, there's a clear issue here over finding a 100% satisfactory hypothesis.

Roger de Coverly
Posts: 21312
Joined: Tue Apr 15, 2008 2:51 pm

Re: Some thoughts on anti-cheating systems

Post by Roger de Coverly » Fri May 29, 2020 2:11 pm

MartinCarpenter wrote:
Fri May 29, 2020 2:05 pm
I simply cannot see how a remotely casual cheat could dodge the move-statistic imprints that Regan et al look for.

They're accepting suggestions from an engine. That's what generates the detectable anomalies in their play.
That depends how good a player they are. Good players find good moves; so do engines. The moves match, but so what?

Even players with lowish ratings can find good moves. As a general hypothesis, higher rated players play well in all positions. Lower rated players may only play well in some positions, very possibly that's why they are lower rated.

User avatar
Joey Stewart
Posts: 1864
Joined: Wed Apr 11, 2007 2:35 pm
Location: All Of Them

Re: Some thoughts on anti-cheating systems

Post by Joey Stewart » Fri May 29, 2020 2:40 pm

Roger makes a good point that players using any modern theory ARE using engine moves for a good portion of the early game. You can usually tell these guys: they often trash you in the opening and get a good position but, having put no effort into their studies beyond parrot-fashion memorisation, can be undone in the late game. As annoying as they can be, it is still satisfying to prove to them that all the book learning in the world is no substitute for actual thought.

Then there are those that are just blatant: idiots with little to no talent who won't even bother to learn the aforementioned theory and will let the engine play nearly every move for them. These tend to be the ones that get weeded out by most anti-cheating measures.

And lastly there are the filthy snakes that are OK players but run background engines in case they get into difficulty and want to bail out - definitely the worst of the worst, as they are good enough to know better. The centaur approach makes it much harder to match their play to that of a pure machine and also means that serious internet competition will never be entirely credible.
Lose one queen and it is a disaster, Lose 1000 queens and it is just a statistic.

Matthew Turner
Posts: 3604
Joined: Fri May 16, 2008 11:54 am

Re: Some thoughts on anti-cheating systems

Post by Matthew Turner » Fri May 29, 2020 2:56 pm

Roger,
FIDE are looking for a performance that is five standard deviations away from the expected result to convict a player on purely statistical evidence. That equates to roughly a one in 1.7 million chance of such a performance arising naturally, without assistance. I would suggest that Lichess and Chess.com are seeking to align their policies with those of FIDE. So I think your analysis is basically sound, but you are some way off with the level of evidence that is required to ‘convict’ a player.

Roger Lancaster
Posts: 1915
Joined: Tue Mar 17, 2015 2:44 pm

Re: Some thoughts on anti-cheating systems

Post by Roger Lancaster » Fri May 29, 2020 3:05 pm

Matthew Turner wrote:
Fri May 29, 2020 2:56 pm
Roger,
FIDE are looking for a performance that is five standard deviations away from the expected result to convict a player on purely statistical evidence. That equates to roughly a one in 1.7 million chance of such a performance arising naturally, without assistance. I would suggest that Lichess and Chess.com are seeking to align their policies with those of FIDE. So I think your analysis is basically sound, but you are some way off with the level of evidence that is required to ‘convict’ a player.
Matt, your figure is correct on the basis of a normal distribution but I find myself in some difficulty knowing, for the reasons explained in my original post, whether a normal distribution applies. I suspect the actual distribution is fairly close, and very possibly close enough that a normal distribution approximates for all practical purposes, but I'd be the first to admit that that's little more than a guess on my part. [I take it, incidentally, that Lichess and Chess.com aren't believed to be currently on the 5S basis].

David Sedgwick
Posts: 5249
Joined: Mon Apr 09, 2007 5:56 pm
Location: Croydon

Re: Some thoughts on anti-cheating systems

Post by David Sedgwick » Fri May 29, 2020 3:08 pm

Matthew Turner wrote:
Fri May 29, 2020 2:56 pm
I would suggest that Lichess and Chess.com are seeking to align their policies with those of FIDE.
Matthew, do you have any evidence for that, from the two organisations or anywhere else?

Matthew Turner
Posts: 3604
Joined: Fri May 16, 2008 11:54 am

Re: Some thoughts on anti-cheating systems

Post by Matthew Turner » Fri May 29, 2020 3:46 pm

David,
Lichess and Chess.com are meeting with FIDE every week to discuss their anti-cheating policies and there seems to be a desire for a consistent approach. Partly this is to do with credibility: obviously the providers have more credibility with top players if their anti-cheating policies are aligned with those we see over the board.

Matthew Turner
Posts: 3604
Joined: Fri May 16, 2008 11:54 am

Re: Some thoughts on anti-cheating systems

Post by Matthew Turner » Fri May 29, 2020 3:57 pm

Roger Lancaster wrote:
Fri May 29, 2020 3:05 pm
[I take it, incidentally, that Lichess and Chess.com aren't believed to be currently on the 5S basis].
I don’t know, but remember FIDE would pretty much never end up using this figure, because an organiser would find corroborating evidence before the Z-score reached this level (a mobile phone in a bag, an earpiece, or whatever). Lichess and Chess.com have parallels to this physical evidence, so perhaps a ban could be imposed at a lower level but, in the absence of such evidence, it looks like the providers are working to a similar level of statistical evidence to FIDE.

David Sedgwick
Posts: 5249
Joined: Mon Apr 09, 2007 5:56 pm
Location: Croydon

Re: Some thoughts on anti-cheating systems

Post by David Sedgwick » Fri May 29, 2020 5:14 pm

Matthew Turner wrote:
Fri May 29, 2020 3:46 pm
David,
Lichess and Chess.com are meeting with FIDE every week to discuss their anti-cheating policies and there seems to be a desire for a consistent approach. Partly this is to do with credibility: obviously the providers have more credibility with top players if their anti-cheating policies are aligned with those we see over the board.
Thanks. If Lichess and Chess.com were no longer to insist on being a law unto themselves, that would indeed be a positive development.

Matthew Turner
Posts: 3604
Joined: Fri May 16, 2008 11:54 am

Re: Some thoughts on anti-cheating systems

Post by Matthew Turner » Fri May 29, 2020 6:14 pm

Roger Lancaster wrote:
Fri May 29, 2020 3:05 pm
Matt, your figure is correct on the basis of a normal distribution but I find myself in some difficulty knowing, for the reasons explained in my original post, whether a normal distribution applies.
It is accepted that you can approximate a binomial distribution by a normal distribution - every mathematician would agree with that. So perhaps the question is whether we have a binomial distribution to start off with. The answer is that we don't, exactly, because chess moves are not independent in the way coin tosses are. So then we have to ask how much data we require to overcome this. This has all been researched by Ken Regan and back-tested against historic data sets. I think you can rest assured that people are only going to make decisions when there is enough data to overcome this potential problem.
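For what it's worth, the quality of the normal approximation is easy to check numerically. A sketch comparing an exact binomial tail with its normal estimate, under the [admittedly unrealistic] assumption that moves are independent binary 'match/no-match' events:

```python
from math import comb, erf, sqrt

N, p = 70, 0.5                    # e.g. 70 moves, each matching the engine or not
mean, sd = N * p, sqrt(N * p * (1 - p))

k = 50                            # observed number of engine matches
exact = sum(comb(N, i) for i in range(k, N + 1)) / 2 ** N
# Normal estimate with continuity correction
normal = 0.5 * (1 - erf((k - 0.5 - mean) / (sd * sqrt(2))))

print(exact, normal)              # both come out at a few parts in ten thousand
```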

Roger de Coverly
Posts: 21312
Joined: Tue Apr 15, 2008 2:51 pm

Re: Some thoughts on anti-cheating systems

Post by Roger de Coverly » Fri May 29, 2020 6:35 pm

Matthew Turner wrote:
Fri May 29, 2020 6:14 pm
So then we have to ask how much data we require to overcome this. This has all been researched by Ken Regan and back-tested against historic data sets. I think you can rest assured that people are only going to make decisions when there is enough data to overcome this potential problem.

But what are you trying to establish? If you can "prove" that moves were established with the use of a chess engine, why does this help, particularly in the over the board context? There is surely a reasonable doubt that the moves had been pre-analysed before the game and remembered.

FIDE have only proceeded where there was also external evidence, such as mobile phones in the rest room or complex methods of signalling moves.

Roger Lancaster
Posts: 1915
Joined: Tue Mar 17, 2015 2:44 pm

Re: Some thoughts on anti-cheating systems

Post by Roger Lancaster » Fri May 29, 2020 6:42 pm

Matthew Turner wrote:
Fri May 29, 2020 6:14 pm
Roger Lancaster wrote:
Fri May 29, 2020 3:05 pm
Matt, your figure is correct on the basis of a normal distribution but I find myself in some difficulty knowing, for the reasons explained in my original post, whether a normal distribution applies.
It is accepted that you can approximate a binomial distribution by a normal distribution - every mathematician would agree with that. So perhaps the question is whether we have a binomial distribution to start off with. The answer is that we don't, exactly, because chess moves are not independent in the way coin tosses are. So then we have to ask how much data we require to overcome this. This has all been researched by Ken Regan and back-tested against historic data sets. I think you can rest assured that people are only going to make decisions when there is enough data to overcome this potential problem.
Yes, that was my surmise too - that, although we can't be sure the distribution approximates to a normal one, Ken Regan has used historic data to make adjustments which [hopefully] compensate for this. What I wondered aloud in my original post was whether Lichess and Chess.com also have sufficient back data [and Ken's appears to go back a long way!] to make similar adjustments in their models. If not - which seems very likely, although I dare say they will now dispute it in order to maintain credibility - then there's no similar compensation.