Hello all,
Roger de Coverly wrote:Robert Jurjevic wrote:There are a total of 11,701 ECF players with standard grades for both 2008 and 2009.
I expect I could reproduce this.
Robert Jurjevic wrote:
A total absolute GS grade correction for the 11,701 players between 2008 and 2009 is 52,401 grading points, which is 5.16 grading points on average per player.
Could you explain this please and show how it's calculated?
You simply add up the absolute values of the differences between the 2008 and 2009 GS standard grades, over all players who have a GS standard grade in both 2008 and 2009.
If I take into account only the players who played 30 or more standard games in the 2008/09 season I get the following: the total absolute GS standard grade correction for the 1,976 such players between 2008 and 2009 is 13,942 grading points, which is 7.06 grading points on average per player.
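The computation described above can be sketched as follows (a minimal illustration; the player names and grades are invented, not taken from the actual grading file):

```python
# Sketch of the total absolute grade correction described above.
# Grades here are invented for illustration; the real calculation
# runs over all players with GS standard grades in both years.
grades_2008 = {"A": 120, "B": 95, "C": 150}
grades_2009 = {"A": 126, "B": 90, "C": 150}

# Only players graded in both seasons contribute.
common = grades_2008.keys() & grades_2009.keys()
total = sum(abs(grades_2009[p] - grades_2008[p]) for p in common)
average = total / len(common)

print(total, average)  # 11 points in total, ~3.67 per player
```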
The point is that AGS3's (or any other 'ka + kb = 1' grading system's) total absolute grade correction is (at least approximately) half that of GS. So if a 'ka + kb = 1' system is the right model, then using the current 'ka + kb = 2' system means a grade over-correction on the order of 3 grading points on average per player per year (the over-corrections may cancel each other as seasons pass, but may also accumulate, which could cause what is referred to as 'grade stretching').
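To make the 'ka + kb' point concrete, here is a toy Elo-style per-game update. The update rule, the linear expectancy, and the scale constant are my own illustrative assumptions, not the actual GS formula; the point is only that the total absolute correction a game produces is proportional to 'ka + kb':

```python
def expected(d):
    """ECF-style linear expectancy: 50% plus one point per grading
    point of difference, capped by the 40-point rule (an assumption)."""
    d = max(-40, min(40, d))
    return (50 + d) / 100.0

def update(ga, gb, score_a, ka, kb, scale=100):
    """Toy per-game update: each grade moves by k * (score - expected).
    'scale' is an assumed constant, chosen only for illustration."""
    pa = expected(ga - gb)
    da = ka * (score_a - pa) * scale
    db = kb * ((1 - score_a) - (1 - pa)) * scale
    return ga + da, gb + db

# Two equal 120 players, A wins (score 1):
a2, b2 = update(120, 120, 1, ka=1.0, kb=1.0)  # GS-like, ka + kb = 2
a1, b1 = update(120, 120, 1, ka=0.5, kb=0.5)  # AGS3-like, ka + kb = 1
print(abs(a2 - 120) + abs(b2 - 120))  # 100.0 total absolute correction
print(abs(a1 - 120) + abs(b1 - 120))  # 50.0, i.e. exactly half
```

Whatever the actual formulae, halving 'ka + kb' halves the per-game correction, which is why the totals above differ by roughly a factor of two.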
Roger de Coverly wrote:How do you deal with the issue as identified by EMW and BJV that all grades are wrong in that they contain at least two components - namely "estimate of strength" plus a "bias or error".
The approach of looking at a grade as the sum of a measure of strength and a measure of error/bias/improvement is a valuable idea.
Could somebody please summarize the two component idea of "estimate of strength" plus a "bias or error"? Thanks.
For now I can say about "estimate of strength" that I think the more games one plays in a season, the more one's grade can be trusted, as it is based on a larger statistical sample. Nevertheless, ECF's approach of requiring a minimum of 30 games to calculate a season's grade should give a large enough statistical sample; if it does not, one could increase the minimum number of games to, say, 40 or 60.
I proposed ÉGS6 (Élo Grading System six), which for each game estimates how much each player's grade should be trusted based on frequency of play (basically by comparing the numbers of games the players played in the previous season), and changes the grade (for that game) of the player whose grade is less trusted more rapidly. That is, it ensures 'ka + kb = 1', but instead of setting 'ka = kb = 0.5' it may, say, set 'ka = 0.8' and 'kb = 0.2' for that game. In the extreme case an un-graded player's 'k' factor is set to 1, which effectively means that if an un-graded player plays a graded player, the grade of the un-graded player changes by the full amount (his 'k' factor is 1) while the grade of the graded player does not change; the graded player is affected by the game exactly as if he or she had drawn against a player of exactly his or her own grade (i.e., 'k' for the graded player is 0). (If a player's 'k' factor is 0 in a game, it is open to debate whether the game should count as played for that player; say a graded player played 30 games against un-graded players, should one count 30 or 0 games for the season?)
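The k-allocation described above might be sketched like this. The weighting rule (trust proportional to previous-season game counts) is my own guess at one way to realize the idea; the actual ÉGS6 proposal may differ:

```python
def k_factors(games_a, games_b):
    """Split a total k of 1 between two players so that the player
    whose grade is based on fewer previous-season games (and is
    therefore less trusted) gets the larger share. An un-graded
    player (0 games) gets k = 1 and the opponent k = 0."""
    if games_a == 0 and games_b == 0:
        return 0.5, 0.5          # neither grade can be trusted more
    if games_a == 0:
        return 1.0, 0.0          # A un-graded: full change for A only
    if games_b == 0:
        return 0.0, 1.0          # B un-graded: full change for B only
    # Each player's k is proportional to the opponent's game count,
    # so trust in one's own grade lowers one's own k.
    total = games_a + games_b
    return games_b / total, games_a / total

print(k_factors(10, 40))  # (0.8, 0.2): A's grade is less trusted
print(k_factors(0, 30))   # (1.0, 0.0): A is un-graded
```

With 10 games against 40, this reproduces the 'ka = 0.8', 'kb = 0.2' split used as an example above.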
A similar idea of keeping the sum of the 'k' factors at 1 while setting 'ka /= kb' can be used to change the grades of improving (or worsening) players more rapidly, though one would then need to devise an algorithm for assessing which of the players has improved (or worsened) and which has played pretty much at his or her level (the higher 'k' value is then given to the improving or worsening player). (Say a 120 player drew against a 110 player; if both the 110 and the 120 player scored 50-ish percent on average against 120 opposition, then one could assign in that game a 'k' factor of 0 to the 120 player and a 'k' factor of 1 to the 110 player.)
Brian Valentine wrote:I don't agree that moving from linear to logistic would improve things - I am agnostic on this. Professor Elo, in section 1.8 of his book, suggests the relationship is roughly linear up to the equivalent of about 40 ECF rating points (300 Elo). I have not seen statistics showing what happens above a 40-point difference, so I do not know what the curve should be. I do note that the ECF approach is much simpler to administer. What I do know is that when I had the privilege of playing Stuart Conquest last week I was expected to score 0.11 on FIDE, but only 0.1 on ECF; both are optimistic.
I agree with you that switching from a linear to a logistic 'p = f(d)' wouldn't change things drastically at the global level, but I think it would matter for games where the grade difference is larger than approximately 30 grading points.
Which 'p = f(d)'? It looks like it is impossible (or very difficult) to measure playing strengths independently of performances (there is no device one can put on the heads of chess players to get a measure of their playing strength); if that were possible, one could plot 'p' against 'd' ('d' is the playing strength difference, 'p' the expected performance) and find the best fit for 'p = f(d)'. Nevertheless, assuming that for small differences in playing strength (say '|d| <= 30') the relationship between performance and difference in playing strength is linear, one can take the grades for '|d| <= 30' to be, in effect, playing strengths (you calculate grades taking into account only games where '|d| <= 30' and treat them as playing strengths). Then, taking into account game records where '|d| > 30', one can plot 'p' against 'd' (black dots in figure 3 below) and find that 'p = f(d)' for '|d| > 30' follows one of the sigmoid curves (yellow, brown and red lines in figure 3) more closely than the linear approximations (green and blue lines in figure 3).
Figure 3: Mr Welch's finding. The '(|d|>30, q)' discrete experimental points match one of the sigmoid curves (yellow, brown and red lines) better than the linear approximations (green and blue lines). Note that both FIDE and USCF switched from the normal (brown line, above the red) to the logistic (yellow line) relationship 'p = f(d)', which they found provides a better fit to the actual results achieved. Please note that the discrete points shown are for illustration purposes only; they are not the result of an actual analysis of experimental data, and are drawn to best fit the yellow line. (Blue line: ECF linear with 50-point rule; green line: ECF linear with 40-point rule; brown line: Élo's normal, 'p = 100*(1 + Erf[d/g])/2', 'g = 50', where the error function Erf[z] is the integral of the Gaussian distribution; red line: Élo's logistic, 'p = 100/(1 + 10^(-d/g))', with 'g = (25*Log[10])/Log[3] = 52.3975...'; yellow line: Élo's logistic, 'p = 100/(1 + 10^(-d/g))', with 'g = 50'.)
So, it looks like 'p = f(d)' ('d' is the playing strength or grade difference, 'p' the expected performance) cannot be chosen arbitrarily, and the logistic relationship fits the experimental data best of the relationships examined so far.
If one can be more accurate for '|d| > 30' I see no reason why one wouldn't be. (Say, it does not seem logical to expect the same 90% performance in cases where 'd = 40' and where 'd = 120'.)
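The two expectancy curves can be compared directly using the formulas given in the figure 3 caption (ECF linear with the 40-point rule, and Élo's logistic with 'g = 50'); the divergence only becomes dramatic for large 'd', which is exactly the 'd = 40' versus 'd = 120' point above:

```python
def p_linear(d, cap=40):
    """ECF-style linear expectancy (in percent) with the 40-point rule:
    differences beyond the cap are treated as if they were the cap."""
    d = max(-cap, min(cap, d))
    return 50 + d

def p_logistic(d, g=50):
    """Elo's logistic expectancy, p = 100 / (1 + 10^(-d/g))."""
    return 100 / (1 + 10 ** (-d / g))

for d in (0, 20, 40, 120):
    print(d, p_linear(d), round(p_logistic(d), 2))
# d = 40 and d = 120 both give 90% under the capped linear rule,
# but about 86.32% and 99.60% respectively under the logistic curve.
```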
Brian Valentine wrote:The advantages of the Elo system are in its use of k factors, not in the shape of the tail.
I suspect (but am not sure) that the reason for 'grade stretching' could be the fact that currently 'ka + kb = 2' (GS has 'ka = kb = 1') when it perhaps should be 'ka + kb = 1'. AGS3 has 'ka = kb = 0.5', but, as Roger has pointed out, its weakness is that it may cause grade lag; if one could devise a system with 'ka /= kb' while honouring 'ka + kb = 1' in each game, setting the 'k' factor closer to 1 for the player whose playing skill is changing more rapidly, then the grade lag could be reduced or eliminated, though such a system may not be simple, as the skill-change assessment algorithm may be complex. (Say a 120 player won against a 120 player; if the loser scored 50-ish percent on average against 120 opposition and the winner scored 50-ish percent on average against 170 opposition, then one could assign in that game a 'k' factor of 0 to the loser and a 'k' factor of 1 to the winner.)
Roger de Coverly wrote:If they had not tampered with the system you would still have been 89 or 90 in 2009 (but they never published this).
I know. (A digression: I am not playing so well this season, but if I were I'd expect to stay 120-ish. I guess the problem with 'ka + kb' comes into play when I under-perform, which I am doing at the moment. Say I under-perform in 2009/10, and let us assume that GS ('ka + kb = 2') penalizes me too much for that; then even if I recover in the 2010/11 season, say by playing 40-ish games at a 120-ish level, I may not recover grade-wise, and if I under-perform in one or two more seasons my grade may start drifting towards 90 again, even if a few seasons later I perform at what would by then be regarded as 120 level. I am not sure about this, just speculating. This would also be unfair towards improving players, who would now be playing a 90-ish player who should actually have been a 120-ish player.)
Roger de Coverly wrote:It's a valid measure to look at the sum of the absolute values of grade change. In isolation it doesn't tell you much because it's the relative values that determine the new grade. You could monitor it from one year to the next. By absolute change I mean that in a two player set where the grades change 140 -> 150 and 160 -> 150, the absolute change is 20 but the relative change zero. I would expect higher values of absolute grade change in future because of the instability introduced by the junior changes.
If the change was 140 -> 145 and 160 -> 155, then the two players who are both 150 in your example would be 145 and 155 respectively. I do not know what impact that would have on grades in the grand scheme of things; maybe we need a strict definition of 'grade stretching'?
A measure of how much the grades calculated using two grading systems are 'stretched' with respect to each other may be related to the standard deviations of the grade distributions (say grades with a larger standard deviation are 'stretched' relative to grades with a smaller standard deviation).
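As a toy illustration of this measure (the grade lists below are invented), the more 'stretched' set of grades has the larger standard deviation even though both sets have the same mean:

```python
import statistics

grades_gs   = [100, 120, 140, 160, 180]  # more spread out
grades_ags3 = [110, 125, 140, 155, 170]  # same mean, less spread

# Population standard deviation of each set of grades.
sd_gs = statistics.pstdev(grades_gs)
sd_ags3 = statistics.pstdev(grades_ags3)

# Both distributions are centred on the same mean grade...
assert statistics.mean(grades_gs) == statistics.mean(grades_ags3) == 140
# ...but the first is 'stretched' relative to the second.
print(round(sd_gs, 2), round(sd_ags3, 2))  # 28.28 vs 21.21
```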
I have calculated AGS3 grades for 2009 (the 2008/2009 season) using the GS grades for 2009 published in the gradeslive.csv document, which I downloaded from...
http://grading.bcfservices.org.uk/downloads.php (I assumed that the 2008 AGS3 grades are equal to the 2008 GS grades and that the number of games on which the 2008 GS and AGS3 grades are based is 0).
The G30 rule below has been taken into account in the calculation.
Rule G30: The Grade is calculated by dividing the total number of points scored by the number of games played. If there are at least 30 games in the current period, then the Grade is based on these games alone. If there are not, results are brought forward from the previous period to make the total up to exactly thirty. If there are not 30 games in the two seasons together, results are taken from the season before that. Games are never taken from further back than this; the maximum is two prior grading periods.
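Rule G30 can be sketched as a game-selection function. This is my own reading of the rule, reduced to game counts per period (the real calculation would bring forward the actual results, not just counts):

```python
def g30_selection(games_by_season):
    """How many games rule G30 takes from each grading period.
    'games_by_season' lists the number of games in the current
    period first, then the two prior grading periods (missing
    entries are treated as 0 games)."""
    current, prev1, prev2 = (list(games_by_season) + [0, 0, 0])[:3]
    if current >= 30:
        return [current, 0, 0]      # current period alone suffices
    need = 30 - current
    take1 = min(prev1, need)        # top up from the previous period
    need -= take1
    take2 = min(prev2, need)        # then from the period before that
    return [current, take1, take2]  # never further back than this

print(g30_selection([35]))         # [35, 0, 0]
print(g30_selection([12, 25, 9]))  # [12, 18, 0]: topped up to exactly 30
print(g30_selection([5, 10, 8]))   # [5, 10, 8]: fewer than 30 available
```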
The results can be found in
ags3grade09.zip file which can be downloaded from...
http://www.jurjevic.org.uk/chess/grade/. The (zip) file contains ags3grade09.txt (textual, tab-delimited), ags3grade09.xls (Excel spreadsheet document) and gradeslive.csv (comma-separated values document).
The calculation was performed by a Windows .NET console (command-line) application, written in the C# programming language, interfacing with an Oracle Database 10g Express Edition database. The histograms were produced with Mathematica 7. (I saved the relevant data in a database. I have no game results, as they are not published, but since GS's and AGS3's formulae are identical and the only difference is in the value of the 'k' factors, one can calculate AGS3 grades from GS grades without the actual game results. The C# program was written by me; Oracle Database 10g Express Edition is free, but has limitations imposed, some or all of which are lifted in the commercial versions.)
Figure 1: Histogram for 2009 GS grades (consisting of 10153 grades in bins [0,1), [1,2), etc., of 1 grading point width, where minimum grade is 0.00, median grade is 133.00, maximum grade is 281.00, mean grade is 133.69 and standard deviation is 37.05).
Figure 2: Histogram for 2009 AGS3 grades (consisting of 10153 grades in bins [0,1), [1,2), etc., of 1 grading point width, where minimum grade is 0.00, median grade is 133.50, maximum grade is 281.00, mean grade is 133.55 and standard deviation is 36.73).
So, GS's standard deviation is 37.05 and AGS3's is 36.73, which may indicate that GS grades are 'stretched' relative to AGS3 grades.
Kind regards,