Hello all,
Roger de Coverly wrote:Robert Jurjevic wrote:There are a total of 11,701 ECF players with standard grades for both 2008 and 2009.
I expect I could reproduce this.
Robert Jurjevic wrote:
A total absolute GS grade correction for the 11,701 players between 2008 and 2009 is 52,401 grading points, which is 5.16 grading points on average per player.
Could you explain this please and show how it's calculated?
You simply add up the absolute values of the differences between the 2008 and 2009 GS standard grades, over all players who have a GS standard grade in both 2008 and 2009.
If I take into account only the players who played 30 or more standard games in the 2008/09 season I get the following: the total absolute GS standard grade correction for the 1,976 such players between 2008 and 2009 is 13,942 grading points, which is 7.06 grading points on average per player.
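The computation described above can be sketched as follows (a minimal illustration; the player names and grades are invented, not taken from the actual grading file):

```python
# Sketch of the total absolute grade correction described above.
# Grades here are invented for illustration; the real calculation
# runs over all players with GS standard grades in both years.
grades_2008 = {"A": 120, "B": 95, "C": 150}
grades_2009 = {"A": 126, "B": 90, "C": 150}

# Only players graded in both seasons contribute.
common = grades_2008.keys() & grades_2009.keys()
total = sum(abs(grades_2009[p] - grades_2008[p]) for p in common)
average = total / len(common)

print(total, average)  # 11 points in total, ~3.67 per player
```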
The point is that AGS3's (or any other 'ka + kb = 1' grading system's) total absolute grade correction is (at least approximately) half that of GS. So if a 'ka + kb = 1' system is the right model, then using the current 'ka + kb = 2' system means a grade over-correction on the order of 3 grading points on average per player per year (the over-corrections may cancel each other as seasons pass, but may also accumulate, which could cause what is referred to as 'grade stretching').
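To make the 'ka + kb' point concrete, here is a toy Elo-style per-game update. The update rule, the linear expectancy, and the scale constant are my own illustrative assumptions, not the actual GS formula; the point is only that the total absolute correction a game produces is proportional to 'ka + kb':

```python
def expected(d):
    """ECF-style linear expectancy: 50% plus one point per grading
    point of difference, capped by the 40-point rule (an assumption)."""
    d = max(-40, min(40, d))
    return (50 + d) / 100.0

def update(ga, gb, score_a, ka, kb, scale=100):
    """Toy per-game update: each grade moves by k * (score - expected).
    'scale' is an assumed constant, chosen only for illustration."""
    pa = expected(ga - gb)
    da = ka * (score_a - pa) * scale
    db = kb * ((1 - score_a) - (1 - pa)) * scale
    return ga + da, gb + db

# Two equal 120 players, A wins (score 1):
a2, b2 = update(120, 120, 1, ka=1.0, kb=1.0)  # GS-like, ka + kb = 2
a1, b1 = update(120, 120, 1, ka=0.5, kb=0.5)  # AGS3-like, ka + kb = 1
print(abs(a2 - 120) + abs(b2 - 120))  # 100.0 total absolute correction
print(abs(a1 - 120) + abs(b1 - 120))  # 50.0, i.e. exactly half
```

Whatever the actual formulae, halving 'ka + kb' halves the per-game correction, which is why the totals above differ by roughly a factor of two.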
Roger de Coverly wrote:How do you deal with the issue as identified by EMW and BJV that all grades are wrong in that they contain at least two components - namely "estimate of strength" plus a "bias or error".
The approach of looking at a grade as the sum of a measure of strength and a measure of error/bias/improvement is a valuable idea.
Could somebody please summarize the two component idea of "estimate of strength" plus a "bias or error"? Thanks.
For now I can say about "estimate of strength" that I think the more games one plays in a season, the more one's grade can be trusted, as it is based on a larger statistical sample. Nevertheless, ECF's approach of requiring a minimum of 30 games to calculate a season's grade should give a large enough statistical sample; if it does not, one could increase the minimum number of games to, say, 40 or 60.
I proposed ÉGS6 (Élo Grading System six), which for each game estimates how much each player's grade should be trusted based on frequency of play (basically by comparing the numbers of games the players played in the previous season), and changes the grade (for that game) of the player whose grade is less trusted more rapidly. That is, it ensures 'ka + kb = 1', but instead of setting 'ka = kb = 0.5' it may, say, set 'ka = 0.8' and 'kb = 0.2' for that game. In the extreme case an un-graded player's 'k' factor is set to 1, which effectively means that if an un-graded player plays a graded player, the grade of the un-graded player changes by the full amount (his 'k' factor is 1) while the grade of the graded player does not change; the graded player is affected by the game exactly as if he or she had drawn against a player of exactly his or her own grade (i.e., 'k' for the graded player is 0). (If a player's 'k' factor is 0 in a game, it is open to debate whether the game should count as played for that player; say a graded player played 30 games against un-graded players, should one count 30 or 0 games for the season?)
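The k-allocation described above might be sketched like this. The weighting rule (trust proportional to previous-season game counts) is my own guess at one way to realize the idea; the actual ÉGS6 proposal may differ:

```python
def k_factors(games_a, games_b):
    """Split a total k of 1 between two players so that the player
    whose grade is based on fewer previous-season games (and is
    therefore less trusted) gets the larger share. An un-graded
    player (0 games) gets k = 1 and the opponent k = 0."""
    if games_a == 0 and games_b == 0:
        return 0.5, 0.5          # neither grade can be trusted more
    if games_a == 0:
        return 1.0, 0.0          # A un-graded: full change for A only
    if games_b == 0:
        return 0.0, 1.0          # B un-graded: full change for B only
    # Each player's k is proportional to the opponent's game count,
    # so trust in one's own grade lowers one's own k.
    total = games_a + games_b
    return games_b / total, games_a / total

print(k_factors(10, 40))  # (0.8, 0.2): A's grade is less trusted
print(k_factors(0, 30))   # (1.0, 0.0): A is un-graded
```

With 10 games against 40, this reproduces the 'ka = 0.8', 'kb = 0.2' split used as an example above.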
A similar idea of keeping the sum of the 'k' factors at 1 while setting 'ka /= kb' can be used to change the grades of improving (or worsening) players more rapidly, though one would then need to devise an algorithm for assessing which of the players has improved (or worsened) and which has played pretty much at his or her level (the higher 'k' value is then given to the improving or worsening player). (Say a 120 player drew against a 110 player; if both the 110 and the 120 player scored 50-ish percent on average against 120 opposition, then one could assign in that game a 'k' factor of 0 to the 120 player and a 'k' factor of 1 to the 110 player.)
Brian Valentine wrote:I don't agree that moving from linear to logistic would improve things - I am agnostic on this. Professor Elo, in section 1.8 of his book, suggests the relationship is roughly linear up to the equivalent of about 40 ECF rating points (300 Elo). I have not seen statistics showing what happens above a 40-point difference, so I do not know what the curve should be. I do note that the ECF approach is much simpler to administer. What I do know is that when I had the privilege of playing Stuart Conquest last week I was expected to score 0.11 on FIDE, but only 0.1 on ECF; both are optimistic.
I agree with you that switching from a linear to a logistic 'p = f(d)' wouldn't change things drastically at the global level, but I think it would matter for games where the grade difference is larger than approximately 30 grading points.
Which 'p = f(d)'? It looks like it is impossible (or very difficult) to measure playing strengths independently of performances (there is no device one can put on the heads of chess players to get a measure of their playing strength); if that were possible, one could plot 'p' against 'd' ('d' is the playing strength difference, 'p' the expected performance) and find the best fit for 'p = f(d)'. Nevertheless, assuming that for small differences in playing strength (say '|d| <= 30') the relationship between performance and difference in playing strength is linear, one can take the grades for '|d| <= 30' to be, in effect, playing strengths (you calculate grades taking into account only games where '|d| <= 30' and treat them as playing strengths). Then, taking into account game records where '|d| > 30', one can plot 'p' against 'd' (black dots in figure 3 below) and find that 'p = f(d)' for '|d| > 30' follows one of the sigmoid curves (yellow, brown and red lines in figure 3) more closely than the linear approximations (green and blue lines in figure 3).
Figure 3: Mr Welch's finding. The '(|d|>30, q)' discrete experimental points match one of the sigmoid curves (yellow, brown and red lines) better than the linear approximations (green and blue lines). Note that both FIDE and USCF switched from the normal (brown line, above the red) to the logistic (yellow line) relationship 'p = f(d)', which they found provides a better fit to the actual results achieved. Please note that the discrete points shown are for illustration purposes only; they are not the result of an actual analysis of experimental data, and are drawn to best fit the yellow line. (Blue line: ECF linear with 50-point rule; green line: ECF linear with 40-point rule; brown line: Élo's normal, 'p = 100*(1 + Erf[d/g])/2', 'g = 50', where the error function Erf[z] is the integral of the Gaussian distribution; red line: Élo's logistic, 'p = 100/(1 + 10^(-d/g))', with 'g = (25*Log[10])/Log[3] = 52.3975...'; yellow line: Élo's logistic, 'p = 100/(1 + 10^(-d/g))', with 'g = 50'.)
So, it looks like 'p = f(d)' ('d' is the playing strength or grade difference, 'p' the expected performance) cannot be chosen arbitrarily, and the logistic relationship fits the experimental data best of the relationships examined so far.
If one can be more accurate for '|d| > 30' I see no reason why one wouldn't be. (Say, it does not seem logical to expect the same 90% performance in cases where 'd = 40' and where 'd = 120'.)
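The two expectancy curves can be compared directly using the formulas given in the figure 3 caption (ECF linear with the 40-point rule, and Élo's logistic with 'g = 50'); the divergence only becomes dramatic for large 'd', which is exactly the 'd = 40' versus 'd = 120' point above:

```python
def p_linear(d, cap=40):
    """ECF-style linear expectancy (in percent) with the 40-point rule:
    differences beyond the cap are treated as if they were the cap."""
    d = max(-cap, min(cap, d))
    return 50 + d

def p_logistic(d, g=50):
    """Elo's logistic expectancy, p = 100 / (1 + 10^(-d/g))."""
    return 100 / (1 + 10 ** (-d / g))

for d in (0, 20, 40, 120):
    print(d, p_linear(d), round(p_logistic(d), 2))
# d = 40 and d = 120 both give 90% under the capped linear rule,
# but about 86.32% and 99.60% respectively under the logistic curve.
```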
Brian Valentine wrote:The advantages of the Elo system are in its use of k factors, not in the shape of the tail.
I suspect (but am not sure) that the reason for 'grade stretching' could be the fact that currently 'ka + kb = 2' (GS has 'ka = kb = 1') when it perhaps should be 'ka + kb = 1'. AGS3 has 'ka = kb = 0.5', but, as Roger has pointed out, its weakness is that it may cause grade lag; if one could devise a system with 'ka /= kb' while honouring 'ka + kb = 1' in each game, setting the 'k' factor closer to 1 for the player whose playing skill is changing more rapidly, then the grade lag could be reduced or eliminated, though such a system may not be simple, as the skill-change assessment algorithm may be complex. (Say a 120 player won against a 120 player; if the loser scored 50-ish percent on average against 120 opposition and the winner scored 50-ish percent on average against 170 opposition, then one could assign in that game a 'k' factor of 0 to the loser and a 'k' factor of 1 to the winner.)
Roger de Coverly wrote:If they had not tampered with the system you would still have been 89 or 90 in 2009 (but they never published this).
I know. (A digression: I am not playing so well this season, but if I were I'd expect to stay 120-ish. I guess the problem with 'ka + kb' comes into play when I under-perform, which I am doing at the moment. Say I under-perform in 2009/10, and let us assume that GS ('ka + kb = 2') penalizes me too much for that; then even if I recover in the 2010/11 season, say by playing 40-ish games at a 120-ish level, I may not recover grade-wise, and if I under-perform in one or two more seasons my grade may start drifting towards 90 again, even if a few seasons later I perform at what would by then be regarded as 120 level. I am not sure about this, just speculating. This would also be unfair towards improving players, who would now be playing a 90-ish player who should actually have been a 120-ish player.)
Roger de Coverly wrote:It's a valid measure to look at the sum of the absolute values of grade change. In isolation it doesn't tell you much because it's the relative values that determine the new grade. You could monitor it from one year to the next. By absolute change I mean that in a two player set where the grades change 140 -> 150 and 160 -> 150, the absolute change is 20 but the relative change zero. I would expect higher values of absolute grade change in future because of the instability introduced by the junior changes.
If the change was 140 -> 145 and 160 -> 155, then the two players who are both 150 in your example would be 145 and 155 respectively. I do not know what impact that would have on grades in the grand scheme of things; maybe we need a strict definition of 'grade stretching'?
A measure of how much the grades calculated using two grading systems are 'stretched' with respect to each other may be related to the standard deviations of the grade distributions (say grades with a larger standard deviation are 'stretched' relative to grades with a smaller standard deviation).
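As a toy illustration of this measure (the grade lists below are invented), the more 'stretched' set of grades has the larger standard deviation even though both sets have the same mean:

```python
import statistics

grades_gs   = [100, 120, 140, 160, 180]  # more spread out
grades_ags3 = [110, 125, 140, 155, 170]  # same mean, less spread

# Population standard deviation of each set of grades.
sd_gs = statistics.pstdev(grades_gs)
sd_ags3 = statistics.pstdev(grades_ags3)

# Both distributions are centred on the same mean grade...
assert statistics.mean(grades_gs) == statistics.mean(grades_ags3) == 140
# ...but the first is 'stretched' relative to the second.
print(round(sd_gs, 2), round(sd_ags3, 2))  # 28.28 vs 21.21
```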
I have calculated AGS3 grades for 2009 (the 2008/2009 season) using the GS grades for 2009 published in the gradeslive.csv document, which I downloaded from...
http://grading.bcfservices.org.uk/downloads.php (I assumed that the 2008 AGS3 grades are equal to the 2008 GS grades and that the number of games on which the 2008 GS and AGS3 grades are based is 0).
The G30 rule below has been taken into account in the calculation.
Rule G30: The Grade is calculated by dividing the total number of points scored by the number of games played. If there are at least 30 games in the current period, then the Grade is based on these games alone. If there are not, results are brought forward from the previous period to make the total up to exactly thirty. If there are not 30 games in the two seasons together, results are taken from the season before that. Games are never taken from further back than this; the maximum is two prior grading periods.
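Rule G30 can be sketched as a game-selection function. This is my own reading of the rule, reduced to game counts per period (the real calculation would bring forward the actual results, not just counts):

```python
def g30_selection(games_by_season):
    """How many games rule G30 takes from each grading period.
    'games_by_season' lists the number of games in the current
    period first, then the two prior grading periods (missing
    entries are treated as 0 games)."""
    current, prev1, prev2 = (list(games_by_season) + [0, 0, 0])[:3]
    if current >= 30:
        return [current, 0, 0]      # current period alone suffices
    need = 30 - current
    take1 = min(prev1, need)        # top up from the previous period
    need -= take1
    take2 = min(prev2, need)        # then from the period before that
    return [current, take1, take2]  # never further back than this

print(g30_selection([35]))         # [35, 0, 0]
print(g30_selection([12, 25, 9]))  # [12, 18, 0]: topped up to exactly 30
print(g30_selection([5, 10, 8]))   # [5, 10, 8]: fewer than 30 available
```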
The results can be found in
ags3grade09.zip file which can be downloaded from...
http://www.jurjevic.org.uk/chess/grade/. The (zip) file contains ags3grade09.txt (textual, tab-delimited), ags3grade09.xls (Excel spreadsheet document) and gradeslive.csv (comma-separated values document).
The calculation was performed by a Windows .NET console (command-line) application, written in the C# programming language, interfacing with an Oracle Database 10g Express Edition database. The histograms were produced with Mathematica 7. (I saved the relevant data in a database. I have no game results, as they are not published, but since GS's and AGS3's formulae are identical and the only difference is in the value of the 'k' factors, one can calculate AGS3 grades from GS grades without the actual game results. The C# program was written by me; Oracle Database 10g Express Edition is free, but has limitations imposed, some or all of which are lifted in the commercial versions.)
Figure 1: Histogram for 2009 GS grades (consisting of 10153 grades in bins [0,1), [1,2), etc., of 1 grading point width, where minimum grade is 0.00, median grade is 133.00, maximum grade is 281.00, mean grade is 133.69 and standard deviation is 37.05).
Figure 2: Histogram for 2009 AGS3 grades (consisting of 10153 grades in bins [0,1), [1,2), etc., of 1 grading point width, where minimum grade is 0.00, median grade is 133.50, maximum grade is 281.00, mean grade is 133.55 and standard deviation is 36.73).
So, GS's standard deviation is 37.05 and AGS3's is 36.73, which may indicate that GS grades are 'stretched' relative to AGS3 grades.
Kind regards,