  1. #1 / 211
    Standard Member Hugh | Lieutenant General (rank #13) | Joined Nov 09 | 869 posts

    I'll start with a pro: it is a fact that if your win % for 3-player games is > 1/3, your win % for 4-player games is > 1/4, and so on, then your G Rating is > 1. Generally, we should expect players with high win percentages to have higher G Ratings. The description from the help file is remarkably accurate:

    "G rating is a normalized rating based on how many games you expect to win on a game of a particular size."  _of a particular size_ is required for the stat to maintain its meaning.

    However, the calculation of the G rating mixes games of different sizes, and here it goes very wrong. Winning 100% of your duels would be truly remarkable (from a percentage standpoint), yet it is equivalent to winning only 40% of your 5-player games, which many players achieve. 100% in two-player games and 40% in five-player games both yield a G Rating of 2.0.
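    A minimal sketch of the calculation, assuming (as the numbers in this thread imply) that the G Rating is the sum of the player counts of the games you won, divided by the number of games you played. The `g_rating` name and the `(players, won)` record format are hypothetical:

```python
from fractions import Fraction


def g_rating(games):
    """Assumed G Rating: each win scores the game's player count,
    each loss scores 0, averaged over all games played.

    games: list of (players_in_game, won) tuples (hypothetical format).
    """
    if not games:
        raise ValueError("no games played")
    total = sum(players for players, won in games if won)
    return Fraction(total, len(games))


# 100% of ten duels vs. 40% of ten five-player games
duels = [(2, True)] * 10
five_player = [(5, True)] * 4 + [(5, False)] * 6

print(g_rating(duels), g_rating(five_player))  # 2 2
```

    Exact fractions are used only so that equal ratings compare equal without floating-point noise.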

    I think the a priori criticisms are too easy, so I decided to see how much the G Ratings of the top 10 players shift if you delete their two-player games. In this data set, EVERY SINGLE PLAYER'S G RATING WENT UP AFTER DELETING THEIR TWO-PLAYER GAMES!!!!

    The skewings:  +.13, +.17, +.19, +.12, +.47, +.23, +.32,  +.09, +.07, +.38

    The large plusses usually occurred with players who play a high proportion of duels.  

    It was suggested that over time the skewing should go to zero. In fact, it will not if a player keeps playing the same proportion of duels over time. Of course, this is not unique to duels. I confidently conjecture that if we deleted 6-10 player games from (good) players who play a lot of those games, their G Ratings would drop.
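    To illustrate why the skew does not fade with volume, here is a sketch using a hypothetical player (the win rates and the even split between duels and five-player games are made up) and assuming G = (sum of player counts of games won) / (games played). Scaling every count up leaves the gap between "with duels" and "without duels" unchanged:

```python
from fractions import Fraction


def g_rating(games):
    # Assumed formula: sum of player counts of games won / games played.
    return Fraction(sum(n for n, won in games if won), len(games))


def hypothetical_record(scale):
    """Made-up player: wins 60% of duels and 30% of five-player games,
    and plays equally many of each. `scale` multiplies every count."""
    duels = [(2, True)] * (6 * scale) + [(2, False)] * (4 * scale)
    fives = [(5, True)] * (3 * scale) + [(5, False)] * (7 * scale)
    return duels + fives


for scale in (1, 10, 100):
    games = hypothetical_record(scale)
    no_duels = [g for g in games if g[0] != 2]
    skew = g_rating(no_duels) - g_rating(games)
    print(scale, float(g_rating(games)), float(g_rating(no_duels)), float(skew))
```

    With these made-up numbers the skew is +0.15 at every scale, because only the proportions enter the formula.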


  2. #2 / 211
    Standard Member Hugh

    The intent of the G Rating was well expressed by BlackDog: A low G Rating with a high ranking means that I generally play higher rated players, while a high G Rating with a high ranking means that I play lower rated players.

    I think this is the true intent of the G Rating: Attach a numerical value, something like a percentage, that ignores the difficulty of the opposition and yet makes sense across different sized games.

    I have a really hard time saying what this _truly_ should be, but I do have a suggestion (to be posted later) that has predictive value within our current ranking system.


  3. #3 / 211
    They see me rollin' IRoll11s | Private (rank #1534) | Joined Nov 09 | 632 posts

    I sort of agree with you. Let me see if I can restate it, and you tell me whether I understand your objection.

    The G-rating is there to give a 'normalized' rating with regard to the number of players in a game, and should give a value indicating how much better a given person is doing compared to how many games they 'should' win if all players were dumb automatons who rolled randomly. Thus winning 50% of 2-player games is treated the same as winning 1/16th of 16-player games.

    While that is true, your objection is based upon the fact that an elite player will win proportionally more games as the number of players increases, which I believe is true. Winning 1/8th of 16-player games is easily doable, while winning 100% of your duels is impossible, yet both give a G-rating of 2. The reasons for this are many, some more obvious than others, and ultimately the reasons don't matter.

    Since players are ranked in part by G-rating, it 'punishes' those players who prefer duels and inflates the rankings of those players who prefer large games.

    My first instinct would be to quantify, in terms of what elite players can actually accomplish, the rate of increase per player added to a game. I'm thinking along the lines of averaging the actual winning percentage, for each game size, of the top X G-rated players on the site. From this you could normalize the G-rating to obtain a G-prime rating, which would serve to eliminate the skewings you mentioned.

    Where he keeps his liquor is a secret still.

  4. #4 / 211
    Standard Member M57 | Brigadier General (rank #73) | Joined Apr 10 | 5083 posts

    I agree with 11s. At the very least, the skewing comes into play for those top-of-the-line players who can more than double the expected win ratio in games with more than two players. In three-player games, that means winning more than two-thirds (about 67%) of your matches to break the system. For players like this, playing only two-player games would hurt their G-ratings. But 11's post suggests that it's a bit more pervasive than that.

    So what if we topped out the G rating at 2?

    Consider for simplicity's sake the kind of player that can win 75% of their two-player games.  Could we say this is half-way between average and perfection?  Let's give this excellent player a G rating of 1.5 (half-way between 1 and 2.)

    Now consider that, all things being equal, the same halfway-to-perfection kind of guy should win not 10% but 55% of all ten-player games (halfway between the expected 10% and 100%, i.e., 11 out of every 20 games played).

    Also consider the not-so-good player who wins 25% of two-player games (halfway to total loser) and who will win 1 out of 20 ten-player games (half the expected 10%). His G rating should be .5 for that pathetic effort.

    But how do you accomplish this?  What would the math look like? What about using 11’s idea and somehow creating a straight percentage score where 50% would be expected and 100% would be perfect?
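    One possible answer, sketched under stated assumptions (the function name is hypothetical): score linearly from 0 to 1 as the win rate rises to the expected 1/n, then linearly from 1 to 2 as it rises from 1/n to 100%. This reproduces the three examples above:

```python
from fractions import Fraction


def capped_score(win_rate, players):
    """Hypothetical 0-2 score for one game size.

    Below the expected win rate (1/players) the score is win_rate / expected;
    above it, the score climbs linearly from 1 to 2 at a 100% win rate.
    Pass win_rate as a Fraction (or int) to keep the arithmetic exact.
    """
    win_rate = Fraction(win_rate)
    if not 0 <= win_rate <= 1:
        raise ValueError("win_rate must be between 0 and 1")
    expected = Fraction(1, players)
    if win_rate <= expected:
        return win_rate / expected
    return 1 + (win_rate - expected) / (1 - expected)


print(capped_score(Fraction(3, 4), 2))     # 3/2
print(capped_score(Fraction(11, 20), 10))  # 3/2
print(capped_score(Fraction(1, 20), 10))   # 1/2
```

    The kink at 1/n is what keeps the top of the scale at 2 for every game size, at the cost of treating the two halves of the scale differently.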

    ..but we won't be happy until there is a "barren" designer feature.
    Edited Thu 1st Jul 20:05

  5. #5 / 211
    Premium Member Yertle | Major General (rank #21) | Joined Nov 09 | 3997 posts

    Hugh wrote:

    The description from the help file is remarkably accurate:

    "G rating is a normalized rating based on how many games you expect to win on a game of a particular size."  _of a particular size_ is required for the stat to maintain its meaning.

    I'm pretty certain I stole that from tom :P


  6. #6 / 211
    Standard Member Hugh

    IRoll11s wrote: 

    While that is true, your objection is based upon the fact that an elite player will win proportionally more games as the number of players increases, which I believe is true. Winning 1/8th of 16 player games is easily doable, while winning 100% of your duels is impossible. The reasons for this are many, some are more obvious than others, and ultimately the reasons don't matter.

    Since players are ranked in part by G-rating, it 'punishes' those players who prefer duels and inflates the rankings of those players who prefer large games.

    I'm glad you're here to help me clarify :)  I'm stating something stronger.  I believe the effect you speak of is real - whatever equivalent performance across game sizes is defined to be, elite players should perform better in games with more players.  I believe G Rating to be more broken than that.

    G Ratings for particular game sizes compare perfectly well across that game size.  I contend that, except around the 1.0 mark, the G Rating does a poor job of comparing performances across game sizes.

    It is easy to construct toy examples:  Here is one - suppose a player wins 3 consecutive two player games.  They have played 3 games, their G Rating is 2.0.  As a performance, this is good, 1/8 is the probability of doing that if it's just coin flips.  A different player plays an 8 player game and wins.  1/8 is also the probability of hitting that performance randomly.  Now the 2nd player plays two more 8 player games (losing both), so that the weighting is the same as the first player and due to the losses, the 2nd player has performed worse.  The first player has 2.0 weighted at 3 games, the second player has 2.67 weighted at 3 games, yet the first player clearly outperformed the second player.  

    In summary, G Rating does a poor job of comparing performances across different game sizes. Examples similar to the above should be constructible even if we restricted ourselves to 4-player vs. 5-player games.
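    The toy example can be checked numerically. This sketch assumes G = (sum of player counts of games won) / (games played), which matches the 2.0 and 2.67 figures, and measures each performance by the chance that uniformly random play does at least as well (the binomial upper tail). Function names are hypothetical:

```python
from fractions import Fraction
from math import comb


def g_rating(games):
    # Assumed formula: sum of player counts of games won / games played.
    return Fraction(sum(n for n, won in games if won), len(games))


def p_at_least(k, trials, p):
    """Chance that random play wins at least k of `trials` games,
    each won independently with probability p."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i)
               for i in range(k, trials + 1))


player1 = [(2, True)] * 3                      # won three duels
player2 = [(8, True), (8, False), (8, False)]  # won one of three 8-player games

print(g_rating(player1), g_rating(player2))    # 2 8/3
print(p_at_least(3, 3, Fraction(1, 2)))        # 1/8
print(p_at_least(1, 3, Fraction(1, 8)))        # 169/512
```

    Random play matches player 1's record 12.5% of the time but player 2's about 33% of the time, yet player 2 has the higher G Rating.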


  7. #7 / 211
    Premium Member Yertle

    Hugh wrote:

    It is easy to construct toy examples:  Here is one - suppose a player wins 3 consecutive two player games.  They have played 3 games, their G Rating is 2.0.  As a performance, this is good, 1/8 is the probability of doing that if it's just coin flips.  A different player plays an 8 player game and wins.  1/8 is also the probability of hitting that performance randomly.  Now the 2nd player plays two more 8 player games (losing both), so that the weighting is the same as the first player and due to the losses, the 2nd player has performed worse.  The first player has 2.0 weighted at 3 games, the second player has 2.67 weighted at 3 games, yet the first player clearly outperformed the second player.  

    I would say there could be an argument against "the first player clearly outperformed the second player".  Player 1 defeated 6 players in the 3 games he played, while player 2 defeated 7 players in the one game he won and then played against another 14 that he lost to.  There are times where winning an 8 player game is vastly more difficult than winning a 3 player game (and the same could potentially be said for the opposite, but with the 8 player game you do have "less" of a chance of winning).


  8. #8 / 211
    Standard Member Hugh

    I'll concede a point, but I'll defend first. Random-move-making automata have a 1/8 probability of winning three consecutive two-player games. Random-move-making automata have a 1/8 probability of winning a single 8-player game. There is a symmetry in an 8-player game: it isn't any harder for one player than for another (until you factor in skill, starting position, and the like).

    The point I'll concede is that I used uniform probabilities as my point of comparison. I don't have a great answer to how best to compare across game sizes. I use uniform probabilities because I don't know where else to turn for comparison. However, no matter what the metric, a person winning 50/100 of their 5-player games for a G Rating of 2.5 has not outperformed a player who has won 1000/1000 two player games for a G Rating of 2.0.
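    The same uniform-probability yardstick can be applied to this comparison. A sketch, assuming the binomial upper tail (the chance that random play does at least as well) as the measure; the helper name is hypothetical:

```python
from fractions import Fraction
from math import comb


def p_at_least(k, trials, p):
    """Chance that random play wins at least k of `trials` games,
    each won independently with probability p (exact)."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i)
               for i in range(k, trials + 1))


five_player = p_at_least(50, 100, Fraction(1, 5))  # 50/100, G Rating 2.5
duels = p_at_least(1000, 1000, Fraction(1, 2))     # 1000/1000, G Rating 2.0

print(f"50/100 five-player games: {float(five_player):.2e}")
print(f"1000/1000 duels:          {float(duels):.2e}")
print(duels < five_player)  # True: the duel record is far less likely by chance
```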


  9. #9 / 211
    Standard Member asm | Major General (rank #19) | Joined Nov 09 | 1686 posts

    I love this thread a lot.

    IF YOU ARE SUGGESTING ASM IS A GOOD PLAYER YOU WILL STOP NOW, OR I WILL CALL HR AND I WILL PUT AN END TO IT, FOR THAT IS WHAT IT SOUNDS LIKE TO ME.

  10. #10 / 211
    Pop. 1, Est. 1981 Alpha | Brigadier General (rank #61) | Joined Dec 09 | 991 posts

    Hugh wrote:
    It was suggested that over time the skewing should go to zero.  In fact, it will not if over time a player maintains the same proportion of duels being played.  Of course it is not unique to duels.  I confidently conjecture that if we deleted 6-10 player games from (good) players who play a lot of those games, that their G Rating would drop.

    I did suggest something similar (not zero), but concede that the skewing will not go away (maybe this is the reason for my comparatively low g-rating - lots of spies).

    I meant to say, and didn't, that if players play a lot of games of all sizes, then the skewing is incorporated and not as important; but certainly a good player who only plays games with a lot of players will have a higher g-rating than a good player who only plays dueling maps. (This is well explained by Hugh above.)

    Here is my solution, although it is not easy to implement: give each player a number of points on a somewhat logarithmic scale.

    For two-player games (50% wins expected):
    Win Percentage   Points
    < 6.25%          0
    6.25%            .125     (12.5% of expected win percentage)
    12.5%            .25      (25% of expected win percentage)
    25%              .5       (50% of expected win percentage)
    50%              1        (expected)
    75%              1.5      (150% of expected win percentage)
    87.5%            1.75     (175% of expected win percentage)
    93.75%           1.875    (187.5% of expected win percentage)
    > 93.75%         2

    For three-player games (33% expected):
    Win Percentage   Points
    < 4.125%         0
    4.125%           .125
    8.25%            .25
    16.5%            .5
    33%              1
    49.5%            1.5
    57.75%           1.75
    61.875%          1.875
    > 61.875%        2

    For anyone who understands what is above, this should be averageable into a meaningful stat, as long as you ignore cases like "played one 10-player game and won" (set a minimum number of games of each size, ___, before that size is used in the stat).
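    A sketch of the bucket scheme, assuming the bucket edges are the multiples of the expected win rate shown in the tables, that a win rate between two edges gets the lower edge's points, and that game sizes are combined with a games-weighted average once they pass a minimum game count. The exact 1/n is used for the expected rate (the three-player table rounds it to 33%), and `min_games=10` is only a hypothetical placeholder for the threshold left blank above:

```python
from fractions import Fraction

# Bucket edges as multiples of the expected win rate (1/players);
# the point value awarded equals the edge itself.
EDGES = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1),
         Fraction(3, 2), Fraction(7, 4), Fraction(15, 8)]


def bucket_points(wins, games, players):
    """Points for one game size under the stepped 0-2 scale."""
    ratio = Fraction(wins, games) * players  # win rate / expected win rate
    if ratio > EDGES[-1]:
        return Fraction(2)
    reached = [edge for edge in EDGES if ratio >= edge]
    return reached[-1] if reached else Fraction(0)


def combined_points(record, min_games=10):
    """record: {players: (wins, games)}. Sizes with fewer than min_games
    games are ignored; the rest are averaged, weighted by games played."""
    used = {n: wg for n, wg in record.items() if wg[1] >= min_games}
    total = sum(games for _, games in used.values())
    if total == 0:
        return None
    return sum(bucket_points(w, g, n) * g for n, (w, g) in used.items()) / total


# Hypothetical record: 15/20 duels, 5/10 three-player, 1/1 ten-player (ignored)
print(combined_points({2: (15, 20), 3: (5, 10), 10: (1, 1)}))  # 3/2
```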

    If I am wrong with what I said above, please let me know.


  11. #11 / 211
    Standard Member Hugh

    I did mention a suggestion that works towards the goal of having meaning within our ranking system. It is actually a very mild variant of G Rating. It is this:

    Each loss is 0/1, but each win is regarded as going 1/1 against each opponent you beat. So, 1/1 in two player wins, 2/2 in three player wins, 3/3 in four player wins etc. We'll refer to the denominator as "effective games" (instead of total games). This may seem strange, but it corresponds precisely to how our rating system is structured.

    Does it satisfy the property everyone loved about G Ratings? Yes - if you win 1/4 of your 4-player games, your score is 0.5, 1/5 of your 5-player games is also 0.5. Less gives a number under 0.5, more gives a number greater than 0.5.

    It maxes out at 1.0 regardless of game size.
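    A minimal sketch of this score (called the H-rating later in the thread), assuming each game is recorded as a hypothetical `(players, won)` pair:

```python
from fractions import Fraction


def h_score(games):
    """Wins count as (players - 1)/(players - 1): one effective game won
    against each opponent beaten. Each loss counts as 0/1."""
    effective_won = sum(n - 1 for n, won in games if won)
    losses = sum(1 for _, won in games if not won)
    if effective_won + losses == 0:
        raise ValueError("no games played")
    return Fraction(effective_won, effective_won + losses)


one_in_four = [(4, True)] + [(4, False)] * 3
one_in_five = [(5, True)] + [(5, False)] * 4
print(h_score(one_in_four), h_score(one_in_five))  # 1/2 1/2
print(h_score([(8, True)] * 8))                    # 1
```

    For eight wins in eight 8-player games the effective count is 56/56, which reduces to 1.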

    Lastly, equal scores with different game sizes correspond to equal rating gains. A 0.6 score achieved solely with 4-player games has the same rating gain as a 0.6 score achieved solely with 2-player games, provided the same number of "effective games" were played and the opponents were of equal rating to the player. Thus, if two players had the same Global Rating, we could tell who had the higher rated opponents by looking at this score. G rating does not share this property due to the game size skewing.

    Disclaimer: The way in which this system compares performances across game sizes may not reflect "the truth" about which was better, but it is compatible with our current rating system.


  12. #12 / 211
    Standard Member Hugh

    Alpha - your suggestion looks like G Rating, but capped at 2? Right?


  13. #13 / 211
    They see me rollin' IRoll11s

    Testing by looking at edge cases confirms that this is a good metric. To wit:

    Dumb-monkey expected wins:

    game size - wins/games - avg. - G-rating - H-rating
    2 - 1/2 - 50.0% - 1 - .5 (1/2)
    4 - 1/4 - 25.0% - 1 - .5 (3/6)
    8 - 1/8 - 12.5% - 1 - .5 (7/14)

    God of war wins:

    game size - wins/games - avg. - G-rating - H-rating
    2 - 2/2 - 100% - 2 - 1 (2/2)
    4 - 4/4 - 100% - 4 - 1 (12/12)
    8 - 8/8 - 100% - 8 - 1 (56/56)

    Average is a poor metric that cannot be combined across games of various sizes unless someone wins all or none of their games, which is why G-rating was established. However, G-rating combines accurately across sizes only when someone wins exactly as many games as a dumb monkey would be expected to (G-rating = 1); it starts to skew badly both up and down the skill scale.

    Once again, I'm only clarifying for you, Hugh. Alpha, your log scale has merit, but I believe this is more precise. The only part of Hugh's post I didn't understand was this:

    "...and the opponents were of equal rating to the player. Thus, if two players had the same Global Rating, we could tell who had the higher rated opponents by looking at this score."

    This might be because I really have no interest in rankings other than a passing mathematical interest, so I'm not even sure how the Global and Board rankings work. I'm not sure how the H-rating would give any indication of how tough your opponents were, since there is nothing in the calculation that uses the global or board rankings.


  14. #14 / 211
    Where's the armor? Mongrel | Brigadier General (rank #54) | Joined Nov 09 | 522 posts

    I support Alpha's logistic, Hugh's holistic, 11's heuristics and ASM's linguistics.

    Longest innings. Most deadly.

  15. #15 / 211
    Standard Member Tesctassa II | Captain (rank #227) | Joined Jan 10 | 129 posts

    Mongrel wrote: I support Alpha's logistic, Hugh's holistic, 11's heuristics and ASM's linguistics.

    As soon as I finish my exams (my last two ever!!!! °___°) and get my internet connection back, I'll be more involved in this than I am now (I'm trying to follow this thread, though).

     

    Support the 'istics'!

    (=


  16. #16 / 211
    Standard Member M57

    Ok, we’ve seen two solutions that even the playing field in terms of “capping” the limits of a rating.  One ranges from 0-2 and the other 0-100%, both of which I like.

    But they still have the disadvantage of not being very helpful when players have not played many games.  E.g., monkey joins WG, monkey somehow wins all 2 or 3 games played, and monkey retires with the best possible g-rating.

    Why not measure the likelihood of a certain performance in terms of deviation from the norm?

    Lucky monkey goes 3/4 playing in 2-player games, putting him in the top 12.5% of all monkeys who play four 2-player games = +1.15 expressed as a standard deviation.

    Same monkey then goes 3/4 playing in 3-player games, putting him in the top 1.2% for a SD of +2.26

    From here it’s a matter of weighting these performances. Because monkey has played the same number of games in each category in this particular case I’m calling it the mean of these numbers for a composite SD of +1.70.

    Consider that a player who goes 2/2 in a pair of 8-player games has a SD of +2.15.

    It follows that these numbers could still be inflated for those who don't play many games, but I think it would be pretty hard to whip up a good rating after 10 or so games, so I would propose that a "G"-rating isn't valid (and isn't posted) until 10 games have been played.
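    A sketch of the SD idea, assuming "top X%" means the exact binomial chance that random play does at least as well, converted to a z-score with the inverse standard-normal CDF (Python's `statistics.NormalDist`), and that categories are combined with a games-weighted mean. Under that reading the 2/2 eight-player case comes out at about +2.15, as stated; the two 3/4 cases come out lower than +1.15 and +2.26, so the figures above may rest on a different tail definition:

```python
from fractions import Fraction
from math import comb
from statistics import NormalDist


def upper_tail(wins, games, players):
    """Chance that random play (win probability 1/players per game)
    wins at least `wins` of `games` games."""
    p = Fraction(1, players)
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))


def z_score(wins, games, players):
    """Upper-tail probability expressed as standard deviations above the mean.

    Undefined when the tail is 1 (no wins) or too small for a float.
    """
    q = 1.0 - float(upper_tail(wins, games, players))
    if not 0.0 < q < 1.0:
        raise ValueError("z-score undefined for this record")
    return NormalDist().inv_cdf(q)


def composite(results):
    """results: list of (wins, games, players); games-weighted mean z."""
    total = sum(games for _, games, _ in results)
    return sum(z_score(w, g, n) * g for w, g, n in results) / total


print(round(z_score(2, 2, 8), 2))  # 2.15
print(round(composite([(3, 4, 2), (3, 4, 3)]), 2))
```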

    Edited Fri 2nd Jul 08:42

  17. #17 / 211
    Where's the armor? Mongrel

    M57 wrote:

    Ok, we’ve seen two solutions that even the playing field in terms of “capping” the limits of a rating.  One ranges from 0-2 and the other 0-100%, both of which I like.

    But they still have the disadvantage of not being very helpful when players have not played many games.  E.g., monkey joins WG, monkey somehow wins all 2 or 3 games played, and monkey retires with the best possible g-rating.

    Why not measure the likelihood of a certain performance in terms of deviation from the norm?

    Lucky monkey goes 3/4 playing in 2-player games, putting him in the top 87.5% of all monkeys who play four 2-player games = +1.15 expressed as a standard deviation.

    Same monkey then goes 3/4 playing in 3-player games, putting him in the top +98.8% for a SD of +2.26

    From here it’s a matter of weighting these performances. Because monkey has played the same number of games in each category in this particular case I’m calling it the mean of these numbers for a combined SD of +1.70.

    I like (and agree with) this too, but before we get much further, I'd just like to say that every ranking system will fail some "reasonable criterion". Somebody proved this; Alpha told me about it, and it was why I was OK with G-rating.

    Improvements to G rating? Sure. Will that rating be "wrong" in some way? Always.

    The WF (gasp!) idea was to not count percentages until a certain number of games was reached. Seems to be a decent 4th solution.

    In summary: Grip it, rip it, move on. We had a seemingly similar discussion about an aggressiveness stat that ended with.... well, it ended.


  18. #18 / 211
    Standard Member M57

    Mongrel wrote: every ranking system will fail some "reasonable criterion". 

    Fair enough.  What reasonable criterion does the SD method not cover?  We've already determined that the G-Rating can be gamed, which is a pretty strong strike against it.

    Edited Fri 2nd Jul 08:56

  19. #19 / 211
    Standard Member BlackDog | Lieutenant General (rank #5) | Joined Apr 10 | 359 posts

    I think we should ignore skewing at low game counts, and focus on eliminating skewing due to different game sizes at high game counts.  Nobody cares about the G rating of someone who just joined the site.

    Edited Fri 2nd Jul 10:45

  20. #20 / 211
    Where's the armor? Mongrel

    M57 wrote:

    Fair enough.

    HA! Exactly.

    Not sure what example would exploit your formula over the others, but it probably happens when you weight the deviations together. What the SD approach captures that the other two do not is "sustained dominance"; my guess is that the global ranking is also an attempt to do this. From Hugh's formula, one can extrapolate some crisp ancillary data about the quality of opponents, which I also like.

    Coin flip.

