  1. #21 / 211
    Standard Member RiskyBack
    Rank
    Colonel
    Rank Posn
    #104
    Join Date
    Nov 09
    Location
    Posts
    1190

    IRoll11s wrote: Testing by looking at edge cases confirms that this is a good metric. To wit:

    Dumb-monkey expected wins:

    game size - wins/games - win % - g-rating - h-rating
    2 - 1/2 - 50.0% - 1 - .5 (1/2)
    4 - 1/4 - 25.0% - 1 - .5 (3/6)
    8 - 1/8 - 12.5% - 1 - .5 (7/14)
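
    For reference, those columns appear to be computed like this (a rough sketch of my reading, not IRoll11s's actual script): g-rating is win percentage times game size, and h-rating is effective wins over effective games, where a win in an n-player game counts (n-1)/(n-1) and a loss counts 0/1.

    # Sketch only - reproduces the dumb-monkey table under the assumptions above.
    def dumb_monkey_row(game_size):
        wins = 1                               # a random player wins 1 of n games
        games = game_size
        losses = games - wins
        win_pct = wins / games                 # 50.0%, 25.0%, 12.5%
        g_rating = win_pct * game_size         # 1 for every game size
        eff_wins = wins * (game_size - 1)      # one "win" per player defeated
        eff_games = eff_wins + losses          # each loss adds one effective game
        h_rating = eff_wins / eff_games        # .5 (1/2), .5 (3/6), .5 (7/14)
        return game_size, f"{wins}/{games}", win_pct, g_rating, h_rating

    for n in (2, 4, 8):
        print(dumb_monkey_row(n))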

    Was this a RiskySlam?

    Cobra Commander + Larry - Mo * Curly = RiskyBack

  2. #22 / 211
    Brigadier General M57
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5083

    I'm pretty sure that over time, both Hugh's system and the SD system will head towards a nominal point, and from there they will move very little because they represent a cumulative lifetime rating. Conversely, the artificial WG Point system can fluctuate quite a bit at any given time because it doesn't "remember" how long it took you to get to your "nominal" point. Also, at least at first, it favors players who play more games. On the other hand, H and SD ratings, and even G ratings, can get you "on the radar" quicker with reasonable accuracy regarding your ability.

    Because G-ratings, H-ratings and SD-ratings are all "lifetime" types of statistical ratings, I think it would be very interesting as more and more people rack up hundreds of games to see these types of ratings expressed graphically.  Something like a last 25 or 50 game moving average would be pretty cool. Of course, this wouldn't be relevant for the standard Point system, so it further points to the need for the types of statistical information we're talking about here.
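
    Just to sketch what I mean by a moving average (I'm making up the record format here, and taking the H-rating to be the effective-wins over effective-games percentage being discussed in this thread):

    from collections import deque

    # Sketch only: H-rating over the most recent `window` finished games.
    # Each record is (game_size, won); a win in an n-player game counts
    # (n-1)/(n-1) effective, a loss counts 0/1.
    def moving_h_rating(games, window=25):
        recent = deque(maxlen=window)
        for record in games:
            recent.append(record)
            eff_wins = sum(size - 1 for size, won in recent if won)
            eff_games = sum(size - 1 if won else 1 for size, won in recent)
            yield eff_wins / eff_games

    history = [(4, True), (2, False), (6, True), (2, True), (4, False)]
    print(list(moving_h_rating(history, window=25)))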

    I don't believe this conversation should result in no action.  Not when there is such clear evidence that over time, G-ratings can be gamed.   Unless someone can point out otherwise, this is not the case with the alternative systems that have been discussed.  I think we're on to something here.

    ..but we won't be happy until there is a "barren" designer feature.
    Edited Fri 2nd Jul 11:55

  3. #23 / 211
    Major General asm
    Standard Member asm
    Rank
    Major General
    Rank Posn
    #20
    Join Date
    Nov 09
    Location
    Posts
    1686

    BlackDog wrote:

    I think we should ignore skewing at low game counts, and focus on eliminating skewing due to different game sizes at high game counts.  Nobody cares about the G rating of someone who just joined the site.

    QFT

    IF YOU ARE SUGGESTING ASM IS A GOOD PLAYER YOU WILL STOP NOW, OR I WILL CALL HR AND I WILL PUT AN END TO IT, FOR THAT IS WHAT IT SOUNDS LIKE TO ME.

  4. #24 / 211
    Premium Member Yertle
    Rank
    Major General
    Rank Posn
    #21
    Join Date
    Nov 09
    Location
    Posts
    3997

    M57 wrote:Not when there is such clear evidence that over time, G-ratings can be gamed.  

    I don't understand how the G-rating can be "gamed", especially over time.


  5. #25 / 211
    Standard Member Vataro
    Rank
    Sergeant
    Rank Posn
    #437
    Join Date
    Nov 09
    Location
    Posts
    574

    I am in agreement with asm.

    Give a man fire and he's warm for a day... but set him on fire and he's warm for the rest of his life.

  6. #26 / 211
    Brigadier General M57
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5083

    Yertle wrote:
    M57 wrote:Not when there is such clear evidence that over time, G-ratings can be gamed.  

    I don't understand how the G-rating can be "gamed", especially over time.

    I thought we arrived at a consensus that better players who play a disproportionate amount of many-player games (like 5 and 6+) will push their G-rating higher.

    ..but we won't be happy until there is a "barren" designer feature.

  7. #27 / 211
    Premium Member Yertle
    Rank
    Major General
    Rank Posn
    #21
    Join Date
    Nov 09
    Location
    Posts
    3997

    M57 wrote:
    Yertle wrote:
    M57 wrote:Not when there is such clear evidence that over time, G-ratings can be gamed.  

    I don't understand how the G-rating can be "gamed", especially over time.

    I thought we arrived at a consensus that better players who play a disproportionate amount of many-player games (like 5 and 6+) will push their G-rating higher.

    Hmmmm, better players will push their G-rating higher when they play more many-player games...that's right...right?  How's that then "gamed"? That makes sense right?

    I have the lowest G-rating in the Board CP top 10, but that makes sense, since I have played a significant number of 2-player games, which I normally have an advantage in anyhow, since most 2-player maps come down to how many times you have played the board and learned the strategy. If I wanted to increase my G-rating, I should be playing more many-player games, since that advantage normally decreases on boards that hold more people and in games with more people.


  8. #28 / 211
    Brigadier General M57
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5083

    When I say "gamed", perhaps I'm mis-using the expression.  A better word might be "manipulated".

    While I agree there is logic to the argument that in two-player games the better player is more "in control", the fact remains that the lower the number of players, the more severely capped the potential g-rating is. I would venture that the various advantages of playing in high-#-of-player games outweigh the 1v1 "control" factor.

    I just thought of another reason that 2-player games can bring down a g-rating. This will not be true with all boards, because some are quite fair, but I think it's safe to say that most boards favor the player who moves first. The more severe the board's tilt, the more the better player has to overcome.

    ..but we won't be happy until there is a "barren" designer feature.
    Edited Fri 2nd Jul 13:31

  9. #29 / 211
    Brigadier General M57
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5083

    M57 wrote:

     ..the various advantages of playing in high-#-of-player games outweigh the 1v1 "control" factor.

    Case in point:  poloquebec has the highest G-rating of any player with more than 100 games. 

    Take a look at those games and draw your own conclusions.

    ..but we won't be happy until there is a "barren" designer feature.

  10. #30 / 211
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    Yay! Debate, discussion, Denny's new RiskySlam intestine destroyer! (my math threads never go this well!)

    @G-Rating manipulating: No one "manipulates" their G Rating because it isn't what gets you into the top X page. However, as M57 is correctly pointing out, the data is speaking louder than we are. Broad ranges of G Ratings may have people roughly in the right spots, but game sizes are introducing more noise than is necessary.

    @Standard Deviation: I don't have it in me to disagree with keeping variation statistics. In fact, Glicko is preferred by chess players over Elo because of the SD statistics employed in its rating calculations. For something like G Rating, I can see why people prefer one number to follow (simplicity). For percentages, G Ratings and the like, a minimum number of games solves this.

    @M57 even more: I like the idea of using percentiles; I'd like to figure out the best way to do this. Where it gets tricky is that if someone has a >50% win rate, their percentile pushes towards 100% as they play more games. This may not be an issue if you are trying to use the percentile data to calculate a normalized percentage (by comparing, say, 4-player percentiles to 2-player percentiles with the same number of games). The idea fascinates me, because it might be the right one.
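
    To make that concrete, a rough sketch of the percentile-matching idea (all names and data here are made up for illustration; the samples would come from existing player data, restricted to players with a comparable number of games):

    from bisect import bisect_left

    # Sketch only: map a 4-player win % to the 2-player win % at the same
    # percentile within the respective win-% samples.
    def percentile_of(value, sorted_sample):
        return bisect_left(sorted_sample, value) / len(sorted_sample)

    def quantile_of(p, sorted_sample):
        idx = min(int(p * len(sorted_sample)), len(sorted_sample) - 1)
        return sorted_sample[idx]               # nearest-rank, no interpolation

    def normalize_to_two_player(win_pct_4p, all_4p_pcts, all_2p_pcts):
        p = percentile_of(win_pct_4p, sorted(all_4p_pcts))
        return quantile_of(p, sorted(all_2p_pcts))

    # e.g. normalize_to_two_player(0.40, four_player_sample, two_player_sample)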

    @Alpha: I like the idea of a logarithmic scale, also because it might be the right idea, but your data set exactly matched G Rating (as far as I could tell), so I don't know what logarithmic formula you are using.

    @Mongrel: Similar to the aggressiveness thread in that we haven't "solved" the hard problem. However, rating systems can improve without hitting the ideal, and I see many good ideas towards this end.


  11. #31 / 211
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    IRoll11s wrote: 

    Once again, I'm only clarifying for you, Hugh. Alpha, your log scale has merit, but I believe this is more precise. The only part of Hugh's post I didn't understand was this:

    "...and the opponents were of equal rating to the player. Thus, if two players had the same Global Rating, we could tell who had the higher rated opponents by looking at this score."

    Well, I know I need it!! (the alternative is the 7 page Hugh post that no one should ever read.)

    I should have been way more careful with that statement, since it is really hard to clarify, and its proof utilizes highly idealized conditions that don't occur in practice.  Anyway, here is an attempt:  Given two players who have reached an equilibrium Global rating against a type of opponent (for example, player 1 plays only 1000 rated players, player 2 plays only 1400 rated players), if their global ratings are equal, but their H ratings are not, you can read off who played the stronger opponents (the one with the lower H rating).   The H Rating advantage is that this remains true if you vary game size, but with a G Rating, game size will rear its ugly head.

    I conjecture that the idealization "plays only 1000 rated players" can be replaced by "the opponents' average rating is 1000" via some sort of law of large numbers argument.

    For more on equilibrium ratings, there is the Hugh post that did not go well:

    http://www.wargear.net/forum/showthread/474

    My definition/analysis of equilibrium is given in my very first post in that thread.


  12. #32 / 211
    Brigadier General M57
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5083

    Hugh, I read your post and though some of the details go over my head, I have a question.

    Would it be possible, given that a g-rating equilibrium exists for each category of game (e.g., 2-player, 3-player, 4, etc.), to find a different multiplicative constant for each of these situations that normalizes them to a 2-player paradigm?

    For instance, let's say a 3 player g-rating gets a .95 constant applied (I'm just pulling numbers out of my @@$), and a 4 player g-rating gets a .90 constant.

    Then a 1.40 3-player rating and a 1.48 4-player rating would each "normalize" to a 1.33 (2-player) g-rating.

    I'm pretty sure it's not that simple because at some point a positive rating would become a negative normalized rating.  But hey, maybe there's some truth to that.

    If there was a "reasonable" constant, sure, it might be possible to achieve 2+ scores, but certainly it would be much less likely.
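
    Taken literally, the mechanics would be something like this (the constants are still numbers pulled out of my @@$, purely for illustration):

    # Sketch only - hypothetical per-game-size constants, not fitted to anything.
    SIZE_CONSTANTS = {2: 1.00, 3: 0.95, 4: 0.90}

    def normalize_g(g_rating, game_size):
        return g_rating * SIZE_CONSTANTS[game_size]

    print(round(normalize_g(1.40, 3), 2))   # 1.33
    print(round(normalize_g(1.48, 4), 2))   # 1.33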

    ..but we won't be happy until there is a "barren" designer feature.

  13. #33 / 211
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    M57 wrote:

    Would it be possible, given that a g-rating equilibrium exists for each category of game (e.g., 2-player, 3-player, 4, etc.), to find a different multiplicative constant for each of these situations that normalizes them to a 2-player paradigm?

    About to leave - will post an example later, but if I understand you right, this is essentially what the H rating does.  A modified G Rating of the style you suggest would give a number between 0 and 2 that is exactly twice the H Rating.  I like the mindset of the modified G Rating, because then you don't have to base it on the current rating system - you could use, as 11s suggested, existing player data to scale the different game sizes.  Then you could interpret the metric as being "true" relative to the current data set, just as H Rating is "true" relative to the current rating system.


  14. #34 / 211
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    I guess I didn't understand you: small experiments indicate that a single multiplicative constant translates very poorly. It works well around a certain pair of percentages and then quickly ceases to make any sense.

    I prefer something like what I previously proposed, which does give a correspondence between percentages from different game categories (2 player, 3 player, etc), but does not use a single scaling factor to do so.


  15. #35 / 211
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    I'm going to post some simple sample calculations in the system I proposed on page 1 of the thread. Hopefully some clarification results. To summarize the system:

    A percentage is generated in which each loss is counted as 0/1, and each win is 1/1 per player defeated in the win. It mimics the calculation of our rating system. The percentage should be thought of as converting multiplayer data into something comparable to two player win percentages.

    Example: You win 2/4 four-player games. In each win you effectively went 3/3, and each loss counts 0/1, so overall you went 6/8, for a score of 0.75.

    The denominators are a bit strange. I refer to the 8 in that example as the number of "effective games".

    We see that my system views winning 50% of your 4-player games as being like winning 75% of your two player games. How should we interpret this?

    Suppose your four games were with opponents of Global Ratings equal to yours. Each player you beat gives you +20 points, each player you lose to subtracts 20 from your rating. In the 2/4 example, the wins gave you 120 points, the losses lost you 40 for a net of +80 points.

    Here is the point: if you play 8 effective 2-player games at a 75% win rate, you should gain +80 rating points. In duel-land this is going 6/8. That also gains you 120, also loses 40, for a net of +80.

    Does the example scale well? If I win 5/10 four player games, will that result in something as good as going 75% in duels? 5/10 converts to 15/20, so yes. In both cases (using 20 effective games), the rating gain is +200.

    Does it work with mixed game sizes? New example: Suppose you go 1/2 in 5-player games and 2/3 in two player games. Effectively, you went 4/5 and 2/3, so 6/8, yielding a score of 0.75. At 0.75 with 8 effective games, we saw that the rating should change +80. The 1/2 in five player gave you +60, and the 2/3 gave you +20, so it works. It is not hard to convince yourself that it will always work.

    We can also relax the requirement of same opponent rating (to same ratio), but the examples were easier this way. There _should_ be less skewing with this statistic, but I don't guarantee 0 skewing.
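
    If it helps, the whole bookkeeping fits in a small sketch (my own reading of the system above, not site code). It reproduces the 2/4 four-player example and the mixed-size example, including the rating check that assumes +20 per player defeated and -20 per loss against equally rated opponents:

    # Sketch only: the "effective games" score and the rating-change check.
    # A win in an n-player game counts (n-1)/(n-1), a loss counts 0/1.
    def effective_score(games):
        eff_wins = sum(size - 1 for size, won in games if won)
        eff_games = sum(size - 1 if won else 1 for size, won in games)
        return eff_wins / eff_games, eff_games

    def rating_change(games, points=20):
        return sum((size - 1) * points if won else -points for size, won in games)

    four_player = [(4, True), (4, True), (4, False), (4, False)]      # 2/4 wins
    print(effective_score(four_player), rating_change(four_player))   # (0.75, 8), +80

    mixed = [(5, True), (5, False), (2, True), (2, True), (2, False)] # 1/2 and 2/3
    print(effective_score(mixed), rating_change(mixed))               # (0.75, 8), +80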


  16. #36 / 211
    Pop. 1, Est. 1981 Alpha
    Rank
    Brigadier General
    Rank Posn
    #61
    Join Date
    Dec 09
    Location
    Posts
    991

    Hugh wrote: Alpha - your suggestion looks like G Rating, but capped at 2? Right?

    Yes, but my data set was wrong; I will correct this later.

    M57 wrote: It follows that these numbers could still be inflated for those who don't play many games, but I think it would be pretty hard to whip up a good rating after 10 or so games, so I would propose that a "G"-rating isn't valid (posted), until 10 games have been played.

    This is the reason I suggested that for any stat we come up with, a player must play _____ games before it is used (otherwise it is just blank); I would suggest >25.  G-rating works this way for individual boards.  You are not assigned a g-rating until you win, which is strange and possibly a bug.

    Hugh wrote: Yay! Debate, discussion, Denny's new RiskySlam intestine destroyer! (my math threads never go this well!)

    @Alpha: I like the idea of a logarithmic scale, also because it might be the right idea, but your data set exactly matched G Rating (as far as I could tell), so I don't know what logarithmic formula you are using.

     Yes, I had worked out different calculations, but apparently I copied the wrong data. I will recreate what I had and try posting again, but I have decided I like the H-rating as it is.  Here are the H-ratings of some sample players:

    Alpha:         1.3948
    Hugh:          1.2927
    ASM:            1.3229
    Norseman:  1.2414
    Yertle:         1.3149

    *edit
    Oops, here are the correct H-ratings:
    Alpha:          .6052
    Hugh:           .7073
    ASM:             .6771
    Norseman:   .7586
    Yertle:          .6851
    Poloquebec: .7460
    BlackDog:     .7773
    Waldo:         .7592
    *end edit

    H-rating is really easy to calculate (+1), more meaningful than G-rating (+1), independent of game size (+1), plus other things, and is good enough for me.

    The standard deviation stat of M57 also has merit, but I think that it will be computationally difficult to maintain.  (I could certainly be wrong, but since I couldn't come up with an easy way to calculate it for myself, I decided it was too computational).

    Edited Sun 4th Jul 22:54

  17. #37 / 211
    Commander In Chief tom
    WarGear Admin tom
    Rank
    Commander In Chief
    Rank Posn
    #763
    Join Date
    Jun 09
    Location
    Posts
    5651

    Not much differentiation between the players though... what's a good rating and what's a bad one?


  18. #38 / 211
    Brigadier General M57
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5083

    Well, those are all top players. Their ratings should be similar. What would mine look like?

    ..but we won't be happy until there is a "barren" designer feature.

  19. #39 / 211
    Where's the armor? Mongrel
    Rank
    Brigadier General
    Rank Posn
    #53
    Join Date
    Nov 09
    Location
    Posts
    522

    tom wrote: Not much differentiation between the players though... what's a good rating and what's a bad one?

    One that maximizes gloatability.

    Longest innings. Most deadly.

  20. #40 / 211
    Brigadier General M57
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5083

    Alpha wrote:

    The standard deviation stat of M57 also has merit, but I think that it will be computationally difficult to maintain.  (I could certainly be wrong, but since I couldn't come up with an easy way to calculate it for myself, I decided it was too computational).

    I was thinking that myself.  There's no recursive function to make things go smoothly.  Every time a game is finished all the numbers for those players need to be re-crunched.  But it sure would be a nice way to assess your performance.

    What if you used your H-numbers, and then did a StDev using those? There wouldn't be as many numbers to crunch, only as many as there are players, and now you're not doing a StDev on expected outcome, but rather on the actual output of the machine. Does that make sense?
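
    One caveat to my "no recursive function" complaint: if the stat is just the mean and StDev of some per-game number, there is a standard incremental update that folds each finished game in without re-crunching the history. The hard part is deciding what per-game number to feed it. A sketch (the per-game scores below are made up):

    from math import sqrt

    # Sketch only: incremental (Welford-style) mean/StDev - three stored values
    # per player, updated once per finished game, no history re-crunch needed.
    class RunningStats:
        def __init__(self):
            self.n = 0
            self.mean = 0.0
            self.m2 = 0.0            # sum of squared deviations from the mean

        def add(self, x):
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        def stdev(self):
            return sqrt(self.m2 / self.n) if self.n else 0.0   # population StDev

    stats = RunningStats()
    for per_game_score in (0.75, 0.5, 1.0, 0.25):   # made-up per-game numbers
        stats.add(per_game_score)
    print(stats.mean, stats.stdev())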

    ..but we won't be happy until there is a "barren" designer feature.
