  1. #21 / 74
    Standard Member SquintGnome
    Rank
    Brigadier General
    Rank Posn
    #35
    Join Date
    Jun 11
    Location
    Posts
    546

    I agree that a 'small cap' means that those who play few games MAY have a slower path to equilibrium, but the system we are using requires playing many games to get accurate ratings - this should be understood and accepted.  Penalizing active players to mollify impatient sporadic players does not seem to be the right way to encourage long term game play.

    More importantly though, in order for a talented newbie, ranked at 1000, to have his path to equilibrium slowed, he would need to defeat a player rated higher than 2000 to lose points he would otherwise get without a cap.  This would be an uncommon occurrence and therefore not substantial enough to merit preventing implementation of a cap, in my opinion.


  2. #22 / 74
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    Lots of old threads being resurrected:

    http://www.wargear.net/forum/showthread/2406p1/Points_Cap

    Within the system, a cap should lead to inflation, larger equilibrium ratings, and hence much longer time to equilibrium and accuracy. An interesting experiment to do with the win/loss data would be to first just run it with the caps. Then, pretend the win/loss history repeats itself 100 times over. Compare caps to non-caps (and compare to present day to show how far away people are from their "real" ratings!!!) I think it would be very illuminating.
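
    The experiment described in this post could be sketched as a small replay harness. Everything in it is an assumption made for illustration: the thread does not state the site's formula, so `toy_transfer` (points proportional to the loser/winner rating ratio), the base of 20 points, the starting rating of 1000, and the two-player `(winner, loser)` history format are all hypothetical stand-ins, and nothing here should be read as a prediction about real ratings.

```python
from typing import Dict, List, Optional, Tuple

Game = Tuple[str, str]  # (winner, loser); hypothetical two-player record


def toy_transfer(winner_rating: float, loser_rating: float,
                 cap: Optional[float]) -> float:
    """Hypothetical transfer rule, NOT the site's formula.

    Points grow with the loser/winner rating ratio and are optionally capped.
    """
    points = 20.0 * loser_rating / winner_rating
    return points if cap is None else min(points, cap)


def replay(history: List[Game], repeats: int, cap: Optional[float],
           start: float = 1000.0) -> Dict[str, float]:
    """Replay the same win/loss history `repeats` times and return final ratings."""
    ratings: Dict[str, float] = {}
    for _ in range(repeats):
        for winner, loser in history:
            winner_rating = ratings.setdefault(winner, start)
            loser_rating = ratings.setdefault(loser, start)
            points = toy_transfer(winner_rating, loser_rating, cap)
            ratings[winner] = winner_rating + points
            ratings[loser] = loser_rating - points
    return ratings


# Made-up history: "a" beats "b" twice for every win "b" takes back.
history = [("a", "b"), ("a", "b"), ("b", "a")]
uncapped = replay(history, repeats=100, cap=None)
capped = replay(history, repeats=100, cap=25.0)
```

    Swapping in the real calculation for `toy_transfer` and the real game records for `history` would give the capped versus uncapped comparison the post asks for.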


  3. #23 / 74
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    "the system we are using requires playing many games to get accurate ratings - this should be understood and accepted."

    In any system, you need to play some amount of games to get an accurate rating. This one takes much much longer than others (slower than even Elo which is automatically capped). I've seen scenarios where I'm not sure that even a broad stroke of accuracy is being painted. In many situations, I do think we see something that is roughly correct, which contributes to the illusion that this is a satisfactory calculation for any board.

    A player doesn't have to "understand and accept" the long path to accuracy when there are well-established alternative systems that achieve better accuracy in a shorter amount of time.

    Edited Sun 26th May 12:48

  4. #24 / 74
    Standard Member SquintGnome
    Rank
    Brigadier General
    Rank Posn
    #35
    Join Date
    Jun 11
    Location
    Posts
    546

    Agreed, Hugh. In this case, though, I am taking the perspective that we will not be changing the ranking system and that only tweaks, like a cap, will be possible. So I am trying to make the best of the given constraints.


  5. #25 / 74
    Standard Member btilly
    Rank
    Colonel
    Rank Posn
    #85
    Join Date
    Jan 12
    Location
    Posts
    294

    I got curious, so I ran a simple simulation.

    The rules for my simulation were that each player has some level of skill.  In a game, your odds of winning are directly proportional to your skill.  (So if I have skill 2 and you have skill 1, in a heads-up game I win 2/3 of the time.)  My levels of skill were 1.3^1, 1.3^2, ... , 1.3^10 (so each is 30% better than the one before).  I had 100 players at each level.
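
    The win rule in this post is simple enough to sketch. The code below is only an illustration of "odds proportional to skill" and of the population described; the function names are made up here, and it deliberately leaves out the rating update, since the thread never spells out the site's formula.

```python
import random
from typing import List, Sequence


def win_probabilities(skills: Sequence[float]) -> List[float]:
    """Each player's chance of winning is their share of the total skill."""
    total = sum(skills)
    return [skill / total for skill in skills]


def pick_winner(skills: Sequence[float], rng: random.Random) -> int:
    """Return the index of the winner, drawn with probability proportional to skill."""
    return rng.choices(range(len(skills)), weights=skills, k=1)[0]


# Ten skill levels, each 30% stronger than the one before, 100 players per level.
population = [1.3 ** level for level in range(1, 11) for _ in range(100)]

# Heads-up game between skill 2 and skill 1: the stronger player should win 2/3.
heads_up = win_probabilities([2.0, 1.0])
```

    A full simulation would repeatedly draw a table of players from `population`, call `pick_winner`, and feed the result to whatever rating rule is under test.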

    I ran the simulation to 5 million games multiple times, with games of different sizes.  Here is what I found.  A 30% skill difference is consistently around 150 rating points.  If you're playing 5 player games, the standard deviation of your rating is about 150.  When you change the size of the games, the standard deviation changes but the skill/rating correlation does not.  So, for instance, 2 player games lead to a deviation around 110, and 10 player games lead to one in the neighborhood of 240.  (None of these figures are very precise, but they are "close enough for government work".)

    What does this mean?  A rating difference of 150 points is about a 30% edge.  How accurate your rating is depends on the size of games that you play.  But if you play a lot of 5-player games, then most of the time (roughly two-thirds of it, if the spread is close to a bell curve) you will be within +-150 points of your "true rating", and 95% of the time you'll be within +-300 points.
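
    The +-150 and +-300 bands correspond to one and two standard deviations. As a rough check, and assuming rating fluctuations are approximately normal (an assumption the post does not state), the share of time spent inside each band can be computed directly. The helper names are illustrative only.

```python
import math
from typing import Tuple


def fraction_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def rating_band(true_rating: float, sd: float, k: float) -> Tuple[float, float]:
    """Rating interval spanning k standard deviations either side of the true rating."""
    return (true_rating - k * sd, true_rating + k * sd)


one_sd = fraction_within(1.0)   # about 0.68
two_sd = fraction_within(2.0)   # about 0.95
band = rating_band(1500.0, 150.0, 2.0)
```

    Under that assumption a 5-player regular with a "true rating" of 1500 would sit between 1200 and 1800 about 95% of the time.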

    My simulation is, of course, fake.  However that level of variability does fit with my experience.


  6. #26 / 74
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    btilly: I don't know if you're familiar with Elo, but the model is set up so that a probability of winning corresponds to a rating difference. In the usual setup, a 200 point rating difference corresponds to a 75% win probability (in a 2-player game) for the better player.
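
    For reference, the usual logistic form of the Elo expectation can be written in a couple of lines. This sketch assumes the common 400-point scale; other scales and the original normal-curve version give slightly different numbers.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


edge_200 = elo_expected_score(1700.0, 1500.0)  # about 0.76
```

    With this scale a 200 point gap works out to roughly 0.76, close to the 75% figure quoted above, and the result depends only on the rating difference, not on where the two players sit on the ladder.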

    Our system doesn't have this property. If my calculations are right (always worth checking), the skill difference between a 2400 and 2000 player should be different than that between a 1400 and 1000 player using the site's rating calculation.


  7. #27 / 74
    Standard Member btilly
    Rank
    Colonel
    Rank Posn
    #85
    Join Date
    Jan 12
    Location
    Posts
    294

    I am familiar with Elo.  It is a good system, but not perfect.  It makes bell curve assumptions about player performance which are untrue in chess, and are even less likely to be true on the various boards here.  (There are some boards with a large enough chance element that it is mathematically impossible for a 600 point difference to mean what it means in the Elo system.)  Those assumptions are less likely still to hold for multiplayer games.  But without some sort of assumptions like that, you can't get statistical properties like the ones you describe.

    In my simulation those properties did not hold.  Even before you consider how much rating volatility there was.  Also it was more complex than I wanted to address theoretically.  So I just tried to describe what I found.

    While the 150 points vs 30% improvement cannot be exact over a broad range, it still held well enough in my simulation to be a useful description.


  8. #28 / 74
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    If I get a chance, I'll send you my analysis on predicting the probabilities over the long run. I did simulations long long ago, and I don't remember what happened. I'm curious how well the predictions match your simulation.


  9. #29 / 74
    Commander In Chief tom
    WarGear Admin tom
    Rank
    Commander In Chief
    Rank Posn
    #762
    Join Date
    Jun 09
    Location
    Posts
    5651

    I'm not against any ranking system change - if we can find something better which I can work out how to implement then I'm all for it.

    My preferred approach is to parallel run any promising system for a while so everyone can gauge effectiveness. If the majority think any new system is better it can be easily switched over.


  10. #30 / 74
    Standard Member SquintGnome
    Rank
    Brigadier General
    Rank Posn
    #35
    Join Date
    Jun 11
    Location
    Posts
    546

    I apologize if someone mentioned this above, but in addition to, or before, running a new system in parallel as Tom mentions, we could use the existing data to develop rankings from the proposed system and compare them to the current ratings.  That way we would not need to wait for a conclusion.

    Hugh, do you think the system you mentioned could be implemented here?


  11. #31 / 74
    Standard Member btilly
    Rank
    Colonel
    Rank Posn
    #85
    Join Date
    Jan 12
    Location
    Posts
    294

    tom wrote:

    I'm not against any ranking system change - if we can find something better which I can work out how to implement then I'm all for it.

    My preferred approach is to parallel run any promising system for a while so everyone can gauge effectiveness. If the majority think any new system is better it can be easily switched over.

    Do you have in easily extractable form the public win/loss records of everyone on the site along with time ordering?

    If you can make that available as a dump somewhere, then other people could potentially implement multiple ideas before offering up ratings to everyone.  And you'd only need to think about how to implement a system after people had looked at it and you had some confidence that people probably would like it better.
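
    To make the request concrete, here is one shape such a dump could take and how it might be loaded in time order. The CSV layout, the column names (`finished_at`, `winner`, `losers`), the `|` separator, and the player names are all invented for this sketch; nothing in the thread says how the site stores its games.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GameResult:
    finished_at: str    # ISO 8601 timestamp, so string order matches time order
    winner: str
    losers: List[str]


def load_results(text: str) -> List[GameResult]:
    """Parse the hypothetical CSV dump and return games sorted oldest first."""
    rows = csv.DictReader(io.StringIO(text))
    results = [
        GameResult(row["finished_at"], row["winner"], row["losers"].split("|"))
        for row in rows
    ]
    return sorted(results, key=lambda game: game.finished_at)


# Made-up sample data, deliberately out of order.
SAMPLE = """finished_at,winner,losers
2013-05-26T12:48:00,alice,bob|carol
2013-05-25T09:00:00,bob,alice
"""

games = load_results(SAMPLE)
```

    Any candidate rating system can then be written as a function that walks `games` once, which keeps the experiments independent of the site's code.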


  12. #32 / 74
    Enginerd weathertop
    Rank
    Brigadier General
    Rank Posn
    #64
    Join Date
    Nov 09
    Location
    Posts
    3020

    not a bad idea, if it could be dumped.

    I'm a man.
    But I can change,
    if I have to,
    I guess...

  13. #33 / 74
    Commander In Chief tom
    WarGear Admin tom
    Rank
    Commander In Chief
    Rank Posn
    #762
    Join Date
    Jun 09
    Location
    Posts
    5651

    btilly wrote:

    Do you have in easily extractable form the public win/loss records of everyone on the site along with time ordering?

    No... but it's doable if it's really needed. There are 240,000 games' worth of data, so it will be quite sizable!


  14. #34 / 74
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    SG: It can be done. The calculations tend to be not much more complicated than what we do now.

    tom, I'm glad you're open to this sort of thing. It would be really helpful for making the case and experimenting to have that data. Even a fraction of the data would be good, but I have a feeling it's just as hard to pull a fraction versus the whole thing :)

    Out of curiosity, how is it currently stored?

     


  15. #35 / 74
    Standard Member itsnotatumor
    Rank
    Lieutenant General
    Rank Posn
    #14
    Join Date
    Jul 12
    Location
    Posts
    634

    Toto wrote:
    M57 wrote:
    Hugh wrote:

     Most locals are happy with the current calculation,..

    Not me.  Unfortunately, Global Rankings seem pretty firmly entrenched.   I would prefer to see something that is less volatile, yet indicative of current ability.

    Not me neither. It has been discussed many times without any success, but a 40 point cap (instead of a crazy 100) and a moving average would be great improvements.

    Correct me if I'm wrong, but I believe the cap only applies to losses which would virtually never affect a new player since only higher ranked players regularly take losses of 40+.

    I would like to know if there is a specific purpose to the 100 point cap, or if it was arbitrarily chosen.  If it was an arbitrary choice I would support lowering it.

    As it stands I see it as a disincentive to play certain maps or players. Every time I get in the upper teens/low 20's in the rankings I can get point swings of 300+ a day.  I need to win 2-4 games just to make up for 1 loss. I noticed that I started looking carefully at who's in a game and skipping it if there are too many low ranked players.  It also seems that high ranked players tend to play fewer games. Is this the reason, or do people just cut back over time?
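
    The moving average suggested in the quoted post could look something like the sketch below. The window length and the idea of averaging the rating after each game are assumptions; the thread does not say what would be averaged or over how long.

```python
from collections import deque
from typing import Deque, Iterable, List


def moving_average(ratings: Iterable[float], window: int) -> List[float]:
    """Average of the most recent `window` ratings, reported after every game."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: Deque[float] = deque(maxlen=window)
    total = 0.0
    averages: List[float] = []
    for rating in ratings:
        if len(recent) == window:
            total -= recent[0]      # the oldest value is about to be pushed out
        recent.append(rating)
        total += rating
        averages.append(total / len(recent))
    return averages


# Made-up rating history with large swings between games.
smoothed = moving_average([1000.0, 1100.0, 900.0, 1200.0], window=2)
```

    Even a short window trims the swings: the raw history above moves by up to 300 points between games, while the smoothed series stays between 1000 and 1050.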

     

    Fortune favors the bold, and chance favors the prepared mind...

  16. #36 / 74
    Premium Member Yertle
    Rank
    Major General
    Rank Posn
    #21
    Join Date
    Nov 09
    Location
    Posts
    3997

    itsnotatumor wrote:

    I would like to know if there is a specific purpose to the 100 point cap, or if it was arbitrarily chosen.  If it was an arbitrary choice I would support lowering it.

    If I recall correctly, it was pretty much arbitrarily chosen and/or a pull from the previous Risk site that many of us came from.

    If I could figure out how to draw a line in Photoshop I would be a lot more well off with the Mac thing...

  17. #37 / 74
    Standard Member btilly
    Rank
    Colonel
    Rank Posn
    #85
    Join Date
    Jan 12
    Location
    Posts
    294

    itsnotatumor wrote:
    As it stands I see it as a disincentive to play certain maps or players. Every time I get in the upper teens/low 20's in the rankings I can get point swings of 300+ a day.  I need to win 2-4 games just to make up for 1 loss. I noticed that I started looking carefully at who's in a game and skipping it if there are too many low ranked players.  It also seems that high ranked players tend to play fewer games. Is this the reason, or do people just cut back over time?

    The key issue that I see revealed in your reply is that there is a natural tendency to get protective of your rating when you've got a high rating.  But we all hit our high ratings when we've been lucky - which means that continuing to play only hurts the rating.

    Secondly your rating is affected by who you play.  It shouldn't, but it does.

    With that in mind, here are my criteria for an impossibly ideal rating system:

    1. Your rating reflects your true strength.
    2. Your rating converges fairly quickly, then stabilizes, with an opportunity to change more slowly over time.
    3. In the long run, your rating will be the same no matter who you play.
    4. A specific rating gap reflects a specific advantage over another person in a game.  (For instance if your rating is 50 points higher, you might win 10% more often.)

    Why is this impossible?  Very simply, because there are lots of style matchups where A beats B beats C beats A again.  Which of these gets a higher rating?  Worse yet, A beats B on board X but B beats A on board Y.  Which of these gets a higher rating?
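
    The A beats B beats C beats A problem can be shown mechanically: no single ordering of the players agrees with every result in a cycle. The brute-force check below is only an illustration (it tries every permutation, so it is suitable for a handful of players), and the function name is made up.

```python
from itertools import permutations
from typing import Iterable, List, Optional, Tuple


def consistent_ranking(players: List[str],
                       beats: Iterable[Tuple[str, str]]) -> Optional[Tuple[str, ...]]:
    """Return an ordering (strongest first) in which every winner outranks the
    player they beat, or None if no such ordering exists."""
    results = list(beats)
    for order in permutations(players):
        position = {player: index for index, player in enumerate(order)}
        if all(position[winner] < position[loser] for winner, loser in results):
            return order
    return None


cycle = consistent_ranking(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")])
chain = consistent_ranking(["A", "B", "C"], [("A", "B"), ("B", "C")])
```

    For the cycle no ordering is found, while the plain chain is ranked A, B, C. Any one-number rating has to flatten cycles like this into some compromise.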

    However the fact that it is impossible does not stop us from trying!  What we can do is invent theories about strength, and how that works in real games.  Based on that theory we can create a rating system that has all of these properties.  Then we can look at real games and ask how well it works.

    (And when I say "we", I mean "Hugh"...)


  18. #38 / 74
    Standard Member SquintGnome
    Rank
    Brigadier General
    Rank Posn
    #35
    Join Date
    Jun 11
    Location
    Posts
    546

    itsnotatumor wrote:

    As it stands I see it as a disincentive to play certain maps or players. Every time I get in the upper teens/low 20's in the rankings I can get point swings of 300+ a day.  I need to win 2-4 games just to make up for 1 loss. I noticed that I started looking carefully at who's in a game and skipping it if there are too many low ranked players.  It also seems that high ranked players tend to play fewer games. Is this the reason, or do people just cut back over time?

     
    +1


  19. #39 / 74
    Standard Member AttilaTheHun
    Rank
    Major General
    Rank Posn
    #16
    Join Date
    Sep 10
    Location
    Posts
    941

    Any rating system will have difficulty because most of these games are luck based.  So even if you create a system that's ideal, the tendencies of top rated players will still be to avoid luck-based games against lower-ranked players.  Chess relies less on luck and so methinks would be a bit easier to design an ideal ranking system.

    "If an incompetent chieftain is removed, seldom do we appoint his highest-ranking subordinate to his place" - Attila the Hun

  20. #40 / 74
    Standard Member btilly
    Rank
    Colonel
    Rank Posn
    #85
    Join Date
    Jan 12
    Location
    Posts
    294

    AttilaTheHun wrote:

    Any rating system will have difficulty because most of these games are luck based.  So even if you create a system that's ideal, the tendencies of top rated players will still be to avoid luck-based games against lower-ranked players.  Chess relies less on luck and so methinks would be a bit easier to design an ideal ranking system.

    That is why point #3 in my list matters.  In a straight skill-based game you can reasonably make the odds of beating a significantly better player essentially zero.  But - exactly because of the role of luck - the fact that a low-ranked player beat a high-ranked player in one game should not be taken here as definitive proof that the high-ranked player is massively worse than you thought, and the low-ranked player is massively better.  In the long run you want your rating to converge to your true skill, even with the influence of luck.

    Which is why I'd personally look at the Glicko system for inspiration, not as a rating system to directly implement.  (Though it has been implemented successfully for backgammon, which is a luck-based game.)  But all of this is general conjecture until we have a dump and a few attempts to process it to assign ratings.
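
    For anyone who has not seen it, the core of a Glicko-style update is short. The sketch below follows the published two-player Glicko-1 formulas as I understand them, where each player carries a rating and a ratings deviation (RD) and results against uncertain opponents count for less. It is not adapted to multiplayer games, it omits the step that widens RD during inactivity, and it is offered as inspiration only, not as a proposal for this site.

```python
import math
from typing import Iterable, Tuple

Q = math.log(10.0) / 400.0


def g(rd: float) -> float:
    """Discount factor: the less certain the opponent's rating, the smaller the weight."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / (math.pi ** 2))


def expected(rating: float, opp_rating: float, opp_rd: float) -> float:
    """Expected score against one opponent, softened by that opponent's RD."""
    return 1.0 / (1.0 + 10.0 ** (-g(opp_rd) * (rating - opp_rating) / 400.0))


def glicko_update(rating: float, rd: float,
                  results: Iterable[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Update (rating, rd) from one rating period.

    Each result is (opponent_rating, opponent_rd, score), score 1 = win, 0 = loss.
    """
    games = list(results)
    if not games:
        return rating, rd
    inv_d2 = 0.0    # accumulates 1 / d^2
    surprise = 0.0  # weighted sum of (actual - expected) scores
    for opp_rating, opp_rd, score in games:
        e = expected(rating, opp_rating, opp_rd)
        weight = g(opp_rd)
        inv_d2 += Q * Q * weight * weight * e * (1.0 - e)
        surprise += weight * (score - e)
    denom = 1.0 / (rd * rd) + inv_d2
    return rating + (Q / denom) * surprise, math.sqrt(1.0 / denom)


# A 1500-rated player with RD 200: one win, then two losses.
new_rating, new_rd = glicko_update(
    1500.0, 200.0, [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)]
)
```

    In this example the rating should come out near 1464 with a deviation near 151. The shrinking RD is what lets a rating move quickly at first and then settle, and a single upset against an opponent whose own rating is uncertain moves it less.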

