  1. #81 / 114
    Premium Member Cona Chris
    Rank
    General
    Rank Posn
    #2
    Join Date
    Nov 10
    Location
    Posts
    213

    Is there an error in the first example?  (1799-1000 = 700 CPs)?


  2. #82 / 114
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5082

    Cona Chris wrote:

    Is there an error in the first example?  (1799-1000 = 700 CPs)?

    Probably. I will look at it and fix it.

    Card Membership - putting the power of factories in your hand.

  3. #83 / 114
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    Cona Chris wrote:

    With the scale above, it will be very difficult to ever reach 2,500 pts on a lot of duel boards, as you would need to win a very high % of the time (Hugh has some formula for this I think) to get to 2,500 pts.  Are we okay with that - setting the bar almost impossibly high for some boards to get 50 CPs?

    So, to sum up - in my simple example above (which may not be what ends up happening), I can more easily get 50 CPs on a board no one plays than one of the most popular boards.  I think we want the opposite to happen?

    In equilibrium, the ratio of your rating to your opponent's is the square root of the ratio of wins to losses. (Use H-rating for multiplayer.) If you win 80% of your games against 1000-rated players, you win 4 to 1, so the square root of 4 times 1000 gives 2000. In reverse, if you know the rating ratio you want to achieve, square that to get the required win ratio. So, 2500 (against 1000-rated folks) requires a 6.25 to 1 win ratio, or about 86%. Often, the average opponent is over 1000, so the win rates need not be quite so high. Also note that many people are way below equilibrium due to how long it can take to reach.
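
    The arithmetic in that paragraph can be sketched in a few lines. This is only an illustration of the square-root rule as stated above; the function names and the default opponent rating of 1000 are assumptions made for the example, not anything taken from the site's code.

```python
import math


def equilibrium_rating(win_fraction, opponent_rating=1000.0):
    """Rating implied by the square-root rule for a given win fraction.

    Hypothetical helper: rating / opponent_rating = sqrt(wins / losses).
    """
    if not 0.0 < win_fraction < 1.0:
        raise ValueError("win_fraction must be strictly between 0 and 1")
    win_loss_ratio = win_fraction / (1.0 - win_fraction)
    return opponent_rating * math.sqrt(win_loss_ratio)


def required_win_fraction(target_rating, opponent_rating=1000.0):
    """Win fraction needed to sit at target_rating in equilibrium.

    Hypothetical helper: square the rating ratio to get wins : losses,
    then convert that ratio into a fraction of games won.
    """
    rating_ratio = target_rating / opponent_rating
    win_loss_ratio = rating_ratio ** 2
    return win_loss_ratio / (1.0 + win_loss_ratio)


print(round(equilibrium_rating(0.80)))        # 2000
print(round(required_win_fraction(2500), 3))  # 0.862
```

    Raising the assumed opponent rating above 1000 lowers the win fraction needed for the same target, which is the point made about average opponents being over 1000.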

    It's hard to quantify (in an automatic way) how difficult a board is. With no competition the only good player playing beats up on unsuspecting noobs. Lots of competition can occur on popular but not very deep boards, etc. Some duelling boards you're the best by far if you hit 75%; on others you can easily rack up 90%+ without being the best.


  4. #84 / 114
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5082

    Hugh wrote:
    Cona Chris wrote:

    With the scale above, it will be very difficult to ever reach 2,500 pts on a lot of duel boards, as you would need to win a very high % of the time (Hugh has some formula for this I think) to get to 2,500 pts.  Are we okay with that - setting the bar almost impossibly high for some boards to get 50 CPs?

    So, to sum up - in my simple example above (which may not be what ends up happening), I can more easily get 50 CPs on a board no one plays than one of the most popular boards.  I think we want the opposite to happen?

    In equilibrium, the ratio of your rating to your opponent's is the square root of the ratio of wins to losses. (Use H-rating for multiplayer.) If you win 80% of your games against 1000-rated players, you win 4 to 1, so the square root of 4 times 1000 gives 2000. In reverse, if you know the rating ratio you want to achieve, square that to get the required win ratio. So, 2500 (against 1000-rated folks) requires a 6.25 to 1 win ratio, or about 86%. Often, the average opponent is over 1000, so the win rates need not be quite so high. Also note that many people are way below equilibrium due to how long it can take to reach.

    It's hard to quantify (in an automatic way) how difficult a board is. With no competition the only good player playing beats up on unsuspecting noobs. Lots of competition can occur on popular but not very deep boards, etc. Some duelling boards you're the best by far if you hit 75%; on others you can easily rack up 90%+ without being the best.

    I'm confused, which means I'm probably wrong, but using Hugh's logic (in his first paragraph) and given a board like Invention, which has had significantly fewer plays than WGWF yet has a higher top GR score, wouldn't it be implied that WGWF is the more "difficult" board? I think Hugh's second paragraph hits on the reality of the situation. Determining 'difficulty' (whatever that is) is much too complex, and subjective anyway.

    Anyway, moving on.. The one system that everyone seems to accept, or even just grudgingly respect in these parts is the GR. Most everyone values a CP-like concept, but it's becoming clear that very few if any are actually happy with the CP system, especially considering how it currently drives the newly devised player Ranks. In fact, that recent development has put more scrutiny on CPs and has led a number of members to dislike the CP system as a whole.  I think this is a good thing because it means that there is a growing consensus that something needs to change.

    Right now my head is swimming around in M57's proposal camp, so my opinion has become highly biased ..uhm, because I happen to be that guy :P, but that perspective obliges me to point out that the current CP system awkwardly uses the GR system, is misaligned with many of the goals that many are stating, and is about as arbitrarily structured as any we could imagine if we were tasked to create a CP-like system from scratch. As a result, a lot of the 'fixes' under consideration feel reactionary, and I think are pretty much bound to end up having that kludgy smell when the dust settles.

    Card Membership - putting the power of factories in your hand.
    Edited Sun 23rd Feb 11:53

  5. #85 / 114
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    M57 wrote:

    Anyway, moving on.. The one system that everyone seems to accept, or even just grudgingly respect in these parts is the GR. Most everyone values a CP-like concept, but it's becoming clear that very few if any are actually happy with the CP system, especially considering how it currently drives the newly devised player Ranks.

    I don't respect GR. It's a selection-biased frankenstat. But, just as the current CP system paints broad "these people are good at lots of boards" strokes, so too GR makes the broad assessment, "these people are good at some board". That's not grudging respect for either. I'm just admitting that good players will do well in one or the other category depending on how they use the site. I interpret "Rank" as having a different psychological effect on the average user than our other stats and don't think it should just be a repeat of CPs. A user could have been ignoring CPs all along, but "Rank" is very visible and hard to ignore.

    Edited Sun 23rd Feb 13:36

  6. #86 / 114
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5082

    Hugh wrote:
    M57 wrote:

    Anyway, moving on.. The one system that everyone seems to accept, or even just grudgingly respect in these parts is the GR. Most everyone values a CP-like concept, but it's becoming clear that very few if any are actually happy with the CP system, especially considering how it currently drives the newly devised player Ranks.

    I don't respect GR. It's a selection-biased frankenstat. But, just as the current CP system paints broad "these people are good at lots of boards" strokes, so too GR makes the broad assessment, "these people are good at some board". That's not grudging respect for either. I'm just admitting that good players will do well in one or the other category depending on how they use the site. I interpret "Rank" as having a different psychological effect on the average user than our other stats and don't think it should just be a repeat of CPs. A user could have been ignoring CPs all along, but "Rank" is very visible and hard to ignore.

    @Hugh, when you say "Rank," are you referring to our new Ranking system, or are you referring to the GR related point system?

    If you had your druthers, would you scrap the whole thing and start over?  Are there any stats on this site you can respect? Are there stats or systems you feel we should instead be using?  I am no statistician, so I'm bound to mangle the language, but I see the GR related point system as a type resembling the systems used in chess, etc.   Assuming equilibrium, it for the most part is a reasonable indicator of a player's ability to beat another playing in that game on that board.  It's the dice that are fickle; the number will fluctuate (although moving average would go a long way toward mitigating that), but I don't see how it's a Frankenstat when viewed as an indicator of a player's strength on a given board.

    Card Membership - putting the power of factories in your hand.

  7. #87 / 114
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    M57 wrote:

    It's the dice that are fickle; the number will fluctuate (although moving average would go a long way toward mitigating that), but I don't see how it's a Frankenstat when viewed as an indicator of a player's strength on a given board.

    Are you referring to an individual's board as GR, or are you referring to the thing on your profile labelled Global Ranking Score? Your last sentence makes it sound like you are confusing a player's individual board ranking with what we all call GR. (GR is computed across all boards, which is why it is a frankenstat!)

    What makes GR so goofy is that different game types and game sizes influence your score. It has nothing to do with the dice. Game selection affects your GR, plain and simple. That's it. You can (and people actually do) choose game types that are better for their GR. And it has nothing to do with being better at one game versus another. Best Spy v Spy player? If you care about your GR you shouldn't play it because you know you can do better on another board. That's not an indication of strength; it's an indication of selection.

    I'm going to walk away from this one. I've already indicated many times before that I think we should be using Glicko or Elo or Trueskill at the level of individual board. But this isn't about that, so I'm off-topic. But, I saw the words respect and GR and everyone used in the same sentence and I couldn't resist replying :)


  8. #88 / 114
    Shelley, not Moore Ozyman
    Rank
    Brigadier General
    Rank Posn
    #40
    Join Date
    Nov 09
    Location
    Posts
    3448

    Yeah, if we are going to be basing CP on GR, it makes sense to first improve GR. My vote is for some kind of TrueSkill-type measurement.


  9. #89 / 114
    Standard Member ratsy
    Rank
    Brigadier General
    Rank Posn
    #65
    Join Date
    Jul 10
    Location
    Posts
    1274

    Can someone give us the 2 sentence summary of what TrueSkill does/is?

    "I shall pass this but once, any good I can do, or kindness I can show; let me do it now. Let me not difer nor neglect it, for I shall not pass this way again." -Stephen Grellet

  10. #90 / 114
    Shelley, not Moore Ozyman
    Rank
    Brigadier General
    Rank Posn
    #40
    Join Date
    Nov 09
    Location
    Posts
    3448

    The wikipedia article is pretty good:

    http://en.wikipedia.org/wiki/Trueskill

    A player's skill is represented as a normal distribution N characterized by a mean value of μ (mu, representing perceived skill) and a variance of σ (sigma, representing how much "confidence" the system has in the player's μ value). 
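
    To make that description concrete, here is a minimal sketch of the data such a system keeps per player. It is not a TrueSkill implementation (there is no update step), and the class and method names are hypothetical. The defaults of 25 and 25/3 and the "mean minus three sigma" leaderboard figure are the commonly published TrueSkill conventions; whether WarGear would adopt them is purely an assumption.

```python
from dataclasses import dataclass


@dataclass
class SkillEstimate:
    """Hypothetical container for one player's skill on one board."""

    mu: float = 25.0         # perceived skill (published TrueSkill default)
    sigma: float = 25.0 / 3  # uncertainty about mu (published default)

    def conservative(self, k: float = 3.0) -> float:
        """Skill figure the player is very likely to exceed: mu - k * sigma."""
        return self.mu - k * self.sigma


# A brand-new player: average mu, but the system is very unsure about it.
newcomer = SkillEstimate()

# A veteran (made-up numbers): many games played, so sigma has shrunk.
veteran = SkillEstimate(mu=30.0, sigma=1.5)

print(newcomer.conservative())
print(veteran.conservative())
```

    The design point is that two numbers are stored rather than one, so a leaderboard can rank by the conservative figure and avoid rewarding a lucky streak over a handful of games.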

     

    I'm not exactly sure how the math would be applied for WarGear.  I'll leave that to Hugh or someone else with more math smarts than me.


  11. #91 / 114
    Standard Member ratsy
    Rank
    Brigadier General
    Rank Posn
    #65
    Join Date
    Jul 10
    Location
    Posts
    1274

    This is something like the roving H-rating that M57 is on about.  In the sense that it is self correcting and based on near in time localized performance.

    "I shall pass this but once, any good I can do, or kindness I can show; let me do it now. Let me not difer nor neglect it, for I shall not pass this way again." -Stephen Grellet

  12. #92 / 114
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5082

    ratsy wrote:

    This is something like the roving H-rating that M57 is on about.  In the sense that it is self correcting and based on near in time localized performance.

    I'm pretty sure not.  I think these are two different things.  I believe you're referring to a Moving Average, which likely could be applied to the system they are describing above.

    Card Membership - putting the power of factories in your hand.

  13. #93 / 114
    Standard Member AttilaTheHun
    Rank
    Major General
    Rank Posn
    #16
    Join Date
    Sep 10
    Location
    Posts
    941

    From reading about Trueskill I'm a bit worried when it says your score will suffer huge changes from any "unexpected" result. The boards on this site being primarily based on luck, this could happen way more than normal. I already think the system favors non-luck-based boards and this will drive that even more.

    "If an incompetent chieftain is removed, seldom do we appoint his highest-ranking subordinate to his place" - Attila the Hun

  14. #94 / 114
    Standard Member M57
    Rank
    Brigadier General
    Rank Posn
    #73
    Join Date
    Apr 10
    Location
    Posts
    5082

    Hugh wrote:
    M57 wrote:

    It's the dice that are fickle; the number will fluctuate (although moving average would go a long way toward mitigating that), but I don't see how it's a Frankenstat when viewed as an indicator of a player's strength on a given board.

    Are you referring to an individual's board as GR, or are you referring to the thing on your profile labelled Global Ranking Score? Your last sentence makes it sound like you are confusing a player's individual board ranking with what we all call GR. (GR is computed across all boards, which is why it is a frankenstat!)

    What makes GR so goofy is that different game types and game sizes influence your score. It has nothing to do with the dice. Game selection affects your GR, plain and simple. That's it. You can (and people actually do) choose game types that are better for their GR. And it has nothing to do with being better at one game versus another. Best Spy v Spy player? If you care about your GR you shouldn't play it because you know you can do better on another board. That's not an indication of strength; it's an indication of selection.

    I'm going to walk away from this one. I've already indicated many times before that I think we should be using Glicko or Elo or Trueskill at the level of individual board. But this isn't about that, so I'm off-topic. But, I saw the words respect and GR and everyone used in the same sentence and I couldn't resist replying :)

    Thanks Hugh  - I don't think I am confusing them - I think the nomenclature we use is confusing. The word Rank means a lot of things on this site these days and I still don't know what to call the individual board related stat you get for winning a game.  ..Individual Board Score? IBS?

    I highly value your opinion on matters such as these.  You are one of the top math guys on this site. So if you think the IBS system needs an overhaul, then I'm inclined to agree.  CPs are broken (can we agree on that?), and if we're considering re-building them, we need to make sure they are built on a strong foundation.

    Card Membership - putting the power of factories in your hand.
    Edited Mon 24th Feb 07:09

  15. #95 / 114
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    M57 wrote:

    I highly value your opinion on matters such as these.  You are one of the top math guys on this site. So if you think the IBS system needs an overhaul, then I'm inclined to agree.  CPs are broken (can we agree on that?), and if we're considering re-building them, we need to make sure they are built on a strong foundation.

    I appreciate that. We had a thread probably a year or so ago about at least looking at what these systems are like. Among the math/stats people on the site, enthusiasm and ability to commit time varies. For a brief moment I thought we might get access to the win/loss data and start doing some serious analysis, but this appears to be a technical hurdle. (i.e., we never got the data!)

    For a long time I've thought this is something we should be serious about. But, it would require serious effort, not least of which would be demonstrating the difference between industrial-strength ranking algorithms and the one we use.


  16. #96 / 114
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    AttilaTheHun wrote: From reading about Trueskill I'm a bit worried when it says your score will suffer huge changes from any "unexpected" result. The boards on this site being primarily based on luck, this could happen way more than normal. I already think the system favors non-luck-based boards and this will drive that even more.

    This is partially unfortunate wording on the part of the article. (It's standard stat-speak to talk about a low probability outcome as "unexpected".) ALL algorithms are sensitive to "unexpected" results. This is nothing more than the statement that if a lower-ranked player beats a higher-ranked one, the swing in rating is greater.

    These algorithms are not just for games of no luck. They have been used in luck-based settings. (Yahoo games used to use Glicko.) Regardless of the algorithm, luck-based games aren't really a problem. At worst, it just means wider variance. Any criticism about luck against an industrial strength model could be made against our own algorithm because they both use the same basic idea: Upset (due to luck or cheating or whatever) means bigger rating swing.

    These algorithms are well-designed and well-tested. One way to see the difference between algorithms is to use actual given data as the basis of a simulation. (i.e. Someone wins at a certain rate against certain people, simulate the results into the future using a pseudo-random number generator.) Using simulations, you can test, for example, whether games of one size are favorable to play. A good system would be immune to such things. Simulations are a good way to test for selection biases.
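
    A toy version of such a simulation might look like the sketch below. Everything specific in it is an assumption: the update rule (winner gains k times loser rating over winner rating, loser drops k times own rating over opponent rating) was chosen only because it reproduces the square-root equilibrium mentioned earlier in the thread, the opponent's rating is held fixed, and k = 20 is a made-up constant. It is not confirmed to be the site's actual formula.

```python
import random


def simulate(win_prob, opponent_rating=1000.0, start=1000.0,
             k=20.0, games=50_000, seed=1):
    """Simulate one player's rating against a fixed-rating opponent pool.

    Assumed (hypothetical) update rule:
      win:  rating += k * opponent_rating / rating
      loss: rating -= k * rating / opponent_rating
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    rating = start
    history = []
    for _ in range(games):
        if rng.random() < win_prob:
            rating += k * opponent_rating / rating
        else:
            rating -= k * rating / opponent_rating
        history.append(rating)
    return history


history = simulate(win_prob=0.80)
settled = history[10_000:]  # drop the early climb toward equilibrium
average = sum(settled) / len(settled)
spread = max(settled) - min(settled)
print(f"long-run average: {average:.0f}, range: {spread:.0f}")
```

    Under these assumptions the long-run average should land near the 2000 predicted for an 80% win rate, while the range shows how far luck alone can move a rating. Swapping in a different update rule, or a mix of game sizes, and comparing the long-run averages is one way to look for the selection effects described above.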

    Edited Mon 24th Feb 08:18

  17. #97 / 114
    Standard Member Hugh
    Rank
    Lieutenant General
    Rank Posn
    #13
    Join Date
    Nov 09
    Location
    Posts
    869

    But, and btilly pointed this out long ago: Glicko and Trueskill store a variance measure. A player's variance depends upon how many games they've played. (In other words, how confident we are in the rating based on amount of performance data.)

    Once a player has played a lot, the rating varies less. So actually, these rating systems are FAR LESS erratic due to luck swings than our own system. Again, this can be demonstrated via simulation.


  18. #98 / 114
    Standard Member ratsy
    Rank
    Brigadier General
    Rank Posn
    #65
    Join Date
    Jul 10
    Location
    Posts
    1274

    What's the downside to using TrueSkill?

    "I shall pass this but once, any good I can do, or kindness I can show; let me do it now. Let me not difer nor neglect it, for I shall not pass this way again." -Stephen Grellet

  19. #99 / 114
    Standard Member itsnotatumor
    Rank
    Lieutenant General
    Rank Posn
    #14
    Join Date
    Jul 12
    Location
    Posts
    634

    Questions:

    Which of our current stats would TrueSkill replace?

    How easy would it be to institute such a changeover?

    How much tougher would it be than the idea of having everything over a thousand count as CP points?  

    How much more accurate would this be compared to the idea of counting everything over 1,000 as part of CP?  That would seem to in part address the issue of game selection, because it would count all boards equally.  

    Though that brings up the question of how tough it would be for Tom to implement any of the proposed solutions for CPs.  

    Fortune favors the bold, and chance favors the prepared mind...

  20. #100 / 114
    Shelley, not Moore Ozyman
    Rank
    Brigadier General
    Rank Posn
    #40
    Join Date
    Nov 09
    Location
    Posts
    3448

    >What of our current stats would true skill replace?

    Board GR.  I think global GR too.  Hugh can speak to this better.

    >How easy would it be to institute such a change over?

    Really a question for Tom, but I don't get the impression that the algorithm is that difficult.  

    >How much tougher than doing the idea of having everything over a thousand count as CP points?

    This would not be instead of, but maybe in addition to that change.  You'd still have a board GR, it would just be determined by TrueSkill instead of the current method.  Then you could apply that board GR to get CP similarly to the current proposal.

     

    >How much more accurate would this be compared to the idea of counting everything over 1,000 as part of CP?  That would seem to in part address the issue of game selection, because it would count all boards equally.  

    I think I've addressed this: since it would just be a different way of calculating board GR, you could still compute CP as sum(board GR - starting board GR).
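
    As a rough sketch of that sum: the function name, the starting rating of 1000, and the sample ratings below are all assumptions for illustration (the board names are just ones mentioned in this thread). The optional floor shows the "only count what is over a thousand" variant next to the plain sum.

```python
STARTING_BOARD_RATING = 1000  # assumed starting value for every board


def cp_from_board_ratings(board_ratings, floor_at_zero=False):
    """Sum each board rating's surplus over the starting rating.

    Hypothetical helper. With floor_at_zero=True, boards where the player
    sits below the starting rating contribute nothing instead of
    subtracting from the total.
    """
    total = 0
    for rating in board_ratings.values():
        surplus = rating - STARTING_BOARD_RATING
        if floor_at_zero:
            surplus = max(surplus, 0)
        total += surplus
    return total


# Made-up ratings for one player.
ratings = {"Invention": 1450, "WGWF": 1210, "Spy v Spy": 930}

print(cp_from_board_ratings(ratings))                      # 590
print(cp_from_board_ratings(ratings, floor_at_zero=True))  # 660
```

    Either way, the calculation only consumes per-board ratings, so it would work unchanged whether those ratings come from the current method or from a TrueSkill-style one.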

