I'll start with a pro: It is a fact that if your win% for 3-player games is > 1/3, 4-player win % is > 1/4, and so on, your G Rating is > 1. Generally, we should expect players with high win percentages to have higher G Ratings. The description from the help file is remarkably accurate:
"G rating is a normalized rating based on how many games you expect to win on a game of a particular size." _of a particular size_ is required for the stat to maintain its meaning.
However, the calculation of the G rating mixes games of different sizes and here it goes very wrong. Winning 100% of your duels would be truly remarkable (from a percentage standpoint), yet it is equivalent to winning only 40% of your 5 player games, which many players achieve. 100% two player and 40% five player both yield a 2.0 G Rating.
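To make that concrete, here is a minimal sketch of the G Rating as the help file describes it (actual wins divided by the wins a random player would expect, 1/n per n-player game; the function name is mine, not the site's):

```python
# A minimal sketch of the G Rating as described: actual wins divided by the
# number of wins a random player would expect (1/n per n-player game).
# The function name is mine, not the site's.

def g_rating(games):
    """games: list of (game_size, won) pairs."""
    expected_wins = sum(1 / size for size, _ in games)
    actual_wins = sum(1 for _, won in games if won)
    return actual_wins / expected_wins

# Winning 100% of duels and 40% of 5-player games both come out at 2.0:
duels = [(2, True)] * 10                    # 10 wins in 10 duels
fives = [(5, True)] * 4 + [(5, False)] * 6  # 4 wins in 10 five-player games
print(round(g_rating(duels), 2))  # 2.0
print(round(g_rating(fives), 2))  # 2.0
```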
I think the a priori criticisms are too easy, so I decided to see what the skewings among the top 10 would be if you deleted their two player games. In this data set, EVERY SINGLE PLAYER'S G RATING WENT UP AFTER DELETING THEIR TWO PLAYER GAMES!!!!
The skewings: +.13, +.17, +.19, +.12, +.47, +.23, +.32, +.09, +.07, +.38
The large plusses usually occurred with players who play a high proportion of duels.
It was suggested that over time the skewing should go to zero. In fact, it will not if over time a player maintains the same proportion of duels being played. Of course it is not unique to duels. I confidently conjecture that if we deleted 6-10 player games from (good) players who play a lot of those games, that their G Rating would drop.
The intent of the G Rating was well expressed by BlackDog: A low G Rating with a high ranking means that I generally play higher rated players, while a high G Rating with a high ranking means that I play lower rated players.
I think this is the true intent of the G Rating: Attach a numerical value, something like a percentage, that ignores the difficulty of the opposition and yet makes sense across different sized games.
I have a real hard time saying what this _truly_ should be, but I do have a suggestion (to be posted later), that has predictive value within our current ranking system.
I sort of agree with you; let me see if I can restate it, and you tell me if I understand your objection.
The G-rating is there to give a 'normalized' rating with regard to the number of players in a game, and should give a value indicative of how much better a given person is doing compared to how many games they 'should' win if all players were dumb automatons who rolled randomly. Thus winning 50% of 2-player games is treated the same as winning 1/16th of 16 player games.
While that is true, your objection is based upon the fact that an elite player will win proportionally more games as the number of players increases, which I believe is true. Winning 1/8th of 16 player games is easily doable, while winning 100% of your duels is impossible. The reasons for this are many, some are more obvious than others, and ultimately the reasons don't matter.
Since players are ranked in part by G-rating, it 'punishes' those players who prefer duels and inflates the rankings of those players who prefer large games.
My first instinct would be to attempt to quantify the rate of increase per player added to a game in terms of what the elite players can actually accomplish. I'm thinking along the lines of averaging the actual winning percentage per players/game of the top X G-rated players on the site. From this you could normalize the G-rating to obtain a G-prime rating, which would serve to eliminate the skewings you mentioned.
I agree with 11s. At the very least, where the skewing comes into play is for those top of the line players who can more than double the expected win ratio in games with more than two players. This means you have to win better than 66% of your three player games to break the system. In cases like this, playing only 2 player games would hurt their g-ratings. But even 11's post suggests that it's a bit more pervasive than that.
So what if we topped out the G rating at 2?
Consider for simplicity's sake the kind of player that can win 75% of their two-player games. Could we say this is half-way between average and perfection? Let's give this excellent player a G rating of 1.5 (half-way between 1 and 2.)
Now let’s consider that, all things being equal, the same (half way to perfection kind of guy) should win not 10%, but rather 55% of all ten player games (i.e., somewhere around 11 out of every 20 games played).
Also consider the not-so-good 25% half-way-to-total-loser player who will win 1 out of 20 ten player games (1/2 the expected). His G rating should be .5 for that pathetic effort.
But how do you accomplish this? What would the math look like? What about using 11’s idea and somehow creating a straight percentage score where 50% would be expected and 100% would be perfect?
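Here is one reading of that proposal in code (my reconstruction, not anything official): below the random-play expectation, scale the win rate linearly from 0 to 1; above it, scale linearly from 1 at the expectation to 2 at a 100% win rate. It reproduces the examples above:

```python
# My reconstruction of the capped scale (not an official formula): below the
# random-play expectation 1/n, scale the win rate linearly from 0 to 1; above
# it, scale linearly from 1 at the expectation to 2 at a 100% win rate.

def capped_g(win_rate, game_size):
    expected = 1 / game_size
    if win_rate <= expected:
        return win_rate / expected
    return 1 + (win_rate - expected) / (1 - expected)

print(capped_g(0.75, 2))             # 1.5 -- 75% of duels, half-way to perfection
print(round(capped_g(0.55, 10), 2))  # 1.5 -- 55% of ten-player games
print(round(capped_g(0.05, 10), 2))  # 0.5 -- 1 win in 20 ten-player games
```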
Hugh wrote:The description from the help file is remarkably accurate:
"G rating is a normalized rating based on how many games you expect to win on a game of a particular size." _of a particular size_ is required for the stat to maintain its meaning.
I'm pretty certain I stole that from tom
IRoll11s wrote:
While that is true, your objection is based upon the fact that an elite player will win proportionally more games as the number of players increases, which I believe is true. Winning 1/8th of 16 player games is easily doable, while winning 100% of your duels is impossible. The reasons for this are many, some are more obvious than others, and ultimately the reasons don't matter.
Since players are ranked in part by G-rating, it 'punishes' those players who prefer duels and inflates the rankings of those players who prefer large games.
I'm glad you're here to help me clarify :) I'm stating something stronger. I believe the effect you speak of is real - whatever equivalent performance across game sizes is defined to be, elite players should perform better in games with more players. I believe G Rating to be more broken than that.
G Ratings for particular game sizes compare perfectly well across that game size. I contend that, except around the 1.0 mark, the G Rating does a poor job of comparing performances across game sizes.
It is easy to construct toy examples: Here is one - suppose a player wins 3 consecutive two player games. They have played 3 games, their G Rating is 2.0. As a performance, this is good, 1/8 is the probability of doing that if it's just coin flips. A different player plays an 8 player game and wins. 1/8 is also the probability of hitting that performance randomly. Now the 2nd player plays two more 8 player games (losing both), so that the weighting is the same as the first player and due to the losses, the 2nd player has performed worse. The first player has 2.0 weighted at 3 games, the second player has 2.67 weighted at 3 games, yet the first player clearly outperformed the second player.
It does a poor job across different game sizes is the summary of what I am saying. Examples similar to the above should be constructible even if we restricted to 4-player vs 5-player.
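For anyone who wants to check the arithmetic, here is the toy example under the same wins-over-expected-wins calculation (a sketch; names are mine):

```python
# Sketch of the wins-over-expected-wins calculation (function name is mine):
def g_rating(games):
    expected_wins = sum(1 / size for size, _ in games)
    return sum(1 for _, won in games if won) / expected_wins

p1 = [(2, True)] * 3                 # three straight duel wins
p2 = [(8, True)] + [(8, False)] * 2  # one 8-player win, then two losses
print(round(g_rating(p1), 2))  # 2.0
print(round(g_rating(p2), 2))  # 2.67 -- higher, despite the weaker performance
```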
Hugh wrote:It is easy to construct toy examples: Here is one - suppose a player wins 3 consecutive two player games. They have played 3 games, their G Rating is 2.0. As a performance, this is good, 1/8 is the probability of doing that if it's just coin flips. A different player plays an 8 player game and wins. 1/8 is also the probability of hitting that performance randomly. Now the 2nd player plays two more 8 player games (losing both), so that the weighting is the same as the first player and due to the losses, the 2nd player has performed worse. The first player has 2.0 weighted at 3 games, the second player has 2.67 weighted at 3 games, yet the first player clearly outperformed the second player.
I would say there could be an argument against "the first player clearly outperformed the second player". Player 1 defeated 3 players in the 3 games he played, while player 2 defeated 7 players in the one game he won and then played against another 14 that he lost to. There are times where winning an 8 player game is vastly more difficult than winning a 2 player game (and the same could potentially be said for the opposite, but with the 8 player game you do have "less" of a chance of winning).
I'll concede a point, but defend first. Random move making automata have a 1/8 probability of winning three consecutive two player games. Random move making automata have a 1/8 probability of winning a single 8-player game. There is a symmetry involved in an 8-player game - it isn't any harder for one player versus another (until you factor in skill, starting position, and those things).
The point I'll concede is that I used uniform probabilities as my point of comparison. I don't have a great answer to how best to compare across game sizes. I use uniform probabilities because I don't know where else to turn for comparison. However, no matter what the metric, a person winning 50/100 of their 5-player games for a G Rating of 2.5 has not outperformed a player who has won 1000/1000 two player games for a G Rating of 2.0.
I love this thread a lot.
Hugh wrote:
It was suggested that over time the skewing should go to zero. In fact, it will not if over time a player maintains the same proportion of duels being played. Of course it is not unique to duels. I confidently conjecture that if we deleted 6-10 player games from (good) players who play a lot of those games, that their G Rating would drop.
I did suggest something similar (not zero), but concede that the skewing will not go away (maybe this is the reason for my comparatively low g-rating - lots of spies).
I meant to say, and didn't, that if players play a lot of games of all sizes then the skewing is incorporated and not as important, but certainly a good player who only plays games with a lot of players will have a higher g-rating than a good player who only plays dueling maps. (This is well explained by Hugh above.)
Here is my solution, although it is not easy to implement: give each player a number of points on a somewhat logarithmic scale.
For two player games (50% wins is expected):
Win Percentage Points
<6.25% 0
6.25% .125 (12.5% of expected win percentage)
12.5% .25 (25% of expected win percentage)
25% .5 (50% of expected win percentage)
50% 1 (expected)
75% 1.5 (150% of expected win percentage)
87.5% 1.75 (175% of expected win percentage)
93.75% 1.875 (187.5% of expected win percentage)
>93.75% 2
For three player games (33% is expected):
Win Percentage Points
< 4.125% 0
4.125% .125
8.25% .25
16.5% .5
33% 1
49.5% 1.5
57.75% 1.75
61.875% 1.875
> 61.875% 2
For anyone who understands what is above, this should be averageable to give a meaningful stat, as long as you ignore cases like "played one 10 player game and won" (set the minimum number of games of a given size to be ___ before it is used in the stat).
If I am wrong with what I said above, please let me know.
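For what it's worth, here is one way to sketch this in code, under my reading that the break points are the same multiples of the expected win rate (1/n) for every game size, and that a win rate between break points snaps down to the listed one below it (names are mine):

```python
# A sketch of the scale above under one reading (mine): the break points are
# the same multiples of the expected win rate (1/n) for every game size, and
# a win rate between break points snaps down to the nearest listed one.

BREAKS = [(0.125, 0.125), (0.25, 0.25), (0.5, 0.5), (1.0, 1.0),
          (1.5, 1.5), (1.75, 1.75), (1.875, 1.875)]

def scale_points(win_rate, game_size):
    multiple = win_rate * game_size   # win rate as a multiple of expected 1/n
    if multiple > 1.875:              # top of the scale caps at 2
        return 2.0
    points = 0.0                      # below 12.5% of expected scores 0
    for threshold, value in BREAKS:
        if multiple >= threshold:
            points = value
    return points

print(scale_points(0.50, 2))   # 1.0   (expected for duels)
print(scale_points(0.875, 2))  # 1.75  (175% of expected)
print(scale_points(0.5, 3))    # 1.5   (150% of expected for 3-player)
```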
I did mention a suggestion that works towards the goal of having meaning within our ranking system. It is actually a very mild variant of G Rating. It is this:
Each loss is 0/1, but each win is regarded as going 1/1 against each opponent you beat. So, 1/1 in two player wins, 2/2 in three player wins, 3/3 in four player wins etc. We'll refer to the denominator as "effective games" (instead of total games). This may seem strange, but it corresponds precisely to how our rating system is structured.
Does it satisfy the property everyone loved about G Ratings? Yes - if you win 1/4 of your 4-player games, your score is 0.5, 1/5 of your 5-player games is also 0.5. Less gives a number under 0.5, more gives a number greater than 0.5.
It maxes out at 1.0 regardless of game size.
Lastly, equal scores with different game sizes corresponds to equal rating gains. A 0.6 score achieved solely with 4-player games has the same rating gain as a 0.6 score achieved solely with 2-player games provided the same number of "effective games" were played and the opponents were of equal rating to the player. Thus, if two players had the same Global Rating, we could tell who had the higher rated opponents by looking at this score. G rating does not share this property due to the game size skewing.
Disclaimer: The way in which this system compares performances across game sizes may not reflect "the truth" about which was better, but it is compatible with our current rating system.
Alpha - your suggestion looks like G Rating, but capped at 2? Right?
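For concreteness, a sketch of the effective-games score as I read the description (the function name is mine):

```python
# Sketch of the effective-games score (my reading of the description; the
# function name is mine): a win in an n-player game counts as n-1 wins over
# n-1 effective games, a loss as 0 wins over 1 effective game.

def effective_score(games):
    wins = effective = 0
    for size, won in games:
        if won:
            wins += size - 1
            effective += size - 1
        else:
            effective += 1
    return wins / effective

# Winning 1/4 of your 4-player games and 1/5 of your 5-player games both
# land on 0.5, and the score can never exceed 1.0:
print(effective_score([(4, True)] + [(4, False)] * 3))  # 0.5
print(effective_score([(5, True)] + [(5, False)] * 4))  # 0.5
```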
Testing by looking at edge cases confirms that this is a good metric. To wit:
Dumb-monkey expected wins:
game size - wins/losses - avg. - g-rating - h-rating
2 - 1/2 - 50.0% - 1 - .5 (1/2)
4 - 1/4 - 25.0% - 1 - .5 (3/6)
8 - 1/8 - 12.5% - 1 - .5 (7/14)
God of war wins:
game size - wins/losses - avg. - g-rating - h-rating
2 - 2/2 - 100% - 2 - 1 (2/2)
4 - 4/4 - 100% - 4 - 1 (12/12)
8 - 8/8 - 100% - 8 - 1 (56/56)
Average is a poor metric that cannot be combined across games of various sizes unless someone wins all or none of their games, which is why G-rating was established. However, G-rating can only be accurately combined as a comparison when someone wins exactly as many games as a dumb monkey would (G-rating = 1); it starts to skew badly both up and down the skill scale.
Once again I'm only clarifying for you Hugh. Alpha your log scale has merit but I believe this is more precise. The only part of Hugh's post I didn't understand was this:
"...and the opponents were of equal rating to the player. Thus, if two players had the same Global Rating, we could tell who had the higher rated opponents by looking at this score."
This might be because I really have no interest in rankings other than a passing mathematical interest, so I'm not even sure how the Global and Board rankings work. I'm not sure how the H-rating would give any indication of how tough your opponents were, since there is nothing in the calculation that uses the global or board rankings.
I support Alpha's logistic, Hugh's holistic, 11's heuristics and ASM's linguistics.
Mongrel wrote: I support Alpha's logistic, Hugh's holistic, 11's heuristics and ASM's linguistics.
As soon as I finish my exams (my last two ever!!!! °___°) and get my internet connection back, I'll be more into this than I am now (I'm trying to follow this thread, though).
Support the 'istics'!
(=
Ok, we’ve seen two solutions that even the playing field in terms of “capping” the limits of a rating. One ranges from 0-2 and the other 0-100%, both of which I like.
But they still have the disadvantage of not being very helpful when players have not played many games. E.g., monkey joins WG, monkey somehow wins all 2 or 3 games played, and monkey retires with the best possible g-rating.
Why not measure the likelihood of a certain performance in terms of deviation from the norm?
Lucky monkey goes 3/4 playing in 2-player games, putting him in the top 12.5% of all monkeys who play four 2-player games = +1.15 expressed as a standard deviation.
Same monkey then goes 3/4 playing in 3-player games, putting him in the top 1.2% for a SD of +2.26
From here it’s a matter of weighting these performances. Because monkey has played the same number of games in each category in this particular case I’m calling it the mean of these numbers for a composite SD of +1.70.
Consider that a player who goes 2/2 in a pair of 8-player games has a SD of +2.15.
It follows that these numbers could still be inflated for those who don't play many games, but I think it would be pretty hard to whip up a good rating after 10 or so games, so I would propose that a "G"-rating isn't valid (posted) until 10 games have been played.
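The SD idea can be sketched with a plain binomial model (my assumptions; an exact tail calculation or a different approximation would give somewhat different numbers than those quoted above):

```python
# Sketch of the SD idea under a binomial "dumb monkey" baseline (my
# assumptions; the thread's exact figures may come from a different
# approximation, so these values need not match the ones quoted above).
from math import sqrt

def performance_z(wins, games, game_size):
    """z-score of a win count against random play (p = 1/game_size)."""
    p = 1 / game_size
    mean = games * p
    sd = sqrt(games * p * (1 - p))
    return (wins - mean) / sd

# 3 wins out of 4 is a more surprising feat in 3-player games than in duels:
print(round(performance_z(3, 4, 2), 2))  # 1.0
print(round(performance_z(3, 4, 3), 2))  # 1.77
```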
M57 wrote:Ok, we’ve seen two solutions that even the playing field in terms of “capping” the limits of a rating. One ranges from 0-2 and the other 0-100%, both of which I like.
But they still have the disadvantage of not being very helpful when players have not played many games. E.g., monkey joins WG, monkey somehow wins all 2 or 3 games played, and monkey retires with the best possible g-rating.
Why not measure the likelihood of a certain performance in terms of deviation from the norm?
Lucky monkey goes 3/4 playing in 2-player games, putting him in the top 12.5% of all monkeys who play four 2-player games = +1.15 expressed as a standard deviation.
Same monkey then goes 3/4 playing in 3-player games, putting him in the top 1.2% for a SD of +2.26
From here it’s a matter of weighting these performances. Because monkey has played the same number of games in each category in this particular case I’m calling it the mean of these numbers for a combined SD of +1.70.
I like (and agree with) this too, but before we get much further, I'd just like to say that every ranking system will fail some "reasonable criterion". Somebody proved this. Alpha told me about it, which is why I was OK with G-rating.
Improvements to G rating? Sure. Will that rating be "wrong" in some way? Always.
The WF (gasp!) idea was to not count percentages until a certain number of games was reached. Seems to be a decent 4th solution.
In summary: Grip it, rip it, move on. We had a seemingly similar discussion about an aggressiveness stat that ended with.... well, it ended.
Mongrel wrote: every ranking system will fail some "reasonable criterion".
Fair enough. What reasonable criterion does the SD method not cover? We've already determined that the G-Rating can be gamed, which is a pretty strong strike against it.
I think we should ignore skewing at low game counts, and focus on eliminating skewing due to different game sizes at high game counts. Nobody cares about the G rating of someone who just joined the site.
M57 wrote:Fair enough.
HA! Exactly.
Not sure what example would exploit your formula over others, but it probably happens when you weight the deviations together. What the SD approach captures that the other two do not is "sustained dominance"- my guess is that the global ranking is also an attempt to do this. From Hugh's formula, one can extrapolate some crisp ancillary data about quality of opponents which I also like.
Coin flip.