Monday, January 8, 2007

Guild Wars: The New Ladder System

A while back, ArenaNet announced that they'd be changing the way guild rankings work in the future. They'd be de-emphasizing the ladder and emphasizing new daily tournaments as the way teams would qualify for any championship tournaments. We've yet to see exactly how the daily tournaments work, so it's hard to say exactly what effect they're going to have. (Late January was said to be when they'll be revealed. And I'm hearing that there's going to be some sort of trial shake-up of the PvP scene somewhere around the 19th. No clue about what that actually means, though.) But the first step of this plan – gutting ladder play – has been implemented.

Doing so was a two-fold process. First, the ladder was reset for “one last time”. And then it was restarted with a bit of a change to the ELO system. I haven't seen an official number because, you know, I haven't looked, but it seems as though the K value is now something far less than 30, which was the old amount. It's down to 5 if I had to guess. Which means teams both gain and lose less rating from freely playing in random ladder matches.

As for what the K value actually means, well, you need to know how the ELO system works. It's named after the inventor, Arpad Elo, and was developed primarily as a way of measuring chess players. It works by figuring out the chance that a given player has to win against another given player. To do so, all players are given a rating. When you start out you're given a number of points. In Guild Wars this is 1000, which is the rating you'll start at the first time your guild plays a battle (or, you know, when the ladder gets reset or the servers catch on fire or something). When you lose it costs you rating and when you win you gain some. And players (in Guild Wars “players” means “guilds” but, eh, same difference) are ranked according to the comparison of their ratings. Players with higher ratings have higher ranks. The player with the most rating is #1, the player with the second most #2, and so on all the way down to the player with the least. To increase your rank you have to climb over the players above you, so this sort of thing is said to form a ladder with each position being a rung.

Now, everybody starts out with 1000 points but that quickly changes as they play. And to figure out the chances that a given player will win, the system uses that rating along with some complicated math to calculate the percentage of the time that a player is expected to win. The formula (this is the standard ELO one) looks like this:

E_A = 1 / (1 + 10^((R_B - R_A) / 400))

Where A is the player and B the opponent. E stands for the percentage of the time that they're expected to win, while R is the rating for each player. You can flip it around for the opponent and you'll have it where E_A + E_B = 1. Or 100%. And each projected result is only a fraction of that whole. And if you can follow the math (which, you know, I barely can) you'll see that players with higher ratings will have a much higher chance of winning than players with lower ones. The further apart those Rs are, well, the greater the chance that the supposedly better player will be the one to emerge victorious. Which is, of course, what you'd logically expect.
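For anyone who'd rather see numbers than symbols, here's a minimal Python sketch of that expected-score calculation. It assumes the textbook ELO curve (base 10, a scale of 400); I can't confirm that Guild Wars uses exactly these constants, and the function name is just something I made up for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win chance) of A against B.

    Assumes the textbook ELO curve: base 10 with a scale of 400.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Show how the win chance grows as the rating gap widens,
# and that the two sides always add up to 1 (or 100%).
for gap in (0, 50, 100, 200, 400):
    e_a = expected_score(1000 + gap, 1000)
    e_b = expected_score(1000, 1000 + gap)
    print(f"gap {gap:>3}: E_A = {e_a:.3f}, E_B = {e_b:.3f}, sum = {e_a + e_b:.3f}")
```

Two equal ratings come out to an even split, and the bigger the gap the more lopsided it gets.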


Then, of course, you actually play the game and afterwards you adjust each player's rating. To do so, you take the difference between the points they get for playing and their expected winning percentage. Then you multiply that by some arbitrary figure which controls how far teams can rise or fall on the ladder due to the results of a single game. (This protects better players from plummeting. As well as lucky teams from getting too far out of their depth. The higher that arbitrary figure is, the more swing each match will result in. The lower, the harder it is to move up or down.) That formula (again, the standard ELO one) looks like this:

R_A(new) = R_A + K * (S_A - E_A)

S is the points that a player earns during the game. In Guild Wars, that's 1 for a win and 0 for a loss. K is the arbitrary figure.


So, you take the points and subtract the expected winning percentage or E from it. In Guild Wars, if you've won that number is positive. If you lose, it's negative. You multiply that by K. And then add it to the previous rating to get the new one.


Which is a fancy way of saying if you win you get more points and if you lose some get taken away. And the K value is the most points you can risk at any given time.
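Here's that update step as a short Python sketch. It assumes the textbook ELO formulas (base 10, scale of 400) and ignores whatever rounding the game does; the function names and the 1100-versus-1000 example are mine, purely for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook ELO expected score of A against B (scale of 400 assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, opponent_rating: float, score: float, k: float) -> float:
    """New rating after one game: old rating + K * (S - E).

    score is 1 for a win and 0 for a loss, as in Guild Wars.
    """
    return rating + k * (score - expected_score(rating, opponent_rating))


# A hypothetical 1100-rated guild against a 1000-rated one, with K = 30.
print(f"win:  {update_rating(1100, 1000, 1, 30):.1f}")
print(f"loss: {update_rating(1100, 1000, 0, 30):.1f}")
```

Whatever happens, the gap between the win result and the loss result is exactly K, which is why K caps how much rating is on the line in a single match.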


So, if you're expected to win close to 100% of the time then you gain next to nothing for actually winning. And if you're expected to lose 100% of the time, you actually won't lose any rating when you do. Because in a match like that no rating is actually at stake. In Guild Wars, the winning team gets K*(1-1) or 0. And the losing team gets K*(0-0) or, you guessed it, 0. But, practically speaking, there's no situation where one side or the other is guaranteed a victory. So there are very few situations where teams will gain or lose anything close to the full K value. That only happens when there's a big upset. Which, again, logically shows that the winning team was underrated heading into the match. And the loser overrated.
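To put numbers on the lopsided case, here's a small Python sketch. It assumes the textbook ELO formulas (base 10, scale of 400), my guessed K of 5, and a made-up pairing of a 1400 team against a 1000 team; none of those figures come from ArenaNet.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook ELO expected score of A against B (scale of 400 assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rating_change(rating: float, opponent_rating: float, score: float, k: float) -> float:
    """Points gained (positive) or lost (negative) for one game: K * (S - E)."""
    return k * (score - expected_score(rating, opponent_rating))


K = 5  # guessed value for the new ladder, not an official figure
favorite, underdog = 1400.0, 1000.0  # hypothetical ratings

print(f"favorite wins as expected: {rating_change(favorite, underdog, 1, K):+.2f}")
print(f"underdog pulls the upset:  {rating_change(underdog, favorite, 1, K):+.2f}")
```

The favorite picks up only a fraction of a point for doing what it was supposed to do, while the upset is worth nearly the whole K.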


As an example, there are two teams and they each haven't played a match. So, they both have a rating of 1000. And, according to the ELO system, they both have a predicted winning percentage of 50%. It's a toss-up, in other words, as far as the ratings are concerned and the battle could go either way. Which is, after all, what happens any time you get two teams with the same rating. So, with a K value of 5 (which is what I think the new system is using) each team is risking only 50% of that. Or 2.5 points. Rounding seems to work in the loser's favor so the winning team will get 3 points and the loser drop 2. So, after fighting once Team A will be at 1003 rating and Team B will be at 998.
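The same example as a Python sketch, run once with my guessed K of 5 and once with the old K of 30. It assumes the textbook ELO formulas and deliberately skips the in-game rounding, since I only know how that works from watching the numbers; the helper names are made up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook ELO expected score of A against B (scale of 400 assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def play(winner: float, loser: float, k: float) -> tuple[float, float]:
    """Return (winner's new rating, loser's new rating), with no rounding."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


for k in (5, 30):
    team_a, team_b = play(1000.0, 1000.0, k)
    print(f"K = {k:>2}: Team A {team_a:.1f}, Team B {team_b:.1f}")
```

Before rounding, each side has exactly half of K at stake in an even match.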


Not very much compared to the 15 both would have previously shifted. And it'll now take a lot more matches to separate the two and change that winning percentage much beyond 51% to 49%. Performing the math is a bit beyond me but as far as I can tell that means that if those two teams kept playing each other and Team A kept winning, it would take them roughly five or six matches to get to the ratings they would have had after a single match under the old K value of 30. (Which, by the way, the Guild Wars system is designed to prevent. Once you play a team there's a bit of a grace period where you either won't face them again or won't risk any rating playing them again.) Which, of course, makes sense, since the new K is a sixth of the old one. (In chess, this started out as a K value of 10. But these days it's 16 for "master" level players and 32 for everyone else.)
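That catch-up estimate can be checked with a quick loop. This Python sketch assumes the textbook ELO formulas, no rounding at all, and Team A winning every rematch, so treat it as a ballpark rather than what the game servers would actually do.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook ELO expected score of A against B (scale of 400 assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def wins_needed(target_gain: float, k: float, start: float = 1000.0) -> int:
    """Count straight wins by Team A over Team B until A has gained target_gain."""
    team_a = team_b = start
    wins = 0
    while team_a - start < target_gain:
        delta = k * (1.0 - expected_score(team_a, team_b))
        team_a += delta
        team_b -= delta
        wins += 1
    return wins


# 15 points is what one win between equal teams was worth at K = 30.
print(wins_needed(15.0, k=30))
print(wins_needed(15.0, k=5))
```

Without rounding, the low-K count comes out a match or so above my estimate, because every win makes the next one worth slightly less; with the winner's gain rounded up, it would land closer to five or six.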


Now, there are a lot of problems with an ELO system, especially when you take it away from a place like chess where the field and the pieces and rules are the same for both players – in Guild Wars you have teams playing different builds on different maps, which can skew things. You've probably noticed one with my example. Although both teams start at 1000, that's not really an accurate measure of their skill level. If Team A is beating up on Team B consistently they're the better team. Under the old system, they'd quickly gain in rank until they found their “true ranking” or the point on the ladder where they should be. But under this new one it's going to take them longer, and that means better teams are going to be pitted against lesser teams more often – because the ratings won't be far enough apart to really distinguish between who's good and who's, well, not.


That's because, as far as I can tell, the automated matching system in Guild Wars is based on that first ELO formula, the one that calculates the projected winning percentage. It tries to pit teams against teams near their ratings first – or what should be competitive matches. But it doesn't look like it's been updated to reflect the lowered K value and be a lot more picky about getting closer ratings now. The winning percentage is based on the rating, so when the ratings shift more slowly, smaller differences in winning percentages become more important. Formerly there wasn't a big difference between facing teams with a 51% chance to win against you and teams with a 52% chance. Now there is, because it's five or six times harder to raise that winning rate by a percentage point. So teams entering into the ladder will be facing much stiffer competition, especially now when teams are all around the same rating. And, let me tell you, being steamrolled by a much better team trying to climb up the ladder isn't much fun.


It'll work out as teams play more matches and get to their true levels but, at the moment, it's got to be incredibly discouraging for teams looking to try out the new, supposedly less important ladder.


The other big problem with an ELO system is inflation. Over time the rating it takes to get the #1 rank will increase. It takes time, of course, because at the top of the ladder you don't get many points for winning. Especially if no one is close to your rating. (And losing will cost you a lot of points, perversely making better teams less likely to play or, at least, to want to. That'd be another problem, by the way.) But say a team at the top plays 100 games a month and gets 1 point for each. Every month the ladder goes without a reset, reaching the top rank takes 100 more rating than it used to. Which, again, makes it very hard for teams entering into the ladder to get up to speed. There's a cap on the number of points they get for winning, after all, through the K value, so it takes them even longer to reach their true rating and for the ladder to normalize. If, of course, they're better than the basement floor of 1000. And since the teams at the top have less incentive to play them because their rating will take a bigger hit, it becomes progressively harder for new teams to establish themselves.


Which is why most ladders will reset periodically. Or at least have some way for ratings to decay. Either you start to lose rating when you don't play, or every day some points are siphoned out of the system. Because the longer things go the more the ladder will ossify.


To understand why this is a problem, consider the following. Let's say there's a team that's amazingly good. They get into GvG and play a lot of matches. They win each and every one. They rocket up the ladder and get the top spot. They keep winning for a while before getting bored. There's no challenge, there's no competition, they've, in their opinion, mastered the game. So they quit. They stop playing altogether.


In a ladder that doesn't decay they have that top spot until someone gets the rating to pass them. If they've managed to get, say, 1k more rating than everyone else, who are all more or less equal, it's going to take a long time for those evenly matched teams to get the points they need for someone new to take the #1 spot. Months, perhaps, if they're doing it a few points a win at a time. That whole time the ladder says the “best” team is one that's no longer playing. And should that guild come back to the game, say, in a month or a year or a decade when things have changed considerably, they'll have all that rating telling people they're good when they actually might not be.


If they play enough games their rating will normalize, sure, but things like that mean that the ladder is no longer useful as an accurate snapshot of which teams are the best at any given moment. It becomes a historical record of teams that managed to pile up rating however they could. With no indication of how long it took or how hard it was for them to do so.


Not to pick on anyone, but take the current #1 team on the Guild Wars ladder, There is A Cow Level. Since the ladder's been reset they've been on a tear playing an incredible 138 games in the span of about a week. That's roughly 20 matches a day when GvG games should last around twenty minutes or so, I'd guess, on average. Their secret? They don't play nearly that long for the most part. Most of their matches last five minutes – they're all over obs mode so it's pretty easy to see. I'm not saying they're a bad team or anything just that thanks to the currently flawed matching system they're not facing anything like decent competition. They beat up on teams that are much worse. And they get a decent amount of rating – relative to the maximum of 5, potentially – for doing so. Currently, they have 1209 points or 50 more than the #2 team. So, how can you tell whether they're actually a good team or not? Is this in fact another example of a team playing a lot of games to pad their ranking? Or is it just that [cow] is that much better than everyone else playing because, I'd imagine, a lot of really top-flight guilds are ignoring the ladder completely and practicing in scrims and unrated matches for when the daily tournaments start?


The answer lies in points per game. That's basically taking the rating and dividing it by the total number of games played – the answer to that little equation is how many points the team gains, on average, whenever they fight. It's like projected winning percentages and even actual winning percentages in that it gives you an indication of how likely a team is to win in a given battle. And unlike those two it takes into account just how tough it was for a team to get to their current rating. If the PPG is high, that's because a team's got a lot of rating but hasn't played a lot of games. If it's small, then it's because a team doesn't get a lot of rating to show for each win. And that means they're approaching their true ranking – or where they'd be on the ladder if everyone played an infinite number of games. Because as they get around where they should be relative to everyone else they don't lose much for losing and don't win much for winning. They beat the teams they should and lose to the teams they should, and under an ELO ranking system that means their rating won't change very much – just inflate along with the rest of the ladder, really.


From what I can tell, [cow] actually has a pretty decent PPG. Personally, I throw out the base rating of 1000 and just go by what teams have earned or lost since they started playing – so [cow]'s gotten 209 points in 138 games or about 1.5 points a game. Roughly comparable with other teams in the top ten. They've played the most games so they've climbed the highest. PPG gets a little sketchy when you get towards the top of the ladder because the talent levels can vary so much and finding decent matches to normalize scores can be difficult.
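The arithmetic is simple enough to sketch in Python. This is just my own homebrew stat, not anything the game reports; the function name is made up, and the 1000 baseline is the starting rating being thrown out as described above.

```python
def points_per_game(rating: float, games: int, base: float = 1000.0) -> float:
    """Average rating earned per game, ignoring the starting rating."""
    if games <= 0:
        raise ValueError("a team needs at least one game played")
    return (rating - base) / games


# [cow]: 1209 rating after 138 games.
print(round(points_per_game(1209, 138), 2))
```

Run against [cow]'s numbers it comes out right around a point and a half per game, matching the figure above.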


So, would I take [cow] in a heads-up fight against the #2 team Clan Detained or even the much lower [PnH] or [RenO] – teams that I actually recognize from past seasons – given that they all have a higher PPG and just haven't played as much? I'm not sure. The lower K value skews things and throws off my perceptions. We're talking about the difference between 1.5 points a game and maybe 1.75, after all. And I'm not sure if that's significant or not.


And, in any event, when daily tournaments go in it likely won't matter at all. But what does matter now is that [cow], thanks to their many games, is inflating the ladder. Simply because they're playing more than everyone else. That's something the change was supposed to prevent if not correct. And, unfortunately, since the ladder's never going to be reset again the imbalances now are going to remain with us for a long time. I'm rather puzzled, then, why they didn't wait to reset the ladder “one last time” when tournaments started. Or why, you know, the people in charge of things didn't avoid saying they'll never touch the ladder again, because that's the kind of rhetoric that comes back to haunt people. I mean, I know that by just leaving the ladder to run they'll be saving the time and effort it takes to maintain it. But, you know, that's not a good thing if they want it to have any integrity whatsoever.
