Balancing PokeHearth - part 3


Last time, we finished our balance changes and then let a new simulation run. This time, we can now see what impact these changes really had. To start, I'll repeat the last paragraph, then give a short overview.


Last Time

There are still more cards to buff and to nerf, but really, the idea is not to control every single aspect of the game. The interesting part is that the AI and evolution will create a metagame that is truly a thing on its own. Or at least that is the hypothesis. With these changes, I will let the same 20,000 games play out, and will return with results - perhaps some of the types that were not directly changed will take a different position nonetheless.

And exactly what I suspected did indeed end up happening. After running a new simulation of 20,000 decks, taking five hours to run, the meta changed as follows:


The strange thing is that the ranked percentages are mostly the same - top four making up 50% - but only Psychic, a type not directly changed, is in the top four both now and before. Chances are that even with completely unbiased cards, the chaotic nature of in-meta competition will produce these results.

Big changes are also seen in Dragon and Normal, neither of which was changed. Normal might have been buffed by the changes to Dark (since Normal and Dark are linked), but the same cannot be said of Dragon, which is linked with Ice.

Some things remain the same. Electric is almost outside of the meta, just like before, and Dark still has no decks in the top 180 - despite both being buffed. Maybe my changes to Steel were objectively bigger, though I doubt that. Rather, the impact of changes is fundamentally unpredictable because of the many interacting parts.


Overview in Short

Meta before:
  • Top four: Grass, Water, Fire, Psychic
  • 14 cards at 55% winrate or higher
  • 62 cards at 45% winrate or lower
Changes:
  • Nerfed seven fire-type cards
  • Nerfed five water-type cards
  • Nerfed eleven grass-type cards
  • Redesigned four dark-type cards, fixed an issue relating to one card, buffed three others
  • Redesigned four electric-type cards, buffed four others
  • Buffed seven steel-type cards
  • Fixed one Normal-type card
Meta afterwards:
  • Top four: Steel, Psychic, Normal, Dragon
  • 2 cards at 55% winrate or higher (11 at 54% winrate or higher)
  • 60 cards at 45% winrate or lower
There can be no question that the top decks have changed, and thus that the meta has changed. This can also be seen by unchanged types suddenly becoming popular.

Some overpowered typings, which led to many 55%+ winrate cards, have also been fixed. However, there are still many unusably weak cards.
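The tallies in the overview (cards at 55% or higher, cards at 45% or lower) are simple threshold counts. Here is a minimal sketch of how such a count could be done, not the simulation's actual code: the function name `count_outliers`, the card names and the winrates are all made up for illustration, and only the two thresholds come from the lists above.

```python
def count_outliers(winrates, high=0.55, low=0.45):
    """Count cards at or above `high` and cards at or below `low`."""
    strong = sum(1 for w in winrates.values() if w >= high)
    weak = sum(1 for w in winrates.values() if w <= low)
    return strong, weak


# Made-up winrates, purely for illustration (not simulation results).
example = {"card_a": 0.58, "card_b": 0.55, "card_c": 0.50, "card_d": 0.44}
strong, weak = count_outliers(example)
print(f"{strong} cards at 55% or higher, {weak} cards at 45% or lower")
```

Moving the `high` argument to 0.54 would give the kind of secondary count mentioned in the "Meta afterwards" list.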


Card and Type diversity

I finally found a way to represent all the needed information in one single image:


Here, pokémon are sorted horizontally by type and vertically by winrate, where the black line is 50% winrate. What is shown above is actually how the meta was before the changes. Here we can see what we were talking about before, like how almost all fire-type cards are above 50% winrate. The other types we nerfed, Grass and Water, have wider distributions (or at least more pokémon), but generally lie above the black line.

Now, let's compare it to the meta after the changes:


Maybe the problems have not so much been solved as simply shuffled around, but still, I think the distributions are overall a bit better now. The biggest problems seem to be with Steel (too strong) and with Fairy and Flying (too weak). But you might also notice that even Steel-type pokémon that were not buffed, such as the Aron and Bronzor families, still reaped benefits through association.

But this is just the pure winrate. What if...


Random versus Expert winrate

In the beginning of the simulation, the 20,000 decks are completely random and unvetted. A lot of these decks are terrible and die out within the first few generations. Later on, the decks are at least somewhat competent.

While the first five generations are dubbed "random", the rest are dubbed "expert". Here we see the above illustrations, first showing the winrate in random decks, then at the end in expert decks:


Now that's really cool and all, and almost all pokémon really have widely different winrates in the beginning and end. But it might be a bit too information-dense to really get anything out of. Instead, I'll do some more conventional statistical imagery:


These yellow/green boxes are minimalistic boxplots, showcasing both the mean (where the yellow and green meet), as well as the upper and lower quartiles at each end. Each type shows both the expert and the random winrates, so that the change is visible.
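For anyone wanting to reproduce the three numbers behind each box, here is a small sketch using Python's standard `statistics` module. The function name `box_summary` and the winrates are hypothetical, and the quartile convention (the module's default "exclusive" method) is an assumption; other conventions give slightly different values.

```python
from statistics import mean, quantiles


def box_summary(winrates):
    """Return (lower_quartile, mean, upper_quartile) for a list of card winrates."""
    # quantiles(..., n=4) returns the three cut points between the four quarters.
    lower, _median, upper = quantiles(winrates, n=4)
    return lower, mean(winrates), upper


# Made-up card winrates for a single type, purely for illustration.
toy_type = [0.40, 0.45, 0.50, 0.55, 0.60]
lower, average, upper = box_summary(toy_type)
print(f"lower quartile {lower:.3f}, mean {average:.3f}, upper quartile {upper:.3f}")
```

Running this once on the random-phase winrates and once on the expert-phase winrates of a type would give the two boxes drawn per type.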

And still, this is too much information.


Now that's more like it. Just a single illustration of eighteen values. This shows the gain in winrate as decks move from being random to only being expert decks. Or we should rather say, loss, since most types actually lose winrate.
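The value per type can be thought of as "expert winrate minus random winrate", averaged over the cards of that type. Below is a rough sketch of that calculation under a few assumptions: the record layout `(card, generation, won)`, the 0-indexed generations, and the helper names `phase_winrates` and `type_gain` are all hypothetical. Only the five-generation cutoff comes from the text above.

```python
from collections import defaultdict

RANDOM_GENERATIONS = 5  # the first five generations are dubbed "random"


def phase_winrates(records):
    """Return {card: {"random": winrate, "expert": winrate}}.

    Each record is (card, generation, won); generations are assumed 0-indexed.
    A phase with no recorded games is left out of the card's entry.
    """
    tally = defaultdict(lambda: {"random": [0, 0], "expert": [0, 0]})
    for card, generation, won in records:
        phase = "random" if generation < RANDOM_GENERATIONS else "expert"
        tally[card][phase][0] += int(won)  # wins
        tally[card][phase][1] += 1         # games
    return {
        card: {p: wins / games for p, (wins, games) in phases.items() if games}
        for card, phases in tally.items()
    }


def type_gain(card_rates, card_types):
    """Average (expert - random) winrate over the cards of each type."""
    gains = defaultdict(list)
    for card, rates in card_rates.items():
        if "random" in rates and "expert" in rates:
            gains[card_types[card]].append(rates["expert"] - rates["random"])
    return {t: sum(g) / len(g) for t, g in gains.items()}


# Toy records, made up for illustration only.
records = [
    ("card_a", 0, True), ("card_a", 1, False),  # random phase: 1 win in 2
    ("card_a", 7, True), ("card_a", 8, True),   # expert phase: 2 wins in 2
    ("card_b", 2, True), ("card_b", 9, False),
]
card_types = {"card_a": "Fairy", "card_b": "Poison"}
gains = type_gain(phase_winrates(records), card_types)
print(gains)
```

A positive value means the type's cards do better once the random decks are gone; a negative value means they did better in the random phase.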

While some types show very little difference between the two, Fairy and Flying, two of the weakest types, have higher-winrate cards once the random, terrible decks have been discarded. This might mean that it is easier to make a bad deck with these types, or rather, that it is harder to make a good one. Overall, though, it is not impossible that this is a side effect of their generally low winrates.

On the other hand, Poison, Dark, Ground, Ghost, Fighting and Normal have significantly higher random winrates than expert, meaning that their random decks are comparatively better. This might mean that there are fewer bad deck compositions to avoid. Perhaps it can be said that they have more straightforward synergies, or more robust gameplans? Maybe? Because interpretation of data, no matter how succinctly analysed, will always be a bit of a guessing game.

Maybe this will just become an article about how difficult it really is to analyze numbers. Or perhaps, the expert vs random comparison is just too complex for feeble human minds.



Black and White

An easier thing to understand must be the difference between the player going first and the player going second. Overall, the difference in winrate is currently 54% for the player going first (white) and 46% for the player going second (black). Don't worry. It does not matter much in the evolution of decks, since each round sees each deck playing two games, one going first and the other going second.
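As a rough sketch, the BW difference of a card is just its winrate when its deck goes first minus its winrate when its deck goes second. The record layout `(card, went_first, won)` and the function name `bw_difference` are assumptions made for this example, not the simulation's actual data format.

```python
from collections import defaultdict


def bw_difference(records):
    """Return {card: (white_winrate, black_winrate, difference)}.

    Each record is (card, went_first, won) for one game played by a deck
    containing that card. White is the player going first, black the second.
    """
    tally = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for card, went_first, won in records:
        tally[card][bool(went_first)][0] += int(won)  # wins
        tally[card][bool(went_first)][1] += 1         # games
    result = {}
    for card, sides in tally.items():
        (w_wins, w_games), (b_wins, b_games) = sides[True], sides[False]
        if w_games and b_games:  # skip cards missing one of the two sides
            white, black = w_wins / w_games, b_wins / b_games
            result[card] = (white, black, white - black)
    return result


# Toy data, made up for illustration: 3 of 4 wins as white, 1 of 4 as black.
records = (
    [("card_a", True, True)] * 3 + [("card_a", True, False)]
    + [("card_a", False, True)] + [("card_a", False, False)] * 3
)
print(bw_difference(records))
```

Since every deck plays once as white and once as black per round, both sides of the tally get the same number of games for each deck.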

However, the difference in winrate between black and white (BW difference) is not always 8%. Bearing in mind the chance of statistical anomalies, the cards with the highest BW difference have up to 17%, with win-more cards like Throh, Simipour, Azelf and Manaphy. Notably, while they are win-more cards, they are not particularly aggressive. There are also a couple of high-attack, low-health contenders like Rufflet and Hitmonlee.
 

No cards have higher black winrate than white, but several have just a couple percent BW difference, notably several Dragon-type cards, like these:


I think the previous graph was a bit more complicated than necessary. Instead, I'll just list the difference in average of each type:



Here we see that Fighting has the highest BW difference of 12% (56% vs 44%), while Dragon has the lowest difference at 5% (52.5% vs 48.5%). Most of the differences make sense, though they also tell us something about the different types. For instance, Fighting is an aggressive deck archetype that wins by building an insurmountably strong board, mixed with Fire-type burn cards. Steel is interesting, since I never designed for it to be aggressive, but I must admit it has several strong, win-more cards. On the opposite side of the spectrum, Rock, Ice and Dragon were all designed with control strategies in mind.

As things are, I actually am kind of happy with the card-per-card difference. It makes sense to have some cards better for one sort of match-up rather than another. I am happy to see that no hyper-aggressive card exists with a much higher white than black winrate. The rest of the differences seem quite healthy for the meta. As are the type differences. Though it is a bit worrisome that mostly the successful types have high BW differences...

However, it still remains a problem that there even is that big of a general BW difference. I would be much happier if the span of BW difference went from +10% to -5% instead. This probably is not related to the card designs, though, but more to the basic rules of the game.


Early Game and Late Game Winrate

I was pleasantly surprised to see that BW differences were not directly related to aggressiveness of cards. This measure, whether the cards have higher winrates early or late in the game, really should be, though. Cards which have a high score on this index will be highly aggressive, win early and lose late, while those with a low score will become stronger the longer the game goes on.

And here we see that...



Er...

That I am wrong. My favourite enemy, Fire, does not actually win games significantly early. It actually has a higher winrate as the game goes on, albeit slightly, possibly because you can just as easily burn down the opponent over a longer amount of time, taking a chunk of their health off turn by turn.

Right next to Fire is Ground, which ought to be a control-oriented type. However, we do find its counterpart, Rock, closer to where we would expect to find it. Here it is important to note that this is not a tally of the characteristics of the deck, but rather the average of the cards of that type.
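To make the distinction concrete, here is a sketch of how a per-card early/late index could be computed and then averaged per type. Everything about the data layout is an assumption: the records `(card, game_length_in_turns, won)`, the rule that a game counts as "early" if it ends within eight turns, and the names `early_late_index` and `type_average` are all hypothetical. The eight-turn cutoff itself is the one used for the card lists in this section.

```python
from collections import defaultdict

EARLY_TURNS = 8  # cutoff from the article; binning by game length is an assumption


def early_late_index(records):
    """Return {card: early_winrate - late_winrate}.

    Each record is (card, game_length_in_turns, won). A positive index means
    the card wins more in short games, a negative one in long games.
    """
    tally = defaultdict(lambda: {"early": [0, 0], "late": [0, 0]})
    for card, turns, won in records:
        phase = "early" if turns <= EARLY_TURNS else "late"
        tally[card][phase][0] += int(won)  # wins
        tally[card][phase][1] += 1         # games
    index = {}
    for card, phases in tally.items():
        (e_wins, e_games), (l_wins, l_games) = phases["early"], phases["late"]
        if e_games and l_games:  # need both short and long games to compare
            index[card] = e_wins / e_games - l_wins / l_games
    return index


def type_average(index, card_types):
    """Average the per-card index over the cards of each type."""
    by_type = defaultdict(list)
    for card, value in index.items():
        by_type[card_types[card]].append(value)
    return {t: sum(v) / len(v) for t, v in by_type.items()}


# Toy records, made up for illustration only.
records = [
    ("card_a", 6, True), ("card_a", 7, False), ("card_a", 12, True), ("card_a", 15, True),
    ("card_b", 5, True), ("card_b", 8, True), ("card_b", 10, False), ("card_b", 11, True),
]
card_types = {"card_a": "Fire", "card_b": "Rock"}
index = early_late_index(records)
print(type_average(index, card_types))
```

Because the type value is an average over cards, a type can land in an unexpected spot even when the decks built around it play the way they were designed to.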

Looking at specific cards, we see that the best early-game cards, with almost 20% higher winrate during the first 8 turns, are cheap, but not just that; their effect specifically requires one to still have several cards in hand:
 Oops, these images still show the cards before the balance changes...

Just the opposite, the best late-game cards have almost 20% higher winrate after the first eight turns, and are mostly made up of non-Pokémon cards with effects that do not directly influence the tide of the game:


And the ultimate late-game Pokémon are:
 
Looking at their effects, it kind of makes sense. These are truly late-game effects.



Where this is taking us

The more I look at the statistics, the more I feel like I cannot really use them for anything. On the other hand, looking at what cards act as statistical outliers gives me a wealth of information, and most important of all, the feeling that the design works as intended.

With that in mind, for one last hurrah, let's go back to Expert vs Random winrate, which started our confusion, and see which cards are best in Expert decks:

And now I can finally feel certain that this actually is an index of the most synergistic or situationally-useful cards. Obviously with Ninjask and Escavalier, you want a certain type of cards in your hand to really get use out of their effect. Wigglytuff might seem a bit strange until you remember that Sleep is a status that is much more useful in control decks than in tempo decks.

On the other hand, the other end of the spectrum might be the blunt tools, the all-round good cards that do not care about what situation they are played in:




That seems about right. I can't really find any way that any of these are highly synergistic. It should be mentioned that it is Swalot, not Gulpin, which has the high random-winrate index. I do not think it is random that two of these cards are ones that summon a third card, since these types of cards usually plop a huge amount of stats down on the board without requiring a special situation.

And I think that's about it. I wonder if I'll do a part four. Perhaps it will be fun to just change the game rules to make the going first versus going second difference smaller, then see how it affects literally every single card and deck archetype. Yeah, I'll do that eventually.
