Flexible AI System

I've been a bit distracted lately by a project that has little to do with procedural generation. I think we can still have a bit of a conversation about the AI system that I am using both for this project and another similar project which shall not be named. Let's say it procedurally generates solutions to problems, not with exact calculation, but rather with chance and randomness.

The project looks like this:


What is it with me and pokémon?

This is a project marrying a collectible card game, a strategy game and pokémon, such as one is wont to do. However, the current version of the project differs quite a bit from the one first envisioned, the better to fit the chosen AI system.

So what is this AI?


Completeness

The AI's goal is to find an optimal list of moves for any given pokémon to perform during its turn. This pokémon will have a hand of cards, each of which is associated with a cost. Just as well, the pokémon has a budget of energy. Seel above had four points of energy, and then spent all of them on being able to use Aurora Beam. Alternatively, it could have combined several other, cheaper cards from its hand to use up the same four energy.

One way to create an AI to solve this problem would be to make a list of all possible combinations within this four-energy budget and this seven-card hand. If the order of moves did not matter, that list would even be of manageable size. But it does matter: moving and then using a melee attack is not the same as using a melee attack and then moving.
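To put rough numbers on it, here is a back-of-the-envelope count in Python (just the seven-card hand, ignoring energy costs and targets entirely) showing how the search space grows once order matters:

    from itertools import combinations, permutations

    HAND_SIZE = 7  # the seven-card hand; energy costs and targets are ignored here

    # Unordered: only which subset of cards ends up being played
    subsets = sum(1 for k in range(HAND_SIZE + 1)
                    for _ in combinations(range(HAND_SIZE), k))

    # Ordered: the sequence they are played in also matters
    sequences = sum(1 for k in range(HAND_SIZE + 1)
                      for _ in permutations(range(HAND_SIZE), k))

    print(subsets)    # 128
    print(sequences)  # 13700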



There are two cards in the game which cost zero, but award the user energy. That alone means you cannot rely on the four-energy budget in all cases. Just as well, if the pokémon is poisoned, the more energy they use, the more health they lose, so perhaps the best solution does not use all the energy available.

And we have not even broached the subject that each move needs a target - what direction will you move in, which enemy do you choose to attack, etc.

I am sure that the problem posed by the game is, technically speaking, complete and could be solved with a large enough list of combinations and a complex enough decision tree. To build the tree, you would have to program in several use cases, like with the cards that give new energy, or those that draw new cards that could potentially be used. I don't want to do that. I want to just design the cards, and let the AI figure out what to do with them itself. Flexibility is a much more important goal than completeness.



Random Choice and Fitness

Instead of trying to find all possible combinations, we can just try out a selection of random combinations, and with a large enough number, this might be almost as good. Sure, several of the combinations might be repeats, but whatever. The program calculates a thousand paths in less than a second, and in most cases, a thousand is more than enough to find a solution as good as what a random human would come up with. Sure, we might miss out on some very specific strategies, but most of the obvious possibilities will be found.


Then all that is left is to evaluate each of these thousand possibilities, give each a value, and pick the best one. This, then, is a fitness-based solution. Here, the fitness of any given list of actions is evaluated globally - that is, not just as the effect of the actions themselves, but as an evaluation of the whole battlefield. This, again, allows the greatest flexibility and the least single-case AI coding.
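In rough Python terms, the whole loop looks something like this (a sketch, not the actual game script; the simulate, score, snapshot and restore helpers stand in for scripts described further down):

    import random

    ATTEMPTS = 1000      # the thousand paths mentioned above
    MAX_ACTIONS = 8      # hypothetical cap on how many actions one candidate tries

    def plan_turn(simulate_actions, score_board, snapshot_board, restore_board):
        """Try many random candidate turns and keep the best-scoring one."""
        best_inputs, best_score = None, float("-inf")
        snapshot_board()                     # remember the board once, up front
        for _ in range(ATTEMPTS):
            # A candidate is just a flat list of numbers in [0, 1]:
            # pairs of (card picker, target picker), decoded by the normal game rules.
            candidate = [random.random() for _ in range(2 * MAX_ACTIONS)]
            simulate_actions(candidate)      # play it out with the ordinary rules
            score = score_board()            # global fitness of the resulting board
            restore_board()                  # put the board back how it was
            if score > best_score:
                best_inputs, best_score = candidate, score
        return best_inputs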

What is the evaluation? Well, it is my attempt to translate some general principles into code. First, you'd like to have as few enemies alive as possible, and those alive should have low health, low stats and few cards. Basically the opposite for friends. Also, you want to be close enough to the enemies to use your cards against them next turn. Finally, you want the enemies not to be too clumped together, your friends not to be too vulnerably close to enemies, and your friends to stick together.
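Roughly, and with completely made-up weights (the real function is tuned by hand, and the real game tracks more stats than this), the idea looks like this in Python:

    from dataclasses import dataclass, field

    @dataclass
    class Unit:
        x: int
        y: int
        health: int
        alive: bool = True
        hand: list = field(default_factory=list)

    def distance(a, b):
        # Simple tile distance; the real game's metric may differ
        return max(abs(a.x - b.x), abs(a.y - b.y))

    def score_board(friends, enemies):
        """Illustrative global fitness. The weights are placeholders, not the game's numbers."""
        score = 0.0
        living_enemies = [e for e in enemies if e.alive]
        living_friends = [f for f in friends if f.alive]
        for e in living_enemies:
            score -= 10.0 + 0.1 * e.health + 0.5 * len(e.hand)   # fewer, weaker, card-poor enemies
        for f in living_friends:
            score += 10.0 + 0.1 * f.health + 0.5 * len(f.hand)   # healthier, card-rich friends
            if living_enemies:
                nearest = min(distance(f, e) for e in living_enemies)
                score -= 0.2 * nearest                  # close enough to use cards next turn...
                score -= 0.3 * max(0, 2 - nearest)      # ...but not recklessly adjacent
        # Enemies spread out, friends sticking together
        score += 0.05 * sum(distance(a, b) for a in living_enemies for b in living_enemies)
        score -= 0.05 * sum(distance(a, b) for a in living_friends for b in living_friends)
        return score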

But I have so far skirted the whole problem of how the different possible actions are described. The picture above shows several lists of actions with the same name but different values, and it seems to repeat the same moves over and over. Why is this? Let's talk about IO.


Input and Output

Central to the algorithm is that it does not know which moves are good to use, and it follows that it does not actually need to know which moves it uses at all. It does not even need to know the rules of the game. Those listings above are just the moves that actually went through, not the ones attempted. The output might be a list of moves, but the input is just a list of random numbers between zero and one.

These numbers are ordered into sets of two. The first number of each set decides which card in the hand the AI will try to use: zero picks the left-most card, one the right-most, one half the middle one, and so on. This means that the selection adapts to newly drawn cards. However, often the card will not even be playable. The game rules simply stop the illegal move from being made, and the simulation carries on.

The second number of each set decides the target for the action, which works similarly to the above, except that the mapping depends on the card. A "motion" card will highlight all tiles within X distance of the pokémon. A "melee attack" card will highlight all tiles within X distance that are adjacent to an enemy. A "ranged attack" card will highlight the tiles on which enemies stand that are within X distance. No matter what, all the highlighted tiles are entered into a list, and the number between zero and one decides which tile from the list is picked as the target.
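As a sketch of that decoding in Python (legal_tiles_for is a stand-in for the game's own highlighting rules, not a real function from the project):

    def pick_index(t, count):
        """Map a number in [0, 1] to an index: 0 is the first entry,
        1 the last, 0.5 the middle one."""
        return min(int(t * count), count - 1)

    def decode_action(pair, hand, legal_tiles_for):
        """Turn one (card picker, target picker) pair into a concrete move.
        legal_tiles_for(card) stands in for the highlighting: motion cards
        highlight reachable tiles, melee cards tiles adjacent to enemies, and so on."""
        card_t, target_t = pair
        card = hand[pick_index(card_t, len(hand))]
        tiles = legal_tiles_for(card)
        if not tiles:
            return None                   # illegal or useless card - the rules just skip it
        return card, tiles[pick_index(target_t, len(tiles))]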


This nonspecificity of the input to the AI means not only that the AI does not need to know the rules of the game, but also that the player and the AI can both use the same script. The snippet of code above is the only thing that needed to be added to the rules to let the AI play like the player. Argument2 here is the second number between zero and one - if it is, as described, a number between zero and one, it is multiplied by c, the number of possible tiles, to pick a possible tile - and if argument2 is instead -1, a function is triggered to have the possible tiles flash and wait for the user to click one. That's it. It really is this simple.
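The original snippet is GameMaker (argument2 is GML's built-in argument variable); a rough Python paraphrase of what it does, with the click-waiting helper passed in as a stand-in for the game's own UI code, would be:

    def choose_tile(possible_tiles, argument2, ask_player):
        """Shared target picker for both the AI and the human player."""
        c = len(possible_tiles)
        if argument2 == -1:
            # Player's turn to decide: flash the tiles and wait for a click.
            return ask_player(possible_tiles)
        # AI path: scale the 0-1 number up to an index into the possible tiles.
        return possible_tiles[min(int(argument2 * c), c - 1)]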


Simulation

Well, if the AI does not know the game rules, how will it be able to evaluate the effect of the moves? Effectuating the moves is simple: as described, it just uses the same scripts as those that handle the user input. But that alone is no good - we only want to simulate the moves, so that the AI does not get to play extra turns just to find out what the best plan is.

For this purpose, two small scripts are used. The first, rememberthis(), is called before the AI starts and remembers the game board situation: where everything is, its health, stats, cards, decks, etc. The second, fuckgoback(), returns everything to how it was remembered; it is called after each of the lists of numbers between zero and one has been evaluated.
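In Python terms the pair boils down to something like this (a sketch that assumes the whole board state lives in one dictionary; the real scripts presumably store positions, health, stats, cards and decks individually):

    import copy

    _saved_state = None

    def rememberthis(game_state):
        """Deep-copy everything about the board before the AI starts guessing."""
        global _saved_state
        _saved_state = copy.deepcopy(game_state)

    def fuckgoback(game_state):
        """Put the live board back the way it was remembered."""
        game_state.clear()
        game_state.update(copy.deepcopy(_saved_state))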

Let's pretend there are no weird bugs where pokémon sometimes revive at strange times but without a sprite, and just say that, at least theoretically, this solution is bulletproof.


Pros and cons

So I love this method. The reason I am writing about it is that this is the, er, third time I've implemented it - first for a Hearthstone simulation, then for PokeHearth, and now for this. But this also means that I have had plenty of time to think about its strengths and weaknesses. Well, everything until now has been a sales pitch, so I'll just quickly sum up the strengths.
  1. Flexible - works with all cards, even when creating new, outlandish effects
  2. All-encompassing - implement once, no need to update
  3. Light - does not take a lot of code
And then we'll go a bit more in depth with the problems:
  1. It's not perfect. The AI has a skill level similar to a new player and often ignores good move combinations that veterans would find. One could increase the number of attempts it makes, but there are diminishing returns, both in general and because this algorithm tends to repeat itself.
  2. Sometimes, very strange results appear. This might be the fitness function choosing unconventional solutions, but sometimes it is simply that none of the attempts found the good solutions, even though they were quite obvious.
  3. Lack of transparency. When it goes wrong, you can't see why.
  4. Volatility of the fitness function. My general problem is between the AI wasting cards willy-nilly, and the AI not wanting to use any cards at all, ever.
  5. Difficulty in adding specific rules on top of the general AI. Say that you want to make sure no area of effect cards are used when there is only a single target within range - you can do this, I am sure, but it is not quite obvious how this should be added apart from putting in a whole bunch of if-then-statements into the clean, pristine code.
  6. Deals with randomness terribly. With few enough iterations, there is a natural protection in that rare possibilities also have a low chance of being tested by the AI. Nonetheless, given enough iterations, the AI will always assume the best possible result whenever there is a random chance. Not only this, but if, for instance, an enemy may or may not die after the first card is used, all following cards' targets might not line up with what was expected.
  7. Inability to plan ahead. The fitness function, unless done with incredible sophistication, will try to get as much done as possible this turn, without coordinating with other friends or with future turns. This works quite alright in a tempo-focused game like this or Pokehearth. But it does put a barrier to certain game types.
Welp, that sure is a lot of problems. Nonetheless, this is the AI system that allowed me to add over 800 cards to Pokehearth, with 200 unique effects, with very little time spent on the AI apart from a few tweaks to the fitness function. Just as well, the project showcased here, now close to beta release, was created in just a day and a weekend, already with 120 cards. To me, at least, the benefits outweigh the cons. Still, it did make me decide on a less random gameplay style, since I knew this would gel better with the AI.
