Playing Axis & Allies Revised Using Learning Automata
The Artificial Intelligence (AI) of opponents in computer games in general, and in strategy games in particular, has been plagued with performance problems of many kinds since such opponents first appeared. Not the least of these problems is that their designs are often based on predefined ways to play the game, making these opponents predictable and dull to a seasoned player. In this thesis, we propose using Learning Automata (LA) to create opponents that are able to adapt to any game situation and find a good response, much in the way a player would - by looking ahead in time to see what could happen in the game beyond the immediate next move. As a suitable environment for these LA, we have chosen the game Axis & Allies Revised. A turn-based war game emulating the Second World War, it has many layers of complexity for the LA to struggle with - multiple moves per turn, random combat outcomes, and highly complex rules. To play this game well, the artificial opponent would need not only to coordinate all its units into the best combined move each turn, but also to avoid making moves in the present that it would be punished for in later turns. To solve these problems, we propose a two-step solution: First, each unit will be assigned its own, independent LA. Second, for each possible action that this unit can select in the next immediate turn, another independent LA will be assigned. This process can then be repeated until a sufficient depth into future moves has been achieved. Each tier of LA in this structure will receive its feedback not from its immediate surroundings, but from the status of the next LA down the tree. In this thesis we lay the foundation for such a solution by implementing the method on a smaller scale, and by carefully testing its performance in a controlled environment. We find which approaches give the best results, which perform well only under certain conditions, and which are suitable for scaling up.
The three types of LA chosen for our testing cover most schools of reinforcement learning: the Tsetlin Automaton, with its simple, state-based structure; the Linear Reward-Inaction Automaton, with its linear updating scheme; and the Bayesian Learning Automaton, which shapes conjugate distributions in order to determine the optimal action. Each has its own strengths and weaknesses, which are recorded in this thesis. Through thorough testing and careful tuning of these automata, we conclude that while LA may in fact have the potential to perform well in almost any type of scenario, they would still be impractical considering the time spent on deciding on a move. The speed of decision making varies between our LA, and so does their performance, even in our small-scale testing. Nevertheless, we believe that our results should give some insight into the possibilities and benefits, both in performance and design simplicity, of using LA as the decision maker for artificial players.
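To make the "linear updating scheme" of the Linear Reward-Inaction automaton concrete, the following is a minimal sketch of the textbook update rule, not the implementation used in the thesis. The class name, the `learning_rate` parameter, the `train` helper, and the Bernoulli reward environment are all assumptions made for illustration.

```python
import random


class LinearRewardInaction:
    """Textbook Linear Reward-Inaction (L_RI) automaton (illustrative sketch)."""

    def __init__(self, n_actions, learning_rate=0.1, rng=None):
        # Start with a uniform probability over all actions.
        self.p = [1.0 / n_actions] * n_actions
        self.a = learning_rate  # hypothetical name for the step size
        self.rng = rng or random.Random()

    def choose(self):
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, rewarded):
        # "Inaction": a penalty leaves the probabilities untouched.
        if not rewarded:
            return
        # Reward: move probability mass linearly toward the chosen action.
        for j in range(len(self.p)):
            if j == action:
                self.p[j] += self.a * (1.0 - self.p[j])
            else:
                self.p[j] *= 1.0 - self.a


def train(reward_probs, steps=5000, learning_rate=0.05, seed=0):
    """Run the automaton against an assumed Bernoulli-reward environment."""
    rng = random.Random(seed)
    la = LinearRewardInaction(len(reward_probs), learning_rate, rng)
    for _ in range(steps):
        action = la.choose()
        rewarded = rng.random() < reward_probs[action]
        la.update(action, rewarded)
    return la.p
```

Calling `train([0.2, 0.8])` returns the final action probabilities. With a small step size these tend to concentrate on the more frequently rewarded action, although L_RI can occasionally lock onto a worse one, which is one reason the step size needs careful tuning.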
Master's thesis in information and communication technology 2009 – University of Agder, Grimstad