Empirical Evaluation of the Bayesian Learning Automaton Family
MetadataShow full item record
The two-armed bandit problem is a classical optimization problem where a player sequentially selects and pulls one of two arms attached to a gambling machine, and each arm pull results in either a reward or penalty to the player. Each arm is associated with a certain reward probability which is unknown to the player, and the player needs to sequentially select and play an arm and receive a reward or a penalty in order to discover its true reward probability. The overall goal for the player is reward maximization, and the player needs to balance between exploiting existing knowledge or obtaining new knowledge by trying different arms. In the long run it may be beneficial to risk short term loss to gain greater certainty about the reward probability associated with each arm. The problem as described above is the simplest case of the more general k-armed bandit problem and is concerned with the exploration vs. exploitation dilemma. A wide range of schemes has been proposed to solve the problem, and in this thesis we focus our attention on a series of schemes collectively referred to as the Bayesian Learning Automaton family, originally introduced by O-C. Granmo with the Bayesian Learning Automaton deigned for Bernoulli distributed reward. The Bayesian Learning Automata rely upon Bayesian statistics and the use of conjugate prior distributions with sets of updating rules of hyperparameters associated with these conjugate prior distributions. These updating rules make it possible to apply Bayesian statistics in an automaton in a simple and highly computationally efficient manner. In this thesis we extend the Bayesian Learning Automaton family with three new members designed for Poisson and normally distributed reward. We empirically evaluate both the Bayesian Learning Automaton family and competing schemes in the general k-armed bandit problem with Bernoulli, Poisson and normally distributed reward. We also empirically evaluate these players in the Goore game, iterative prisoners’ dilemma and two-player zero-sum game from game theory. Through extensive experiments we show that the Bayesian Learning Automaton family overall outperforms all other comparable learning schemes in the k-armed bandit problem, and the Bayesian Learning Automata are among the top performers in all the the games they are introduced to in this thesis. Thus, we believe that the Bayesian Learning Automaton family is an important addition to the field of bandit playing algorithms and is an important area for further research.
Masteroppgave i informasjons- og kommunikasjonsteknologi 2009 – Universitetet i Agder, Grimstad