Solving dynamic bandit problems and decentralized games using the kalman bayesian learning automaton
MetadataVis full innførsel
Multi-armed bandit problems have been subject to a lot of research in computer science because it captures the fundamental dilemma of exploration versus exploitation in reinforcement learning. The goal of a bandit problem is to determine the optimal balance between the gain of new information (exploration) and immediate reward maximization (exploitation). Dynamic bandit problems are especially challenging because they involve changing environments. Combined with game theory, where one analyze the behavior of agents in multi-agent settings, bandit problems serves as a framework for benchmarking the applicability of learning algorithms in various situations. In this thesis, we investigate a novel approach to the multi-armed bandit problem, the Kalman Bayesian Learning Automaton, an algorithm which applies concepts from Kalman filtering, a powerful technique for probabilistic reasoning over time. To determine the effectiveness of such an approach we have conducted an empirical study of the Kalman Bayesian Learning Automaton in multi-armed dynamic bandit problems and selected games from game theory. Specifically, we evaluate the performance of the Kalman Bayesian Learning Automaton in randomly changing environments, switching environments, the Goore game, the Prisoners Dilemma and zero-sum games. The scalability and robustness of the algorithm are also examined. Indeed, we reveal that the strength of the Kalman Bayesian Learning Automatons lies in its excellent tracking abilities, and are among the top performers in all experiments. Unfortunately, it is dependent on tuning of parameters. We believe further work on the approach could solve the parameter problem, but even with the need to tune parameters we consider the Kalman Bayesian Learning Automaton a strong solution to dynamic multi-armed bandit problems and definitely has the potential to be applied in various applications and multi-agent settings.
Masteroppgave i informasjons- og kommunikasjonsteknologi 2010 – Universitetet i Agder, Grimstad