The Bayesian pursuit algorithm: A new family of estimator learning automata
Chapter, Peer reviewed
Original version: Zhang, X., Granmo, O.-C., & Oommen, B. J. (2011). The Bayesian pursuit algorithm: A new family of estimator learning automata. In K. Mehrotra, C. Mohan, J. Oh, P. Varshney & M. Ali (Eds.), Modern Approaches in Applied Intelligence (Vol. 6704, pp. 522-531). Springer.
The fastest Learning Automata (LA) algorithms currently available come from the family of estimator algorithms. The Pursuit algorithm (PST), a pioneering scheme in the estimator family, obtains its superior learning speed by using Maximum Likelihood (ML) estimates to pursue the action currently perceived as being optimal. Recently, a Bayesian LA (BLA) was introduced, and empirical results were reported that demonstrated its advantages over established top performers, including the PST scheme. The BLA scheme is inherently Bayesian in nature, but it avoids computational intractability by merely updating the hyper-parameters of sibling conjugate priors and sampling randomly from the resulting posteriors. In this paper, we integrate the foundational learning principles motivating the design of the BLA with the principles of the PST. By doing this, we have obtained a completely novel, and rather pioneering, approach to solving LA-like problems, namely, the Bayesian Pursuit algorithm (BPST). As in the BLA, the estimates are truly Bayesian (as opposed to ML) in nature. However, the action selection probability vector of the PST is used for exploration purposes. Also, unlike the ML estimate, which is usually a single value, the use of a posterior distribution permits us to choose any one of a spectrum of values in the posterior as the appropriate estimate. Thus, in this paper, we have chosen the 95th percentile value of the posterior (instead of the mean) to pursue the most promising actions. Further, as advocated in , the pursuit has been performed using both the Linear Reward-Penalty and Reward-Inaction philosophies, leading to the corresponding BPST_RP and BPST_RI schemes, respectively. It turns out that the BPST is superior to the PST, with the BPST_RI being even more robust than the BPST_RP. Moreover, by controlling the learning speed of the BPST, the BPST schemes perform either better than or comparably to the BLA.
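To make the scheme described above concrete, the following is a minimal sketch of a BPST_RI step as we understand it from the abstract: Bernoulli rewards are assumed, each action keeps Beta(a, b) conjugate hyper-parameters, the 95th percentile of each posterior is used to identify the action to pursue, and the action selection probability vector is updated in a Reward-Inaction fashion. All class, function, and parameter names (`BPST_RI`, `beta_percentile`, `speed`) are our own illustrative choices, not the authors' notation; the percentile is approximated here by Monte Carlo sampling rather than an exact inverse CDF.

```python
import random

def beta_percentile(a, b, q=0.95, n=1000):
    """Monte Carlo estimate of the q-quantile of a Beta(a, b) posterior."""
    draws = sorted(random.betavariate(a, b) for _ in range(n))
    return draws[int(q * n) - 1]

class BPST_RI:
    """Sketch of a Bayesian Pursuit automaton with Reward-Inaction updating."""

    def __init__(self, n_actions, speed=0.05):
        self.n = n_actions
        self.speed = speed                      # pursuit learning-rate parameter
        self.p = [1.0 / n_actions] * n_actions  # action selection probabilities
        self.a = [1] * n_actions                # Beta hyper-params: rewards + 1
        self.b = [1] * n_actions                # Beta hyper-params: penalties + 1

    def choose(self):
        """Sample an action from the current selection probability vector."""
        r, acc = random.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r < acc:
                return i
        return self.n - 1

    def update(self, action, reward):
        # Bayesian update: increment the matching conjugate hyper-parameter.
        if reward:
            self.a[action] += 1
        else:
            self.b[action] += 1
        # Reward-Inaction: the probability vector is only updated on reward.
        if reward:
            # Pursue the action with the highest 95th-percentile estimate.
            best = max(range(self.n),
                       key=lambda i: beta_percentile(self.a[i], self.b[i]))
            for i in range(self.n):
                if i == best:
                    self.p[i] += self.speed * (1.0 - self.p[i])
                else:
                    self.p[i] *= (1.0 - self.speed)
```

A Reward-Penalty variant (BPST_RP) would additionally move probability mass toward the currently best-perceived action on penalized trials; the linear update above preserves the normalization of `p` at every step.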
We thus believe that the BPST constitutes a new avenue of research, in which the performance benefits of the PST and the BLA are mutually augmented, opening up improved performance in a number of applications currently being tested.
Published version of a chapter in the book: Modern Approaches in Applied Intelligence. Also available from the publisher at http://dx.doi.org/10.1007/978-3-642-21827-9_53