IEEE Transactions on Automatic Control, Vol. 53, No. 2, pp. 596-601, 2008
Converging coevolutionary algorithm for two-person zero-sum discounted Markov games with perfect information
This note proposes a novel population-based coevolutionary algorithm, called the "local-equilibrium-based coevolutionary algorithm" (LECA), for solving infinite-horizon discounted two-person zero-sum Markov games with perfect information. LECA "randomizes" a simplified variant of Raghavan and Syed's algorithm, which adapted Howard's policy improvement algorithm for Markov decision processes into a "negotiation process" between the two players via a lexicographic search. LECA operates on relatively small sets of policies, so that the negotiation process proceeds over these sets via "policy switching" rather than over the players' entire policy spaces. The use of policy switching eliminates the action-space manipulation required in value-iteration and policy-iteration-type algorithms and gives rise to a novel concept of "local equilibrium." With this concept incorporated into LECA, a local equilibrium policy pair is identified as an "elitist" and kept in the next population, directing the search toward an equilibrium or near-equilibrium policy pair. The algorithm is especially targeted at problems in which the state space is small but the players' action spaces are extremely large, and it converges with probability one to an equilibrium policy pair for a given game.
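To make the policy-switching step concrete, the following minimal Python sketch combines a small set of candidate policies for one player into a single "switched" policy, under the simplifying assumption that the opponent's policy is held fixed so that evaluation reduces to an ordinary discounted MDP. The transition tensor P, reward array R, discount factor gamma, and function names are illustrative placeholders, not the paper's notation or algorithm.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi.

    P has shape (A, S, S): P[a, s, s'] is the transition probability.
    R has shape (A, S): R[a, s] is the one-step reward.
    policy is an integer array of length S (one action per state).
    """
    n_states = P.shape[1]
    P_pi = np.array([P[policy[s], s, :] for s in range(n_states)])
    R_pi = np.array([R[policy[s], s] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

def policy_switching(P, R, policies, gamma):
    """Combine a small policy set into one 'switched' policy by using,
    at each state, the action of the member policy with the highest
    evaluated value at that state."""
    values = [evaluate_policy(P, R, pi, gamma) for pi in policies]
    n_states = P.shape[1]
    switched = np.empty(n_states, dtype=int)
    for s in range(n_states):
        best = max(range(len(policies)), key=lambda i: values[i][s])
        switched[s] = policies[best][s]
    return switched

if __name__ == "__main__":
    # Small random MDP used purely for illustration.
    rng = np.random.default_rng(0)
    S, A, gamma = 4, 3, 0.9
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)   # normalize rows to probabilities
    R = rng.random((A, S))
    policies = [rng.integers(A, size=S) for _ in range(3)]
    pi_sw = policy_switching(P, R, policies, gamma)
    print("switched policy:", pi_sw)
    print("its value:", evaluate_policy(P, R, pi_sw, gamma))
```

The appeal of this construction, as exploited in the note, is that the switched policy's value is no worse than that of every member of the set at every state, so a search can work with small policy populations rather than the full action spaces.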