IEEE Transactions on Automatic Control, Vol. 55, No. 5, pp. 1254-1257, 2010
Evolutionary Policy Iteration Under a Sampling Regime for Stochastic Combinatorial Optimization
This article adapts the evolutionary policy iteration (EPI) algorithm of Chang et al. [1], [2], originally designed for infinite-horizon Markov decision processes (MDPs) with large action spaces, to discrete stochastic optimization, resulting in an algorithm called Evolutionary Policy Iteration-Monte Carlo (EPI-MC). By introducing a sampling schedule, EPI-MC extends EPI to the stochastic combinatorial optimization setting with a finite action space and a noisy cost (value) function. Convergence of EPI-MC to an optimal action is proven, and experimental results are presented.
Keywords: Combinatorial optimization; evolutionary policy iteration (EPI); Monte Carlo (MC); stochastic optimization
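To make the abstract's description concrete, the following is a minimal, hedged sketch of the kind of procedure EPI-MC describes: an evolutionary search over a finite action space in which each candidate's noisy cost is estimated by Monte Carlo averaging, with the number of samples growing according to a sampling schedule. The toy cost model, function names, and all parameter values are illustrative assumptions, not the authors' specification.

```python
import random


def noisy_cost(action, rng):
    """Toy noisy cost: true cost |action - 7| corrupted by Gaussian noise (assumed model)."""
    return abs(action - 7) + rng.gauss(0.0, 1.0)


def mc_estimate(action, n_samples, rng):
    """Monte Carlo estimate of the expected cost of an action."""
    return sum(noisy_cost(action, rng) for _ in range(n_samples)) / n_samples


def epi_mc(action_space, pop_size=6, iterations=20, seed=0):
    """Illustrative EPI-MC-style loop: elite selection under a growing sampling schedule."""
    rng = random.Random(seed)
    population = rng.sample(action_space, pop_size)
    best = population[0]
    for k in range(1, iterations + 1):
        n_samples = 10 * k  # sampling schedule: more samples as iterations proceed
        scored = sorted(population, key=lambda a: mc_estimate(a, n_samples, rng))
        best = scored[0]  # estimated-best (elite) action is carried forward
        # Next generation: keep the elite, refill the rest by random exploration.
        population = [best] + [rng.choice(action_space) for _ in range(pop_size - 1)]
    return best


if __name__ == "__main__":
    actions = list(range(20))  # finite action space {0, ..., 19}
    print("estimated optimal action:", epi_mc(actions))
```

The growing per-iteration sample count plays the role of the sampling schedule mentioned above: early generations are evaluated cheaply, while later generations use enough samples for the noisy estimates to discriminate reliably among candidates.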