IEEE Transactions on Automatic Control, Vol. 51, No. 9, pp. 1523-1526, 2006
A policy improvement method in constrained stochastic dynamic programming
This note presents a formal method of improving a given base policy such that the performance of the resulting policy is no worse than that of the base policy at every state in constrained stochastic dynamic programming. We consider the finite-horizon and discounted infinite-horizon cases. The improvement method induces a policy-iteration-type algorithm that converges to a locally optimal policy.
Keywords: constrained Markov decision process; dynamic programming; policy improvement; policy iteration
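To make the abstract's idea concrete, the following is a minimal sketch, not the note's formal method: a policy-iteration-style loop on a small constrained MDP in which each sweep switches a state's action only if a one-step lookahead raises the reward value while keeping the lookahead constraint cost within a bound. The MDP data, the per-state cost bound `cost_bound`, and the feasibility test are all illustrative assumptions; the paper's actual improvement condition and its monotonicity guarantees are given in the body of the note.

```python
import numpy as np

# Hypothetical small constrained MDP; all numbers are illustrative.
n_states, n_actions = 3, 2
gamma = 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # reward
C = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # constraint cost
cost_bound = 4.0                                                  # assumed per-state bound on discounted cost

def evaluate(policy, payoff):
    """Discounted value of a deterministic policy for a given payoff matrix (R or C)."""
    P_pi = P[np.arange(n_states), policy]          # (n_states, n_states) transition matrix under policy
    r_pi = payoff[np.arange(n_states), policy]     # (n_states,) per-state payoff under policy
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def improve(policy):
    """One conservative improvement sweep: at each state, switch to an action whose
    one-step lookahead strictly raises the reward value without pushing the lookahead
    cost above the bound; otherwise keep the base action."""
    V, J = evaluate(policy, R), evaluate(policy, C)
    new_policy = policy.copy()
    for s in range(n_states):
        q_r = R[s] + gamma * P[s] @ V     # lookahead reward, one entry per action
        q_c = C[s] + gamma * P[s] @ J     # lookahead cost, one entry per action
        feasible = np.flatnonzero(q_c <= cost_bound)
        if feasible.size and q_r[feasible].max() > q_r[policy[s]]:
            new_policy[s] = feasible[np.argmax(q_r[feasible])]
    return new_policy

policy = np.zeros(n_states, dtype=int)    # arbitrary base policy
for _ in range(100):                      # policy-iteration-style loop with a safety cap
    new_policy = improve(policy)
    if np.array_equal(new_policy, policy):
        break                             # fixed point: a candidate locally optimal policy
    policy = new_policy
print("policy:", policy, "value:", evaluate(policy, R))
```

The loop stops at a policy that no sweep can improve, mirroring the local-optimality notion in the abstract; the simple lookahead feasibility test here does not by itself carry the state-wise no-worse guarantee that the note establishes.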