A policy improvement method in constrained stochastic dynamic programming

Chang HS

IEEE Transactions on Automatic Control, Vol.51, No.9, 1523-1526, 2006

DOI: 10.1109/TAC.2006.880801

This note presents a formal method for improving a given base policy in constrained stochastic dynamic programming, such that the performance of the resulting policy is no worse than that of the base policy at every state. We consider both the finite-horizon and the discounted infinite-horizon cases. The improvement method induces a policy-iteration-type algorithm that converges to a locally optimal policy.
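The abstract does not spell out the improvement rule, so the following is only a generic illustration of the idea, not the construction from the note. It is a minimal sketch for the discounted case, assuming a small tabular model with one-step rewards R[s][a], one-step costs C[s][a], and transitions P[s][a] given as (probability, next_state) pairs. All names here are hypothetical. At each state it picks the reward-maximizing action among those whose cost Q-value does not exceed that of the base action. By the standard policy improvement argument, applied to reward and cost separately, the resulting policy (under exact evaluation) has reward value no lower and cost value no higher than the base policy at every state. Repeating the step gives a policy-iteration-type loop.

```python
TOL = 1e-9  # numerical tolerance for comparing approximate Q-values

def evaluate(policy, P, f, gamma, iters=2000):
    """Approximate the discounted value of `policy` for stage function f (reward or cost)."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [f[s][policy[s]] + gamma * sum(p * v[t] for p, t in P[s][policy[s]])
             for s in range(n)]
    return v

def q_values(v, P, f, gamma, s):
    """One-step lookahead values of every action at state s, given value function v."""
    return [f[s][a] + gamma * sum(p * v[t] for p, t in P[s][a]) for a in range(len(P[s]))]

def improve(policy, P, R, C, gamma):
    """Hypothetical constrained improvement step: never raise cost, never lower reward."""
    vr = evaluate(policy, P, R, gamma)
    vc = evaluate(policy, P, C, gamma)
    new_policy = []
    for s in range(len(P)):
        qr = q_values(vr, P, R, gamma, s)
        qc = q_values(vc, P, C, gamma, s)
        base = policy[s]
        # Admissible actions: cost Q-value no larger than the base action's (base is always admissible).
        feasible = [a for a in range(len(P[s])) if qc[a] <= qc[base] + TOL]
        best = max(feasible, key=lambda a: qr[a])
        # Switch only on strict reward improvement, so the loop cannot cycle on ties.
        new_policy.append(best if qr[best] > qr[base] + TOL else base)
    return new_policy

def constrained_policy_iteration(policy, P, R, C, gamma, max_iters=100):
    """Repeat the improvement step until the policy stops changing."""
    for _ in range(max_iters):
        new_policy = improve(policy, P, R, C, gamma)
        if new_policy == policy:
            break
        policy = new_policy
    return policy

# Toy two-state, two-action example (made-up data for illustration).
P = [[[(1.0, 0)], [(1.0, 1)]],   # state 0: action 0 stays, action 1 moves to state 1
     [[(1.0, 0)], [(1.0, 0)]]]   # state 1: both actions return to state 0
R = [[1.0, 2.0], [0.0, 2.0]]
C = [[0.0, 1.0], [0.0, 0.0]]

print(constrained_policy_iteration([0, 0], P, R, C, 0.9))  # prints [0, 1]
```

In this toy model, the unconstrained choice at state 0 would be action 1, since its reward Q-value is higher. That action adds cost relative to the base policy, however, so the sketch keeps action 0 there and improves only at state 1.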

Keywords: constrained Markov decision process; dynamic programming; policy improvement; policy iteration