SIAM Journal on Control and Optimization, Vol. 57, No. 5, pp. 3118-3136, 2019
CONSTRAINED MARKOV DECISION PROCESSES WITH EXPECTED TOTAL REWARD CRITERIA
In this paper, we study a constrained Markov decision process on a Borel state space under the expected total reward criterion. Assuming that the transition probability has a density function that is continuous in the action variable, we prove the existence of an optimal randomized stationary policy. Moreover, we show that an optimal deterministic stationary policy exists if the Borel σ-algebra on the state space has no conditional atoms in the sub-σ-algebra generated by the density functions and the action correspondence.
Keywords: Markov decision process; constrained optimization; total expected reward criterion; stationary optimal policy