IEEE Transactions on Automatic Control, Vol. 62, No. 9, pp. 4318-4332, 2017
Optimality Conditions for Long-Run Average Rewards With Underselectivity and Nonsmooth Features
Abstract—We study three issues associated with the optimization of long-run average rewards of time-nonhomogeneous continuous-time Markov processes with continuous state spaces: 1) underselectivity, i.e., the long-run average reward of an optimal policy does not depend on its actions in any finite time period; 2) the related issue of bias optimality, i.e., finding policies that optimize both the long-run average reward and the transient total reward; and 3) the effect of nonsmooth points of a value function on performance optimization. These issues require consideration of the performance over the entire infinite horizon, and therefore are not easily handled by dynamic programming, which works backwards in time and takes a local view at a particular time instant. In this paper, we take a different approach, called relative optimization theory, which is based on a direct comparison of the performance measures of any two policies. We derive tight necessary and sufficient optimality conditions that take underselectivity into consideration; we derive bias optimality conditions for both long-run average and transient rewards; and we show that the effect of a wide class of nonsmooth points of a value function, called semismooth points, on the long-run average performance is zero and can be ignored.
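To make the two central notions concrete, here is a minimal sketch in the familiar discrete-time, finite-state setting (the paper itself works in continuous time with continuous state spaces; the notation $P$, $f$, $\pi$, $g$, $\eta$, $e$ below belongs to this sketch, not to the paper). The long-run average reward of a policy with transition matrix $P$, reward function $f$, and steady-state distribution $\pi$ is
\[
\eta \;=\; \lim_{N\to\infty}\frac{1}{N}\,\mathbb{E}\!\left[\sum_{n=0}^{N-1} f(X_n)\right] \;=\; \pi f,
\]
which is unchanged if the policy is altered on any finite set of decision epochs; this is the underselectivity. For the direct-comparison idea, let the potential (bias) $g$ solve the Poisson equation $(I-P)g + \eta e = f$, where $e$ is the all-ones vector. Then for any two policies $(P,f)$ and $(P',f')$,
\[
\eta' - \eta \;=\; \pi'\big[(f'-f) + (P'-P)g\big],
\]
so the two policies are compared directly through the sign of the bracketed term, weighted by $\pi'$, with no backward induction in time.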
Keywords: Bias optimality; direct-comparison-based optimization; Itô-Tanaka formula; local time; relative optimization theory; semismooth functions; state comparability; viscosity solution