IEEE Transactions on Automatic Control, Vol. 59, No. 4, pp. 921–936, 2014
Partial-Information State-Based Optimization of Partially Observable Markov Decision Processes and the Separation Principle
We propose a partial-information-state-based approach to the optimization of the long-run average performance of a partially observable Markov decision process (POMDP). In this approach, the information history is summarized, at least partially, by one or a few statistics, not necessarily sufficient, called a partial-information state, and actions depend on this partial-information state rather than on the system state. We first propose the "single-policy-based comparison principle," under which we derive an HJB-type optimality equation and a policy iteration algorithm for the optimal policy in the space of partial-information-state-based policies. We then introduce the notion of Q-sufficient statistics and show that if the partial-information state is Q-sufficient, then the policy that is optimal in the space of partial-information-state-based policies is also optimal in the space of all feasible information-state-based policies. We further show that, under some additional conditions, the well-known separation principle holds. The results are obtained by applying the direct-comparison-based approach originally developed for discrete event dynamic systems.
Keywords: direct comparison-based approach; finite state controller; HJB equation; performance potential; policy iteration; Q-factor; Q-sufficient statistics
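For orientation, the display below sketches the standard average-reward (Poisson-type) optimality equation that an HJB-type equation of the kind referenced in the abstract takes in the long-run average setting, written here over a partial-information state. The notation ($\eta^{*}$ for the optimal average reward, $g$ for the performance potential, $z$ for the partial-information state, $f$ and $P$ for the reward and transition law) follows common usage in this literature and is an assumption for illustration, not the paper's exact formulation.

% Sketch (requires amsmath): a standard average-reward optimality equation,
% stated over a partial-information state z rather than the full system state.
% Symbols eta*, g, f, P are assumed from common usage, not taken verbatim
% from the paper.
\begin{equation*}
  \eta^{*} + g(z) \;=\; \max_{a \in \mathcal{A}}
  \Bigl\{ f(z,a) + \sum_{z'} P(z' \mid z, a)\, g(z') \Bigr\},
\end{equation*}
% where eta* is the optimal long-run average reward, g is the performance
% potential (relative value function), f(z,a) the one-step reward, and
% P(z'|z,a) the transition law of the partial-information state under action a.

In this reading, the paper's contribution concerns when optimizing over policies that depend only on $z$, via such an equation and the associated policy iteration, loses nothing relative to policies that depend on the full information state.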