Automatica, Vol.103, 435-442, 2019
Data-driven approximate Q-learning stabilization with optimality error bound analysis
Approximate Q-learning (AQL), a typical reinforcement learning method, has attracted extensive attention in the past few years because of its ability to solve nonlinear optimal control problems when the knowledge/model of the plant is unavailable. However, because of function approximation errors, AQL algorithms can only deliver a near-optimal solution. Hence, a quantitative analysis of the optimality error bound is of considerable significance. In this paper, off-line value-iteration AQL is used to solve the model-free optimal stabilization control problem, and a new framework for analyzing the optimality error bound is proposed. First, to make the analysis of the optimality error bound convenient and clear, the Q-learning operator is well defined on the basis of an estimate of the domain of attraction (DOA) of the closed loop. Second, a quantitative bound on the estimation error of the optimal Q-function is obtained by choosing Gaussian process regression as the function estimator. Finally, a quantitative result for the optimality error bound, i.e., the bound on the error between the optimal cost and the actual cost incurred by the AQL closed loop, is given. As the main result of this paper shows, the optimality error bound is determined by the approximation error bound of the function estimator (due to the finite number of data points) and by the difference between the two Q-functions obtained in the last two iterations (due to the finite number of iterations).
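The abstract does not give algorithmic details, so the following is only a minimal sketch of offline value-iteration approximate Q-learning (fitted Q-iteration) with Gaussian process regression as the function estimator, in the spirit of the method described above. The plant dynamics, stage cost, discount factor, action grid, kernel, and iteration count are all illustrative assumptions, not the setup or the bound analysis of the paper.

```python
# Minimal sketch: offline value-iteration approximate Q-learning (fitted
# Q-iteration) with a Gaussian process regressor as the function estimator.
# All numerical choices below (plant, cost, gamma, action grid, kernel) are
# assumptions made for illustration only.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# --- illustrative data-generating plant (unknown to the learner) ----------
def step(x, u):
    """One step of a simple nonlinear scalar system (assumed for the demo)."""
    return 0.9 * x + 0.5 * np.sin(x) + u

def stage_cost(x, u):
    """Quadratic stage cost l(x, u) = x^2 + u^2 (assumed)."""
    return x**2 + u**2

gamma = 0.95                              # discount factor (assumed)
actions = np.linspace(-1.0, 1.0, 11)      # finite action grid for the min

# --- collect an offline batch of transitions (x, u, cost, x_next) ----------
X = rng.uniform(-2.0, 2.0, size=200)
U = rng.choice(actions, size=200)
Xn = step(X, U)
C = stage_cost(X, U)
inputs = np.column_stack([X, U])          # regression inputs (x, u)

# --- offline value iteration: Q_{k+1}(x,u) = l(x,u) + gamma * min_a Q_k(x',a)
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
gp = None
for k in range(15):
    if gp is None:
        targets = C                        # Q_0 initialized with the stage cost
    else:
        # Evaluate the current Q estimate on all (x_next, a) pairs, minimize over a.
        grid = np.column_stack([np.repeat(Xn, len(actions)),
                                np.tile(actions, len(Xn))])
        q_next = gp.predict(grid).reshape(len(Xn), len(actions))
        targets = C + gamma * q_next.min(axis=1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(inputs, targets)

# --- greedy (near-optimal) policy from the learned Q-function --------------
def policy(x):
    q = gp.predict(np.column_stack([np.full(len(actions), x), actions]))
    return actions[np.argmin(q)]

print("u(0.5) =", policy(0.5))
```

In this sketch the iteration count is simply fixed; the gap between the last two Q-function estimates, which the abstract identifies as one of the two sources of the optimality error, could instead be monitored as a stopping criterion, while the data set size controls the other source, the function-approximation error.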