IEEE Transactions on Automatic Control, Vol. 63, No. 4, pp. 1018-1031, 2018
Cooperative Q-Learning for Rejection of Persistent Adversarial Inputs in Networked Linear Quadratic Systems
In this paper, a cooperative Q-learning approach is proposed to enable the agents in a large network to synchronize to the behavior of an unknown leader, with each agent optimizing a distributed performance criterion that depends only on a subset of the agents in the network. The novel distributed Q-functions are parametrized as functions of the local tracking error and of the control and adversarial inputs in the neighborhood. In the proposed approach, the agents coordinate with their neighbors to select their minimizing model-free policies in such a way as to guarantee convergence to a graphical Nash equilibrium while also attenuating the maximizing, worst-case adversarial inputs. A structure with two actor approximators and a single critic approximator is used for each agent in the network, which mitigates the complexity issues that arise in Q-learning. The two actors approximate the optimal control input and the worst-case adversarial input, whereas the critic approximates the optimal cost of each of the coupled optimizations. Effective tuning laws are proposed to solve the model-free cooperative game problem while guaranteeing closed-loop stability, which is established through rigorous Lyapunov-based stability proofs. Finally, a numerical example illustrates the effectiveness of the proposed approach.
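To make the described architecture concrete, the following is a minimal per-agent sketch of a two-actor, single-critic structure with a quadratic Q-function parametrization over the stacked local tracking error, neighborhood control inputs, and neighborhood adversarial inputs. It is illustrative only: the class name, basis choice, finite-difference gradients, and simple gradient-descent tuning steps are assumptions for exposition and are not the paper's exact tuning laws or convergence-guaranteed updates.

```python
import numpy as np


class AgentLearner:
    """Illustrative per-agent two-actor / single-critic structure.

    The critic approximates the agent's distributed Q-function as a quadratic
    form in z = [local tracking error, neighborhood controls, neighborhood
    adversarial inputs]. One actor approximates the minimizing control policy,
    the other the worst-case (maximizing) adversarial input.
    (Hypothetical sketch; not the paper's exact update laws.)
    """

    def __init__(self, n_e, n_u, n_w, gamma=0.9, a_c=0.05, a_u=0.01, a_w=0.01):
        self.n_z = n_e + n_u + n_w                    # size of stacked argument z
        n_phi = self.n_z * (self.n_z + 1) // 2        # quadratic basis size
        self.Wc = np.zeros(n_phi)                     # critic weights
        self.Wu = np.zeros((n_u, n_e))                # control actor: u = Wu @ e
        self.Ww = np.zeros((n_w, n_e))                # adversary actor: w = Ww @ e
        self.gamma, self.a_c, self.a_u, self.a_w = gamma, a_c, a_u, a_w

    def _phi(self, z):
        """Quadratic basis: upper-triangular products of the entries of z."""
        return np.outer(z, z)[np.triu_indices(len(z))]

    def q_value(self, z):
        return self.Wc @ self._phi(z)

    def policies(self, e):
        """Current control and adversarial policies of the local tracking error e."""
        return self.Wu @ e, self.Ww @ e

    def update(self, z, r, z_next, e):
        """One model-free tuning step from a measured transition.

        r is the local stage cost; the critic is tuned to reduce the
        temporal-difference (Bellman) error of the Q-function, and the two
        actors are tuned in opposite directions of the Q-value gradient so
        that the control minimizes while the adversary maximizes.
        """
        td = r + self.gamma * self.q_value(z_next) - self.q_value(z)
        self.Wc += self.a_c * td * self._phi(z)       # critic gradient step

        # Finite-difference gradients of Q w.r.t. own control/adversary entries of z.
        n_e, n_u = self.Wu.shape[1], self.Wu.shape[0]
        grad_u = np.array([self._dq(z, n_e + k) for k in range(n_u)])
        grad_w = np.array([self._dq(z, n_e + n_u + k) for k in range(self.Ww.shape[0])])
        self.Wu -= self.a_u * np.outer(grad_u, e)     # minimizing actor
        self.Ww += self.a_w * np.outer(grad_w, e)     # maximizing actor

    def _dq(self, z, idx, eps=1e-4):
        zp, zm = z.copy(), z.copy()
        zp[idx] += eps
        zm[idx] -= eps
        return (self.q_value(zp) - self.q_value(zm)) / (2 * eps)
```

In this sketch, each agent would run its own `AgentLearner` instance and build z from locally measurable neighborhood quantities, reflecting the distributed, model-free nature of the approach; the single critic per agent is what keeps the number of approximators (and hence the Q-learning complexity) from growing with the size of the network.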