IEEE Transactions on Automatic Control, Vol. 62, No. 3, pp. 1465-1470, 2017
Distributed Reinforcement Learning via Gossip
We consider the classical TD(0) algorithm implemented on a network of agents in which each agent also incorporates updates received from its neighbors through a gossip-like mechanism. The combined scheme is shown to converge for both discounted- and average-cost problems.
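The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea, restricted to the discounted case. It assumes that each agent runs linear TD(0) on its own independent simulation of the same Markov chain and then averages its parameter vector with its neighbors through a doubly stochastic gossip matrix `W`. The function names (`gossip_td0`, `ring_gossip_matrix`), the ring topology, the step-size schedule a(k) = (k+1)^(-0.7), and the order of the local-update and averaging steps are hypothetical illustrative choices, not the paper's exact scheme.

```python
import numpy as np


def ring_gossip_matrix(num_agents):
    """Hypothetical helper: symmetric, doubly stochastic weights on a ring.

    Each agent keeps weight 1/3 on itself and gives 1/3 to each of its two
    ring neighbors (requires num_agents >= 3).
    """
    W = np.zeros((num_agents, num_agents))
    for i in range(num_agents):
        W[i, i] = 1.0 / 3.0
        W[i, (i + 1) % num_agents] = 1.0 / 3.0
        W[i, (i - 1) % num_agents] = 1.0 / 3.0
    return W


def gossip_td0(P, cost, phi, W, gamma, num_steps, seed=0):
    """Sketch of distributed linear TD(0) with gossip averaging (discounted cost).

    P     : (n, n) transition matrix of the Markov chain (rows sum to 1)
    cost  : (n,) per-stage cost c(s)
    phi   : (n, d) feature matrix; row s is the feature vector of state s
    W     : (N, N) doubly stochastic gossip matrix over the N agents
    Returns the (N, d) array whose row i is agent i's parameter vector.
    """
    rng = np.random.default_rng(seed)
    n, d = phi.shape
    num_agents = W.shape[0]
    theta = np.zeros((num_agents, d))
    # Assumption: each agent simulates its own independent trajectory.
    states = rng.integers(n, size=num_agents)
    for k in range(num_steps):
        step = 1.0 / (k + 1) ** 0.7  # diminishing step size a(k)
        updates = np.zeros_like(theta)
        for i in range(num_agents):
            s = states[i]
            s_next = rng.choice(n, p=P[s])
            # TD(0) error: c(s) + gamma * V(s') - V(s), with V = phi @ theta_i
            delta = cost[s] + gamma * phi[s_next] @ theta[i] - phi[s] @ theta[i]
            updates[i] = step * delta * phi[s]
            states[i] = s_next
        # Gossip step: mix neighbors' parameters, then add the local TD updates.
        theta = W @ theta + updates
    return theta


if __name__ == "__main__":
    P = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.3, 0.5],
                  [0.4, 0.0, 0.6]])
    cost = np.array([1.0, 2.0, 0.0])
    gamma = 0.5
    # Tabular features, so the TD(0) limit is the true value function.
    theta = gossip_td0(P, cost, np.eye(3), ring_gossip_matrix(4), gamma, 10_000)
    exact = np.linalg.solve(np.eye(3) - gamma * P, cost)
    print("agent estimates:\n", theta)
    print("exact values:   ", exact)
```

Using the identity as the feature matrix makes the scheme tabular, so every agent's estimate can be checked against the exact value function from a linear solve and the agents can be seen to agree with one another. With general features, the common limit would instead be expected to be a TD fixed point within the span of `phi`. Because `W` is doubly stochastic, the averaging step preserves the network-wide mean of the parameters, which is why the local TD updates and the gossip mixing can be interleaved.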