Abstract: In this article, we propose several novel distributed gradient-based temporal-difference algorithms for multiagent off-policy learning of linear approximation of the value function in Markov ...