Multi-Agent Reinforcement Learning:
Deep Covering Options:
Covering option discovery has been developed to improve the exploration of reinforcement learning in single-agent scenarios with sparse reward signals, by connecting the most distant states in the embedding space provided by the Fiedler vector of the state-transition graph. However, these option discovery methods cannot be directly extended to multi-agent scenarios, since the joint state space grows exponentially with the number of agents in the system. To alleviate this problem, we design efficient approaches that make multi-agent deep covering options scalable.
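As a minimal illustrative sketch of the underlying single-agent computation (function names and the toy 6-state path graph below are ours, not from the papers): the Fiedler vector of the normalized graph Laplacian embeds states on a line, and a covering option connects the two extremes of that embedding.

```python
import numpy as np

def fiedler_embedding(adjacency: np.ndarray) -> np.ndarray:
    """Second-smallest eigenvector (Fiedler vector) of the normalized
    graph Laplacian of the state-transition graph."""
    deg = adjacency.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    laplacian = np.eye(len(adjacency)) - d_inv_sqrt @ adjacency @ d_inv_sqrt
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues sorted ascending
    return eigvecs[:, 1]

def covering_option_endpoints(adjacency: np.ndarray) -> tuple[int, int]:
    """A covering option initiates at one extreme of the Fiedler embedding
    and terminates at the other (the two 'most distant' states)."""
    f = fiedler_embedding(adjacency)
    return int(np.argmin(f)), int(np.argmax(f))

# Toy example: a 6-state path graph. The most distant states are the two
# ends of the path (the eigenvector sign, hence the order, is arbitrary).
A = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
print(covering_option_endpoints(A))  # (0, 5) or (5, 0)
```

The Kronecker-graph results in the papers below exploit the fact that eigenvectors of a Kronecker product are Kronecker products of the factors' eigenvectors, so joint covering options can be assembled from per-agent embeddings without ever forming the exponentially large joint graph.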
- Jiayu Chen, Vaneet Aggarwal, and Tian Lan, "ODPP: A Unified Algorithm Framework for Unsupervised Option Discovery based on Determinantal Point Process," in Proc. NeurIPS, Dec 2023.
- Jiayu Chen, Jingdi Chen, Tian Lan, and Vaneet Aggarwal, "Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs," in Proc. NeurIPS, Dec 2022.
- Jiayu Chen, Jingdi Chen, Tian Lan, and Vaneet Aggarwal, "Learning Multiagent Options for Tabular Reinforcement Learning using Factor Graphs," IEEE Transactions on Artificial Intelligence, vol. 4, no. 5, pp. 1141-1153, Oct. 2023.
- Jiayu Chen, Jingdi Chen, Tian Lan, and Vaneet Aggarwal, "Multi-agent Covering Option Discovery based on Kronecker Product of Factor Graphs," in Proc. AAMAS, May 2022.
- Jiayu Chen, Marina Haliem, Tian Lan, and Vaneet Aggarwal, "Multi-agent Deep Covering Option Discovery," in Proc. ICML Reinforcement Learning for Real Life Workshop, Jul 2021.
Federated/Parallel Reinforcement Learning:
We consider M parallel reinforcement learning agents and wish to achieve the same sample complexity/regret as if they were fully collaborative. We show this to be the case with only a logarithmic number of communication rounds for model-based reinforcement learning. For the special case of bandits, not only is the number of communication rounds logarithmic in time, but the agents also only need to share the index of their best arm, thus achieving better privacy among agents. For model-free setups, we consider a policy-gradient-based approach and show that the communication complexity of natural policy gradient can be decreased significantly using ADMM-based gradient updates.
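The sketch below is an illustrative phased scheme, not the exact algorithm from the papers (epoch lengths, the base of 16 pulls, and the union-of-votes rule are our choices): exponentially growing epochs mean a horizon T is covered in O(log T) communication rounds, and in each round every agent transmits only a single integer, the index of its empirically best arm.

```python
import numpy as np

rng = np.random.default_rng(0)

def parallel_best_arm(means, n_agents=4, n_epochs=10):
    """In epoch e, every agent pulls each active arm 16 * 2**e times in
    parallel, then broadcasts one integer (its empirically best active
    arm); the active set shrinks to the union of the broadcasts."""
    k = len(means)
    active = list(range(k))
    counts = np.zeros((n_agents, k))
    sums = np.zeros((n_agents, k))
    for epoch in range(n_epochs):
        n = 16 * 2 ** epoch                       # pulls per arm this epoch
        for arm in active:                        # parallel exploration
            counts[:, arm] += n
            sums[:, arm] += rng.normal(means[arm], 1.0,
                                       size=(n_agents, n)).sum(axis=1)
        votes = set()                             # one integer per agent
        for agent in range(n_agents):
            est = sums[agent] / np.maximum(counts[agent], 1)
            votes.add(max(active, key=lambda a: est[a]))
        active = sorted(votes)
    return active

print(parallel_best_arm([0.1, 0.5, 0.9, 0.3]))    # converges to [2]
```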
- Guangchen Lan, Han Wang, James Anderson, Christopher Brinton, and Vaneet Aggarwal, "Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates," in Proc. NeurIPS, Dec 2023.
- Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli, "Multi-Agent Multi-Armed Bandits with Limited Communication," Journal of Machine Learning Research, Jul 2022.
- Mridul Agarwal, Bhargav Ganguly, and Vaneet Aggarwal, "Communication Efficient Parallel Reinforcement Learning," in Proc. UAI, Jul 2021.
Approximation of MARL by Mean Field Control:
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. We consider multiple heterogeneous agents that can be segregated into K classes, and characterize the approximation gap between the MARL and MFC problems as the number of agents increases. Further, we design a Natural Policy Gradient (NPG) based algorithm that achieves approximate optimality for heterogeneous MARL.
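The sketch below illustrates the object being approximated, for a single class and with made-up policy and transition tables (the heterogeneous case runs one such flow per class): the empirical state distribution of N agents tracks the deterministic mean-field flow, with a gap that shrinks as N grows, roughly as 1/sqrt(N) in the analyses above.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 3, 2                                    # illustrative sizes
pi = rng.dirichlet(np.ones(A), size=S)         # fixed policy pi[s, a], made up
base = rng.dirichlet(np.ones(S), size=(S, A))  # base kernel base[s, a, s']

def kernel(s, a, mu):
    """Mean-field-coupled transition P(.|s, a, mu)."""
    return 0.8 * base[s, a] + 0.2 * mu

def mean_field_step(mu):
    """Deterministic flow of the infinite-population (MFC) limit."""
    nxt = np.zeros(S)
    for s in range(S):
        for a in range(A):
            nxt += mu[s] * pi[s, a] * kernel(s, a, mu)
    return nxt

def n_agent_step(states):
    """One synchronous step of the finite N-agent system."""
    mu = np.bincount(states, minlength=S) / len(states)
    nxt = states.copy()
    for i, s in enumerate(states):
        a = rng.choice(A, p=pi[s])
        nxt[i] = rng.choice(S, p=kernel(s, a, mu))
    return nxt

N = 5000
states = rng.integers(0, S, size=N)
mu = np.bincount(states, minlength=S) / N
for _ in range(20):
    states, mu = n_agent_step(states), mean_field_step(mu)
print("empirical: ", np.bincount(states, minlength=S) / N)
print("mean-field:", mu)
```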
- Washim Uddin Mondal, Vaneet Aggarwal, and Satish V. Ukkusuri, "Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)," arXiv, Sept 2022.
- Washim Uddin Mondal, Vaneet Aggarwal, and Satish V. Ukkusuri, "Mean-Field Control based Approximation of Multi-Agent Reinforcement Learning in Presence of a Non-decomposable Shared Global State," Transactions on Machine Learning Research, May 2023.
- Washim Uddin Mondal, Vaneet Aggarwal, and Satish V. Ukkusuri, "Can Mean Field Control (MFC) Approximate Cooperative Multi Agent Reinforcement Learning (MARL) with Non-Uniform Interaction?," in Proc. UAI, Aug 2022.
- Washim Uddin Mondal, Vaneet Aggarwal, and Satish V. Ukkusuri, "On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning," Transactions on Machine Learning Research, Sep 2022.
- Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, and Satish V. Ukkusuri, "On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)," Journal of Machine Learning Research, vol. 23, no. 129, pp. 1-46, Mar 2022.
- Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, and Satish V. Ukkusuri, "On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)," in Proc. NeurIPS Workshop on Cooperative AI, Dec 2021 (Best paper award).
Decentralized Multi-Agent Reinforcement Learning:
Value function factorization via centralized training and decentralized execution is promising for solving cooperative multi-agent reinforcement learning tasks. One approach in this area, QMIX, has become state-of-the-art, achieving the best performance on the StarCraft II micromanagement benchmark. However, the monotonic mixing of per-agent estimates in QMIX is known to restrict the joint-action Q-values it can represent, and individual agents estimate their value functions with insufficient global state information, which often results in suboptimality. To deal with this, we introduce novel mechanisms for information sharing, thereby improving performance.
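For concreteness, here is a condensed PyTorch version of the standard QMIX mixing network (layer sizes are illustrative): hypernetworks conditioned on the global state emit mixing weights whose absolute value is taken, so Q_tot is monotone in every per-agent utility, which is exactly the representational restriction discussed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Condensed QMIX-style mixer: abs() on the hypernetwork weights
    enforces dQ_tot / dQ_i >= 0 for every agent i."""
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = self.hyper_w1(state).abs().view(b, self.n_agents, self.embed_dim)
        h = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1)
                  + self.hyper_b1(state).unsqueeze(1))
        w2 = self.hyper_w2(state).abs().view(b, self.embed_dim, 1)
        q_tot = torch.bmm(h, w2) + self.hyper_b2(state).unsqueeze(1)
        return q_tot.view(b)                      # (batch,)

mixer = MonotonicMixer(n_agents=3, state_dim=8)
print(mixer(torch.randn(4, 3), torch.randn(4, 8)).shape)  # torch.Size([4])
```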
- Hanhan Zhou, Tian Lan, Vaneet Aggarwal, "Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 7, no. 5, pp. 1351-1361, Oct. 2023.
- Hanhan Zhou, Tian Lan, and Vaneet Aggarwal, "PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning," in Proc. NeurIPS, Dec 2022.
- Alec Koppel, Amrit Singh Bedi, Bhargav Ganguly, and Vaneet Aggarwal, "Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming," arXiv, Oct 2021.
Constrained Decentralized Multi-Agent Reinforcement Learning:
In many real-world tasks, a team of learning agents must ensure that their optimized policies collectively satisfy required peak and average constraints while acting in a decentralized manner. To this end, we propose novel architectures that account for such constraints in multi-agent reinforcement learning.
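As a generic sketch of the standard Lagrangian device such architectures build on (not the CMIX algorithm itself; the function names and the stand-in sample stream are ours): an average constraint E[cost] <= c_max is folded into the reward through a multiplier that rises while the running average cost is infeasible, while a peak constraint can be enforced separately by masking any action whose immediate cost exceeds the bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_update(lam, avg_cost, c_max, lr=0.05):
    """Projected dual ascent on the multiplier of E[cost] <= c_max."""
    return max(0.0, lam + lr * (avg_cost - c_max))

def shaped_reward(reward, cost, lam):
    """Lagrangian reward fed to an otherwise unmodified RL learner."""
    return reward - lam * cost

lam, cost_sum = 0.0, 0.0
for step in range(1, 10_001):
    reward, cost = rng.random(), rng.random()   # stand-in env samples
    _ = shaped_reward(reward, cost, lam)        # would drive the learner
    cost_sum += cost
    lam = dual_update(lam, cost_sum / step, c_max=0.4)

# With this non-adaptive stand-in stream the constraint stays violated,
# so the multiplier keeps growing; with a real learner, the penalty feeds
# back into the policy, the cost falls, and lam stabilizes.
print(f"multiplier after 10k steps: {lam:.3f}")
```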
- Nan Geng, Qinbo Bai, Chenyi Liu, Tian Lan, Vaneet Aggarwal, Yuan Yang, and Mingwei Xu, "A Reinforcement Learning Framework for Vehicular Network Routing Under Peak and Average Constraints," Accepted to IEEE Transactions on Vehicular Technology (TVT), Jan 2023.
- Chenyi Liu, Nan Geng, Vaneet Aggarwal, Tian Lan, Yuan Yang and Mingwei Xu, "CMIX: Deep Multi-agent Reinforcement Learning with Peak and Average Constraints," in Proc. ECML, Sep 2021 (21% acceptance rate, 147/685).
- Ramkumar Raghu, Pratheek Upadhyaya, Mahadesh Panju, Vaneet Aggarwal, and Vinod Sharma, "Scheduling and Power Control for Wireless Multicast Systems via Deep Reinforcement Learning," Entropy, vol. 23, no. 12, p. 1555, Nov 2021.
- Ramkumar Raghu, Pratheek Upadhyaya, Mahadesh Panju, Vaneet Aggarwal, and Vinod Sharma, "Deep Reinforcement Learning Based Power control for Wireless Multicast Systems," in Proc. Allerton, Oct 2019.
Mean-Field Games:
We consider a multi-agent Markov strategic interaction over an infinite horizon in which agents can be of multiple types. We model the strategic interaction as a mean-field game in the asymptotic limit where the number of agents of each type becomes infinite. Each agent has a private state that evolves depending on the agent's action and the distribution of the states of the agents of the different types. Each agent seeks to maximize its discounted sum of rewards over the infinite horizon, where the reward depends on the agent's own state and the state distributions of the different types. We characterize the conditions under which a stationary multi-type mean-field equilibrium (MMFE) exists and show how to compute it. We also extend the problem to a leader-follower setup with multiple leaders and multiple followers.
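A minimal sketch for the single-type case, with made-up reward and kernel tables and a congestion coupling of our choosing (the multi-type game runs one distribution per type): a stationary equilibrium is a fixed point that alternates a best response against the frozen population distribution with an update toward the distribution that this best response induces.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA = 4, 3, 0.95
base = rng.dirichlet(np.ones(S), size=(S, A))  # base kernel base[s, a, s']
r0 = rng.random((S, A))                        # made-up intrinsic reward

def kernel(mu):
    """Population-coupled transition P(s'|s, a, mu)."""
    return 0.7 * base + 0.3 * mu               # broadcasts mu over (s, a)

def best_response(mu, iters=300):
    """Value iteration for one agent against a frozen mu; the congestion
    term -mu[s] makes crowded states less rewarding."""
    P, r = kernel(mu), r0 - mu[:, None]
    v = np.zeros(S)
    for _ in range(iters):
        v = (r + GAMMA * (P @ v)).max(axis=1)
    return (r + GAMMA * (P @ v)).argmax(axis=1)  # deterministic policy pi[s]

def stationary_mfe(iters=200, damping=0.3):
    """Damped fixed-point iteration for a stationary mean-field equilibrium."""
    mu = np.full(S, 1.0 / S)
    for _ in range(iters):
        pi = best_response(mu)
        P_pi = kernel(mu)[np.arange(S), pi]      # (S, S) kernel under pi
        mu = (1 - damping) * mu + damping * (mu @ P_pi)
    return mu, pi

mu, pi = stationary_mfe()
print("stationary distribution:", np.round(mu, 3), "policy:", pi)
```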