Multi-Agent Reinforcement Learning:


Deep Relocating Options:

Covering option discovery improves exploration in single-agent reinforcement learning with sparse reward signals by connecting the most distant states in the embedding space given by the Fiedler vector of the state-transition graph. However, these option discovery methods cannot be directly extended to multi-agent scenarios, since the joint state space grows exponentially with the number of agents in the system. To alleviate this problem, we design efficient approaches that make multi-agent deep covering options scalable.
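
As a concrete illustration of the single-agent building block, the sketch below (in Python, with illustrative names such as fiedler_vector and covering_option_endpoints) computes the Fiedler vector of a small state-transition graph and picks the two states farthest apart in that embedding; a covering option would then be trained to travel between these endpoints. It is a minimal sketch assuming the transition graph is given explicitly as an adjacency matrix, not the deep or multi-agent variant developed in this work.

import numpy as np

def fiedler_vector(adjacency):
    """Second-smallest eigenvector of the (unnormalized) graph Laplacian."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Eigenvalues are sorted ascending; index 0 is the trivial constant eigenvector.
    return eigvecs[:, 1]

def covering_option_endpoints(adjacency):
    """Pick the two states that are farthest apart in the Fiedler embedding.

    A covering option would then be trained to travel from one endpoint to the
    other, connecting poorly connected regions of the state space.
    """
    f = fiedler_vector(adjacency)
    return int(np.argmin(f)), int(np.argmax(f))

# Toy example: a chain of 6 states; the ends of the chain are the most distant
# states and become the option's initiation/termination states.
adj = np.zeros((6, 6))
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
print(covering_option_endpoints(adj))  # -> (0, 5) or (5, 0), up to eigenvector sign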


Federated/Parallel Reinforcement Learning:

We consider M parallel reinforcement learning agents and wish to achieve the same sample complexity/regret as if they were fully collaborative. We show that this can be achieved with only a logarithmic number of communication rounds for model-based reinforcement learning. For the special case of bandits, we show that not only is the number of communication rounds logarithmic in the horizon, but the agents also only need to share the index of their best arm, thereby achieving better privacy among agents. For model-free setups, we consider a policy-gradient-based approach and show that the communication complexity of natural policy gradient can be decreased significantly using ADMM-based methods.
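
The following toy sketch illustrates the bandit-style communication pattern described above: M independent UCB learners synchronize only at exponentially spaced rounds, so the number of communication rounds grows logarithmically with the horizon, and at each synchronization every agent reveals only its current best-arm index rather than raw rewards. The doubling schedule and the way the shared index is used here are illustrative simplifications, not the exact algorithm analyzed in our work.

import numpy as np

rng = np.random.default_rng(0)
M, K, T = 4, 5, 2 ** 12                    # agents, arms, horizon
true_means = rng.uniform(0, 1, K)          # hypothetical Bernoulli arm means

counts = np.ones((M, K))                   # one warm-start pull per arm
sums = rng.binomial(1, true_means, size=(M, K)).astype(float)

def ucb_arm(m, t):
    means = sums[m] / counts[m]
    bonus = np.sqrt(2.0 * np.log(t + 1) / counts[m])
    return int(np.argmax(means + bonus))

comm_rounds = 0
for t in range(K, T):
    for m in range(M):
        a = ucb_arm(m, t)
        counts[m, a] += 1
        sums[m, a] += rng.binomial(1, true_means[a])
    if (t & (t - 1)) == 0:                 # doubling schedule: O(log T) syncs
        comm_rounds += 1
        # Each agent shares only the index of its empirically best arm.
        votes = [int(np.argmax(sums[m] / counts[m])) for m in range(M)]
        consensus = int(np.bincount(votes, minlength=K).argmax())
        # Illustrative use of the shared index: every agent gives the
        # consensus arm one extra pull, nudging exploration toward it.
        for m in range(M):
            counts[m, consensus] += 1
            sums[m, consensus] += rng.binomial(1, true_means[consensus])

print("communication rounds:", comm_rounds, "over horizon", T)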


Approximation of MARL by Mean Field Control:

Mean-field control (MFC) is an effective way to mitigate the curse of dimensionality in cooperative multi-agent reinforcement learning (MARL). We consider multiple heterogeneous agents that can be segregated into K classes, and characterize the approximation gap between the MARL and MFC problems as the number of agents increases. Further, we design a Natural Policy Gradient (NPG) based algorithm that achieves approximate optimality for heterogeneous MARL.
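
The core dimensionality-reduction step can be illustrated as follows: when agents within a class are exchangeable, a mean-field policy only needs the per-class empirical state distributions, a K x |S| summary, instead of the joint state whose size grows exponentially with the number of agents. The Python sketch below uses hypothetical names (empirical_distributions, mean_field_policy) and an arbitrary softmax parameterization purely for illustration; it is not the NPG algorithm itself.

import numpy as np

rng = np.random.default_rng(1)
K, S, A = 2, 4, 3          # classes, states per agent, actions
N = [50, 30]               # hypothetical number of agents in each class

def empirical_distributions(states, classes):
    """Per-class empirical state distributions: the mean-field summary.

    The MFC policy conditions on this K x S object instead of the joint
    state, whose size grows exponentially with the number of agents.
    """
    mu = np.zeros((K, S))
    for s, k in zip(states, classes):
        mu[k, s] += 1.0
    return mu / np.maximum(mu.sum(axis=1, keepdims=True), 1.0)

def mean_field_policy(own_state, own_class, mu, theta):
    """Softmax policy over actions; theta is a hypothetical (K, S, K*S, A)
    parameter tensor scoring actions from the mean-field summary."""
    logits = mu.reshape(-1) @ theta[own_class, own_state]
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Sample a random population and evaluate the policy for one agent.
classes = np.concatenate([np.full(n, k) for k, n in enumerate(N)])
states = rng.integers(0, S, size=classes.size)
theta = rng.normal(size=(K, S, K * S, A))
mu = empirical_distributions(states, classes)
print(mean_field_policy(states[0], classes[0], mu, theta))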


Decentralized Multi-Agent Reinforcement Learning:

Value function factorization via centralized training and decentralized execution is a promising approach for cooperative multi-agent reinforcement learning tasks. One method in this area, QMIX, has become the state of the art and achieved the best performance on the StarCraft II micromanagement benchmark. However, the monotonic mixing of per-agent estimates in QMIX is known to restrict the joint-action Q-values it can represent, and each agent's value function is estimated with insufficient global state information, which often results in suboptimal policies. To address this, we introduce novel mechanisms for information sharing that improve performance.
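
For reference, the sketch below shows a QMIX-style monotonic mixing network in PyTorch: hypernetworks conditioned on the global state generate the mixing weights, and taking their absolute value enforces dQ_tot/dQ_i >= 0, which is precisely the monotonicity restriction discussed above. This is a minimal, simplified rendering (the class name MonotonicMixer and the layer sizes are ours), not the information-sharing architecture we propose.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """QMIX-style mixer: state-conditioned hypernetworks emit the mixing
    weights, and their absolute value enforces monotonicity in each agent's
    Q-value."""

    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.view(b, 1, self.n_agents), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)

# Mix per-agent Q-values for a batch of two samples.
mixer = MonotonicMixer(n_agents=3, state_dim=10)
q_tot = mixer(torch.randn(2, 3), torch.randn(2, 10))
print(q_tot.shape)  # torch.Size([2, 1])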


Constrained Decentralized Multi-Agent Reinforcement Learning:

In many real-world tasks, a team of learning agents must ensure that their optimized policies collectively satisfy required peak and average constraints, while acting in a decentralized manner. To this end, we propose novel architectures that account for constraints in multi-agent reinforcement learning.
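
One standard way to handle average constraints, illustrated in the sketch below, is a Lagrangian primal-dual scheme: dual variables are increased when the estimated average costs exceed their budgets, and the team's policies are then updated against the penalized objective. The function name dual_ascent_step, the step size, and the placeholder cost estimates are illustrative; this is not the specific decentralized architecture we propose.

import numpy as np

def dual_ascent_step(lmbda, avg_costs, thresholds, lr=0.05):
    """Projected gradient ascent on the Lagrange multipliers for average
    constraints E[c_i] <= d_i.  The primal step (the policy update) would then
    maximize J(pi) - sum_i lmbda_i * (E[c_i] - d_i) with any MARL learner."""
    return np.maximum(0.0, lmbda + lr * (avg_costs - thresholds))

lmbda = np.zeros(2)
for episode in range(3):
    avg_costs = np.array([1.3, 0.4])     # placeholder rollout estimates
    thresholds = np.array([1.0, 0.5])    # required average-constraint budgets
    lmbda = dual_ascent_step(lmbda, avg_costs, thresholds)
    print(episode, lmbda)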


Mean-Field Games:

We consider a multi-agent Markov strategic interaction over an infinite horizon where agents can be of multiple types. We model the strategic interaction as a mean-field game in the asymptotic limit where the number of agents of each type becomes infinite. Each agent has a private state; the state evolves depending on the distribution of the states of the agents of the different types and the action of the agent. Each agent wants to maximize the discounted sum of rewards over the infinite horizon, which depends on the agent's own state and the distribution of the states of the agents of the different types. We seek to characterize and compute a stationary multi-type mean-field equilibrium (MMFE) in this game, and we characterize conditions under which a stationary MMFE exists. We also extend the problem to a leader-follower setup with multiple leaders and multiple followers.
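
A stationary MMFE can be viewed as a fixed point: each type best-responds to the current mean field, and the mean field must in turn equal the stationary distribution induced by those best responses. The Python sketch below iterates this map on a toy model in which, purely for illustration, the mean field enters only through a congestion-style reward and each type has a fixed transition kernel; the model, the names, and the plain (undamped) iteration are assumptions, and such an iteration need not converge without further conditions.

import numpy as np

rng = np.random.default_rng(2)
K, S, A, gamma = 2, 3, 2, 0.9              # types, states, actions, discount

# Hypothetical model: fixed per-type transition kernels P[k, a, s, :] and a
# reward in which the mean field enters only via a congestion penalty.
P = rng.dirichlet(np.ones(S), size=(K, A, S))
base_reward = rng.uniform(0, 1, size=(K, S, A))

def reward(k, mu):
    return base_reward[k] - mu[k][:, None]   # crowded states are penalized

def best_response(k, mu, iters=200):
    """Value iteration for a type-k agent facing the frozen mean field mu."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = reward(k, mu) + gamma * np.einsum('ast,t->sa', P[k], V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                  # greedy stationary policy

def induced_distribution(k, policy, iters=500):
    """Stationary state distribution of the chain induced by the policy."""
    T = np.array([P[k, policy[s], s] for s in range(S)])
    mu_k = np.full(S, 1.0 / S)
    for _ in range(iters):
        mu_k = mu_k @ T
    return mu_k

# Fixed-point iteration for a stationary MMFE: alternate best responses and
# the distributions they induce until the mean field (roughly) stabilizes.
mu = [np.full(S, 1.0 / S) for _ in range(K)]
for _ in range(50):
    policies = [best_response(k, mu) for k in range(K)]
    mu = [induced_distribution(k, policies[k]) for k in range(K)]
print([np.round(m, 3) for m in mu])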
