Offline Reinforcement Learning:
Hierarchical Adversarial Inverse Reinforcement Learning:
Multi-task Imitation Learning (MIL) aims to train a policy that can perform a distribution of tasks from multi-task expert demonstrations, a capability essential for general-purpose robots. Existing MIL algorithms suffer from low data efficiency and perform poorly on complex long-horizon tasks. We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically structured multi-task policies, which are better suited to compositional, long-horizon tasks and achieve higher expert-data efficiency by identifying reusable basic skills and transferring them across tasks.
- Jiayu Chen, Dipesh Tamboli, Tian Lan, and Vaneet Aggarwal, "Multi-task Hierarchical Adversarial Inverse Reinforcement Learning," in Proc. International Conference on Machine Learning (ICML), Jul 2023.
- Jiayu Chen, Tian Lan, and Vaneet Aggarwal, "Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control," in Proc. IEEE International Conference on Robotics and Automation (ICRA), May 2023.
- Jiayu Chen, Tian Lan, and Vaneet Aggarwal, "Hierarchical Adversarial Inverse Reinforcement Learning," Accepted to IEEE Transactions on Neural Networks and Learning Systems, Aug 2023.
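As background for the adversarial component, here is a minimal sketch of the discriminator and reward used in standard (flat) AIRL. The discriminator is D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a|s)), and the policy is trained on log D - log(1 - D). It is not the MH-AIRL implementation. `f_value` stands in for the output of a learned reward network. The hypothetical helper `option_joint_prob` assumes, for illustration only, that a hierarchical policy's probability factors into a high-level option probability times a low-level action probability.

```python
import math


def airl_discriminator(f_value: float, policy_prob: float) -> float:
    """Standard AIRL discriminator D(s, a) = exp(f) / (exp(f) + pi(a|s)).

    Computed in the algebraically equivalent form sigmoid(f - log pi).
    """
    logit = f_value - math.log(policy_prob)
    return 1.0 / (1.0 + math.exp(-logit))


def airl_reward(f_value: float, policy_prob: float) -> float:
    """Policy reward log D - log(1 - D), which simplifies to f - log pi(a|s)."""
    return f_value - math.log(policy_prob)


def option_joint_prob(high_prob: float, low_prob: float) -> float:
    """Hypothetical hierarchical factorization (an assumption for illustration):
    pi(option, action | state) = pi_high(option | state) * pi_low(action | state, option).
    """
    return high_prob * low_prob


# Usage: score a transition using the joint probability of the chosen option and action.
joint = option_joint_prob(0.5, 0.5)
print(airl_discriminator(0.0, 1.0))  # 0.5
print(airl_reward(0.5, joint))
```

Writing the reward as f - log π instead of evaluating log D - log(1 - D) directly avoids taking the log of a value that may round to 0 or 1.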
Variance Reduction in Offline Reinforcement Learning:
Recent work has shown that offline reinforcement learning can be formulated as a sequence modeling problem and solved via supervised learning with approaches such as Decision Transformer. These sequence-based methods achieve competitive results compared with return-to-go methods, especially on tasks with longer episodes or sparse rewards. However, they do not use importance sampling to correct the policy bias that arises with off-policy data, mainly because the behavior policy is unavailable and the evaluation policies are deterministic. To this end, we propose DPE, an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation (DPE) in a unified framework with statistically proven variance-reduction properties.
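Here is a minimal sketch of the generic setting described above. It is not the DPE algorithm itself. Because no behavior policy is logged, π_b is first estimated, in this case by count-based maximum likelihood over discrete states and actions, which is an assumption for illustration. Trajectory-level importance weights π_e / π̂_b can then be formed. The sketch compares ordinary importance sampling with weighted (self-normalized) importance sampling, a classic alternative that is biased but usually has lower variance. The function names and the tabular trajectory format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Step = Tuple[str, str, float]          # (state, action, reward)
Trajectory = List[Step]
Policy = Dict[Tuple[str, str], float]  # (state, action) -> probability


def estimate_behavior_policy(data: List[Trajectory]) -> Policy:
    """Count-based maximum-likelihood estimate of pi_b(a | s) from logged data."""
    sa_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    s_counts: Dict[str, int] = defaultdict(int)
    for traj in data:
        for state, action, _ in traj:
            sa_counts[(state, action)] += 1
            s_counts[state] += 1
    return {(s, a): n / s_counts[s] for (s, a), n in sa_counts.items()}


def importance_sampling_estimates(
    data: List[Trajectory], pi_e: Policy, gamma: float = 1.0
) -> Tuple[float, float]:
    """Return (ordinary IS, weighted IS) estimates of the evaluation policy's value."""
    pi_b = estimate_behavior_policy(data)
    weights: List[float] = []
    returns: List[float] = []
    for traj in data:
        rho, ret = 1.0, 0.0
        for t, (state, action, reward) in enumerate(traj):
            # Every logged (state, action) pair was counted, so pi_b > 0 here.
            rho *= pi_e.get((state, action), 0.0) / pi_b[(state, action)]
            ret += (gamma ** t) * reward
        weights.append(rho)
        returns.append(ret)
    numerator = sum(w * g for w, g in zip(weights, returns))
    # Ordinary IS: unbiased when the true behavior policy is used.
    ordinary = numerator / len(data)
    # Weighted IS: normalize by the total weight (biased, usually lower variance).
    total_weight = sum(weights)
    weighted = numerator / total_weight if total_weight > 0 else 0.0
    return ordinary, weighted


# Usage: one state, and a deterministic evaluation policy that always picks "left".
logged = [
    [("s0", "left", 1.0)],
    [("s0", "right", 0.0)],
    [("s0", "left", 1.0)],
    [("s0", "right", 0.0)],
]
pi_e = {("s0", "left"): 1.0}
print(importance_sampling_estimates(logged, pi_e))  # (1.0, 1.0)
```

With a deterministic evaluation policy, every trajectory that takes a different action gets weight zero. Matching trajectories are up-weighted by 1/π̂_b(a|s) at each step, so a few trajectories can dominate the estimate.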