NOTE: The following materials are presented for timely
dissemination of academic and technical work. Copyright and all other rights
therein are reserved by authors and/or other copyright holders. Personal
use of the following materials is permitted; however, people using
the materials or information are expected to adhere to the terms and
constraints invoked by the related copyright.
Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning
ABSTRACT
Reinforcement learning (RL) often struggles to accomplish a sparse-reward
long-horizon task in a complex environment. Goal-conditioned
reinforcement learning (GCRL) has been employed to tackle this difficult
problem via a curriculum of easy-to-reach sub-goals. In GCRL,
exploring novel sub-goals is essential for the agent to ultimately find
the pathway to the desired goal. How to explore novel sub-goals efficiently
is one of the most challenging issues in GCRL. Several goal
exploration methods have been proposed to address this issue but still
struggle to find the desired goals efficiently. In this paper, we propose
a novel learning objective by optimizing the entropy of both achieved
and new goals to be explored for more efficient goal exploration in
sub-goal selection based GCRL. To optimize this objective, we first explore
and exploit the frequently occurring goal-transition patterns mined in
the environments similar to the current task to compose skills via skill
learning. Then, the pre-trained skills are applied in goal exploration
with theoretical justification. Evaluation on a variety of sparse-reward
long-horizon benchmark tasks suggests that incorporating our method
into several state-of-the-art GCRL baselines significantly boosts their
exploration efficiency while improving or maintaining their performance.
Click ml2023.pdf for full text.