Leveraging Language for Accelerated Learning of Tool Manipulation

Allen Z. Ren    Bharat Govil    Tsung-Yen Yang    Karthik Narasimhan    Anirudha Majumdar   

Princeton University    Conference on Robot Learning (CoRL), 2022

Paper | Video | BibTeX


We propose ATLA — Accelerated Learning of Tool Manipulation with LAnguage — a meta-learning framework leveraging large language models (LLMs) to accelerate learning of tool manipulation skills.


Method Overview

Natural language descriptions of tools contain information about the affordances of the tools, how to exploit these affordances for a given task, and how perceptual features of tools (e.g., their visual appearance and geometry) relate to their affordances. Moreover, language can help capture the shared structure of tools and their affordances. Thus, if one has previously learned to use a set of tools (with corresponding language descriptions), a description of a new tool can help exploit this prior knowledge to accelerate learning.

ATLA utilizes LLMs to generate such language descriptions for tools and to obtain the corresponding feature representations for policy learning.
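
As a concrete illustration, the snippet below sketches how one might query GPT-3 for a diverse set of tool descriptions using the legacy openai (<1.0) Completion API. The prompt wording and sampling parameters are our own placeholders, not the prompts used in the paper.

```python
# Minimal sketch of querying GPT-3 for tool descriptions (legacy openai<1.0
# Completion API). The prompt and sampling parameters are illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: you supply your own key

response = openai.Completion.create(
    model="text-davinci-002",
    prompt="Describe the shape of a crowbar and what it is typically used for.",
    max_tokens=64,
    n=5,              # request several completions to build a diverse set
    temperature=0.9,  # higher temperature encourages varied phrasings
)
descriptions = [choice.text.strip() for choice in response.choices]
```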


ATLA consists of two phases. At meta-training time, the meta-learner updates a base-learner that quickly fine-tunes a manipulation policy; this fine-tuning process is conditioned on the LLM representations of each tool's language descriptions. At test time, the base-learner adapts the policy to a new tool using the tool's language descriptions and a few interactions with it.
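
The following is a minimal, runnable sketch of this two-phase structure. It assumes a Reptile-style meta-update, and the policy, "tool" data, and loss are tiny placeholders; the paper's actual meta-learning algorithm and losses may differ.

```python
# Sketch of the meta-training / test-time-adaptation structure. The policy,
# observations, and loss are placeholders; the Reptile-style meta-update is
# an assumption, not necessarily the paper's algorithm.
import copy
import torch
import torch.nn as nn

def base_learner_finetune(policy, steps=5):
    """Base-learner: quickly fine-tune the policy on one tool. Stands in for
    the language-conditioned off-policy RL updates described below."""
    opt = torch.optim.SGD(policy.parameters(), lr=1e-2)
    for _ in range(steps):
        obs = torch.randn(16, 8)          # placeholder for (image, language) features
        loss = policy(obs).pow(2).mean()  # placeholder for the RL objective
        opt.zero_grad(); loss.backward(); opt.step()
    return policy

meta_policy = nn.Linear(8, 2)  # placeholder for the full manipulation policy
meta_lr = 0.1

# Meta-training: nudge the shared initialization toward parameters that
# fine-tune well on each training tool.
for meta_iter in range(100):
    adapted = base_learner_finetune(copy.deepcopy(meta_policy))
    for p_meta, p_new in zip(meta_policy.parameters(), adapted.parameters()):
        p_meta.data.add_(meta_lr * (p_new.data - p_meta.data))

# Test time: adapt to an unseen tool starting from the meta-learned weights.
test_policy = base_learner_finetune(copy.deepcopy(meta_policy), steps=20)
```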


At training time, we first prompt OpenAI's GPT-3 to obtain a diverse set of language descriptions of each tool. Then, for each episode collected, we randomly sample a description from this set and feed it into a pre-trained BERT model to obtain a language representation, which the language head further distills. We concatenate the representations from the language head and the image encoder, and the resulting features are shared by the critic head and the actor head. We train the language-conditioned policy with an off-policy RL algorithm (SAC).
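
Below is a minimal PyTorch sketch of this language-conditioned architecture. The module sizes, the frozen BERT with [CLS] pooling, and the simplified actor and critic heads are our assumptions; in particular, a full SAC critic would also take the action as input.

```python
# Sketch of the language-conditioned actor-critic. Sizes, pooling, and head
# structure are illustrative assumptions, not the exact paper architecture.
import random
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Hypothetical description set for one tool (produced by GPT-3 in the paper).
descriptions = [
    "A crowbar is a long and thin bar with a curved hook.",
    "A crowbar is a metal lever used to pry open things.",
]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()  # frozen

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, img_feat_dim=256, lang_feat_dim=64, action_dim=4):
        super().__init__()
        # Language head: distills the 768-d BERT embedding.
        self.language_head = nn.Sequential(
            nn.Linear(768, lang_feat_dim), nn.ReLU())
        # Stand-in image encoder over 128x128 RGB observations.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(img_feat_dim), nn.ReLU())
        fused = img_feat_dim + lang_feat_dim
        # Simplified heads; a real SAC critic would also consume the action.
        self.actor_head = nn.Linear(fused, action_dim)
        self.critic_head = nn.Linear(fused, 1)

    def forward(self, image, lang_embedding):
        feat = torch.cat([self.image_encoder(image),
                          self.language_head(lang_embedding)], dim=-1)
        return self.actor_head(feat), self.critic_head(feat)

# Per episode: sample one description and embed it with frozen BERT.
desc = random.choice(descriptions)
with torch.no_grad():
    lang_emb = bert(**tokenizer(desc, return_tensors="pt")).last_hidden_state[:, 0]

policy = LanguageConditionedPolicy()
action, value = policy(torch.zeros(1, 3, 128, 128), lang_emb)
```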



Experimental Results

Through a few iterations of adaptation at test time, ATLA generally outperforms AT (ATLA without language information) and SAC-LA (ATLA without meta-learning) across different tools on the four tasks. Adaptation curves of SAC-LA tend to stagnate or fluctuate, while those of ATLA rise steadily. This indicates that meta-training enables the policy to adapt better to new tools.

(a) Language descriptions of a crowbar often contain phrases such as “long and thin bar”, “curved”, “hook”, “used to leverage”, and “used to pry open things”. With these descriptions, ATLA (orange curve) enables the policy to adapt quickly to this tool, which was unseen during meta-training: the policy learns to use the curved hook to better steer the cylinder towards the target. For comparison, when we replace the descriptions with only the sentence “A crowbar is a long and thin bar,” the policy (green curve) does not adapt as well. (b) One common feature among tools is the handle. While ATLA learns to grasp the handle of a trowel (top), when we remove “handle” from all the descriptions, the robot fails to grip the handle firmly and eventually loses its grip (bottom).


Acknowledgements

The authors were partially supported by the Toyota Research Institute (TRI), the NSF CAREER Award [#2044149], the Office of Naval Research [N00014-21-1-2803, N00014-18-1-2873], and the School of Engineering and Applied Science at Princeton University through the generosity of William Addy ’82. This article solely reflects the opinions and conclusions of its authors and not those of NSF, ONR, Princeton SEAS, TRI, or any other Toyota entity.