Poke and Strike:
Learning Task-Informed Exploration Policies

Marina Y. Aoyama¹, Joao Moura¹, Juan Del Aguila Ferrandis¹, Sethu Vijayakumar¹

¹School of Informatics, The University of Edinburgh, UK

Conference on Robot Learning (CoRL) 2025


Abstract

In many dynamic robotic tasks, such as striking pucks into a goal outside the reachable workspace, the robot must first identify the relevant physical properties of the object for successful task execution, as it is unable to recover from failure or retry without human intervention. To address this challenge, we propose a task-informed exploration approach, based on reinforcement learning, that trains an exploration policy using rewards automatically generated from the sensitivity of a privileged task policy to errors in estimated properties. We also introduce an uncertainty-based mechanism to determine when to transition from exploration to task execution, ensuring sufficient property estimation accuracy with minimal exploration time. Our method achieves a 90% success rate on the striking task with an average exploration time under 1.2 seconds—significantly outperforming baselines that achieve at most 40% success or require inefficient querying and retraining in a simulator at test time. Additionally, we demonstrate that our task-informed rewards capture the relative importance of physical properties in both the striking task and the classical CartPole example. Finally, we validate our approach by demonstrating its ability to identify object properties and adjust task execution in a physical setup using the KUKA iiwa robot arm.

Method

We propose a task-informed exploration approach, based on reinforcement learning, that trains an exploration policy using rewards automatically generated from the sensitivity of a privileged task policy to errors in estimated properties. We also introduce an uncertainty-based mechanism to determine when to transition from exploration to task execution, ensuring sufficient property estimation accuracy with minimal exploration time. Finally, we validate our approach by demonstrating its ability to identify object properties and adjust task execution in a physical setup using the KUKA iiwa robot arm.
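The page does not state the exact switching criterion. As an illustration only, the sketch below assumes the property estimator outputs a Gaussian belief (mean and standard deviation) per property and that exploration ends once every property's uncertainty, scaled by how sensitive the task is to that property, falls below a tolerance, with a hard time budget as a fallback. `PropertyEstimate`, `should_stop_exploring`, the sensitivity values, the tolerance, and the time budget are all hypothetical and are not the authors' formulation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PropertyEstimate:
    """Hypothetical Gaussian belief over one physical property (e.g. friction)."""
    name: str
    mean: float
    std: float


def should_stop_exploring(
    estimates: Sequence[PropertyEstimate],
    sensitivities: Sequence[float],
    tolerance: float,
    elapsed_s: float,
    max_exploration_s: float,
) -> bool:
    """Return True when the robot should switch from exploration to the task.

    Assumed rule (illustrative only): for every property, the task-relevant
    uncertainty, approximated as sensitivity * std, must be at most
    `tolerance`. A hard time budget guarantees that exploration terminates.
    """
    if len(estimates) != len(sensitivities):
        raise ValueError("need one sensitivity per property")
    if elapsed_s >= max_exploration_s:
        return True  # fallback: time budget exhausted
    return all(s * e.std <= tolerance for e, s in zip(estimates, sensitivities))


if __name__ == "__main__":
    beliefs = [
        PropertyEstimate("friction", 0.31, 0.02),
        PropertyEstimate("com_x", 0.0, 0.05),
    ]
    # Hypothetical sensitivities: the task depends strongly on friction and
    # weakly on com_x, so the larger com_x uncertainty is still tolerated.
    print(should_stop_exploring(beliefs, [2.0, 0.1], tolerance=0.05,
                                elapsed_s=0.6, max_exploration_s=3.0))
```

Weighting each standard deviation by a sensitivity means a property the task barely depends on does not prolong exploration, which is the intuition behind pairing task-informed rewards with an uncertainty-based stop.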

Figure: Method illustration.

Results

Striking: learned to poke and strike
Pucks with 3 different friction levels: 8/9
Figure panel: Friction

Edge Pushing: learned to rotate and push to the edge
Box with eggs on 2 different sides: 8/10
Figure panel: CoM x

Tricking the robot

Can the robot identify the shifted weight?

Sensitivity modeling and exploration reward generation
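The page does not spell out the sensitivity model. As an illustration only, the sketch below assumes that sensitivity is the central finite-difference derivative of a privileged policy's action with respect to each physical property, averaged over sampled states, and that the exploration reward is the negative sensitivity-weighted estimation error. `toy_privileged_policy`, `property_sensitivity`, `exploration_reward`, and the toy dynamics are hypothetical; this is not the authors' method.

```python
from typing import Callable, Sequence

import numpy as np

Policy = Callable[[np.ndarray, np.ndarray], np.ndarray]


def toy_privileged_policy(state: np.ndarray, props: np.ndarray) -> np.ndarray:
    """Hypothetical privileged policy pi(state, props) returning a strike speed.

    props[0] ~ friction, props[1] ~ CoM x offset. In this invented toy the
    speed depends strongly on friction and only weakly (linearly) on CoM x.
    """
    distance = state[0]
    friction, com_x = props
    speed = np.sqrt(2.0 * 9.81 * friction * distance) + 0.05 * com_x
    return np.array([speed])


def property_sensitivity(policy: Policy, states: Sequence[np.ndarray],
                         props: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Mean central finite-difference sensitivity of the action to each property."""
    sens = np.zeros(len(props))
    for s in states:
        for i in range(len(props)):
            delta = np.zeros(len(props))
            delta[i] = eps
            diff = policy(s, props + delta) - policy(s, props - delta)
            sens[i] += np.linalg.norm(diff) / (2.0 * eps)
    return sens / len(states)


def exploration_reward(est_props: np.ndarray, true_props: np.ndarray,
                       sensitivity: np.ndarray) -> float:
    """Negative estimation error, weighted by normalized task sensitivity."""
    weights = sensitivity / sensitivity.sum()
    return -float(np.sum(weights * np.abs(est_props - true_props)))


if __name__ == "__main__":
    true_props = np.array([0.3, 0.0])
    states = [np.array([0.5]), np.array([1.0]), np.array([1.5])]
    sens = property_sensitivity(toy_privileged_policy, states, true_props)
    print("sensitivity (friction, com_x):", sens)
    # An estimate that nails friction but misses com_x scores higher than
    # one that nails com_x but misses friction.
    print(exploration_reward(np.array([0.3, 0.2]), true_props, sens))
    print(exploration_reward(np.array([0.35, 0.0]), true_props, sens))
```

Because the weights come from the privileged policy itself, an exploration policy trained on this reward is pushed toward actions that reveal the properties the task actually depends on. This mirrors the page's claim that task-informed rewards capture the relative importance of physical properties.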

BibTeX

@inproceedings{aoyama2025poke,
  title={Poke and Strike: Learning Task-Informed Exploration Policies},
  author={Marina Y. Aoyama and Joao Moura and Juan Del Aguila Ferrandis and Sethu Vijayakumar},
  booktitle={9th Annual Conference on Robot Learning},
  year={2025},
}