Optimizing Decision Parameters of Humanoid Robots using Deep Reinforcement Learning

Keywords

Deep Reinforcement Learning
Behavior Switching
Humanoid Robots

Abstract

This work investigates the use of deep reinforcement learning to enable humanoid Nao robots in the RoboCup 3D Soccer Simulation to autonomously decide when to switch between complex behaviors. Two main experiments were conducted. In the first, an agent was trained to learn the optimal moment to transition from walking towards the ball to executing a kick. The robot was randomly initialized at varying distances and orientations relative to the ball and trained using Proximal Policy Optimization to maximize the accuracy of kicking the ball towards a target after approaching it. The resulting models achieved strong performance, on par with the handcrafted baseline in simulated matches. The second experiment extended this setup by allowing the agent to also determine a favorable pre-kick position around the ball before deciding to switch. Despite the richer decision space, the resulting models performed significantly worse than the baseline, indicating the increased difficulty of jointly learning spatial positioning and timing.

https://doi.org/10.60643/urai.v2025p29

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2025 Richard Pufe