Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023)
Abstract
Reinforcement learning (RL) algorithms can help solve continuous control tasks in robotics, and model-based RL in particular has shown promise for these applications. By utilizing Deep Neural Network (DNN) based dynamics models, it can be orders of magnitude more sample-efficient than its model-free counterpart. Furthermore, because the learned dynamics models are reward-agnostic, model-based methods are more robust and generalize better across tasks. However, the computational complexity of planning and control in model-based RL is much higher, which makes real-time deployment on low-resource hardware at the edge challenging. In our work, we focus on reducing the computational footprint of the dynamics models used in model-based RL. To make the algorithm more hardware-efficient, we introduce block floating point data types for DNNs with optimized block performance and different mixed-precision network configurations. The performance impact of these optimizations is assessed across benchmark continuous robotics control tasks. A total memory savings of can be achieved over conventional FP32 networks while reaching comparable rewards in most scenarios.
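To illustrate the core idea behind the block floating point data types mentioned above, here is a minimal NumPy sketch of block floating point quantization: values are grouped into fixed-size blocks, each block shares a single exponent derived from its largest magnitude, and the per-value mantissas are rounded to a reduced bit width. This is a generic illustration, not the paper's implementation; the function name, block size, and mantissa width are assumptions chosen for clarity.

```python
import numpy as np

def bfp_quantize(x, block_size=16, mantissa_bits=8):
    """Simulate block floating point (BFP) quantization of a 1-D array.

    Each block of `block_size` values shares one exponent (taken from the
    block's largest magnitude), and each value keeps only a signed
    `mantissa_bits`-bit mantissa. Returns the dequantized float array so
    the rounding error can be inspected directly.
    """
    x = np.asarray(x, dtype=np.float64)
    n = len(x)
    pad = (-n) % block_size                      # pad so length divides evenly
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    out = np.empty_like(blocks)
    lo = -(2 ** (mantissa_bits - 1))             # signed mantissa range
    hi = 2 ** (mantissa_bits - 1) - 1
    for i, blk in enumerate(blocks):
        max_abs = np.max(np.abs(blk))
        if max_abs == 0.0:
            out[i] = 0.0
            continue
        # shared exponent: smallest e such that |blk| / 2**e < 1
        exp = int(np.floor(np.log2(max_abs))) + 1
        scale = 2.0 ** (mantissa_bits - 1 - exp)
        # round mantissas to integers and clamp to the representable range
        mant = np.clip(np.round(blk * scale), lo, hi)
        out[i] = mant / scale                    # dequantize for inspection
    return out.reshape(-1)[:n]
```

With an 8-bit mantissa and one shared exponent per block, the per-value storage drops from 32 bits (FP32) to roughly `mantissa_bits + exponent_bits / block_size` bits, which is the source of the memory savings the abstract refers to.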