1Georgia Institute of Technology, 2Kuwait University, 3Northeastern University
This study presents an emotion-aware navigation framework -- EmoBipedNav -- using deep reinforcement learning (DRL) for bipedal robots walking in socially interactive environments. The inherent locomotion constraints of bipedal robots challenge their ability to maneuver safely in dynamic environments. When combined with the intricacies of social environments, including pedestrian interactions and social cues such as emotions, these challenges become even more pronounced. To address these coupled problems, we propose a two-stage pipeline that accounts for both bipedal locomotion constraints and complex social environments. Specifically, social navigation scenarios are represented as sequential LiDAR grid maps (LGMs), from which we extract latent features, including collision regions, emotion-related discomfort zones, social interactions, and the spatio-temporal dynamics of the evolving environment. The extracted features are mapped directly to the actions of a reduced-order model (ROM) through a DRL architecture. Furthermore, the proposed framework incorporates full-order dynamics and locomotion constraints during training, effectively accounting for the tracking errors and restrictions of the locomotion controller while planning trajectories with the ROM. Comprehensive experiments demonstrate that our approach outperforms both model-based planners and DRL-based baselines.
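To make the notion of emotion-related discomfort zones inside an LGM concrete, the following is a minimal sketch that rasterizes pedestrians into a pie-shaped (ring-by-sector) grid, scaling each pedestrian's discomfort radius by an estimated emotion label. The radii, grid resolution, and encoding are illustrative assumptions, not the paper's exact representation.

```python
import numpy as np

# Assumed emotion-dependent discomfort radii (meters) -- illustrative values only.
EMOTION_RADIUS = {"happy": 0.4, "neutral": 0.6, "negative": 0.9}

def rasterize_lgm(pedestrians, n_sectors=80, n_rings=40, max_range=8.0):
    """pedestrians: list of (distance, bearing_rad, emotion) in the robot frame."""
    lgm = np.zeros((n_rings, n_sectors), dtype=np.float32)
    for dist, bearing, emotion in pedestrians:
        radius = EMOTION_RADIUS.get(emotion, 0.6)
        # Mark every cell whose center lies inside the emotion-scaled discomfort disc.
        for r in range(n_rings):
            for s in range(n_sectors):
                cell_d = (r + 0.5) * max_range / n_rings
                cell_a = -np.pi + (s + 0.5) * 2 * np.pi / n_sectors
                dx = cell_d * np.cos(cell_a) - dist * np.cos(bearing)
                dy = cell_d * np.sin(cell_a) - dist * np.sin(bearing)
                if np.hypot(dx, dy) <= radius:
                    lgm[r, s] = 1.0
    return lgm
```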
Our framework first estimates facial emotions using pre-trained CNN models. Simultaneously, raw LiDAR scans are transformed into sequential pie-shaped LiDAR grid maps (LGMs). These grid maps are converted into stacked pixel images and processed by an encoder built from convolutional neural networks (CNNs) to extract socially interactive and emotion-aware features. The resulting latent features are concatenated with the robot's last command and target position, and the combined vector is fed into an actor-critic deep reinforcement learning (DRL) structure implemented with multi-layer perceptrons (MLPs). The action produced by the actor network is expressed in terms of the reduced-order model (ROM) and applied to a bipedal robot with full-body dynamics and constraints. The torso position and yaw angle obtained from Digit correspond to the ego-agent state. We use the angular momentum linear inverted pendulum (ALIP) planner and a passivity-based full-body controller with ankle actuation to track the desired ROM trajectory.
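A minimal PyTorch sketch of this perception-to-action network is given below, assuming a stack of four LGM frames rendered as 64x64 single-channel images; the channel counts, hidden sizes, and the 2-D ROM action are assumptions rather than the paper's exact hyper-parameters.

```python
import torch
import torch.nn as nn

class LGMEncoder(nn.Module):
    """CNN encoder over a temporal stack of LGM images (B, T, 64, 64)."""
    def __init__(self, in_frames=4, latent_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_frames, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)

    def forward(self, lgm_stack):
        return torch.relu(self.fc(self.conv(lgm_stack)))

class ActorCritic(nn.Module):
    """Latent LGM features + last command + goal -> ROM action and state value."""
    def __init__(self, latent_dim=128, cmd_dim=2, goal_dim=2, action_dim=2):
        super().__init__()
        self.encoder = LGMEncoder(latent_dim=latent_dim)
        in_dim = latent_dim + cmd_dim + goal_dim
        self.actor = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                   nn.Linear(256, action_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                    nn.Linear(256, 1))

    def forward(self, lgm_stack, last_cmd, goal):
        z = torch.cat([self.encoder(lgm_stack), last_cmd, goal], dim=-1)
        return self.actor(z), self.critic(z)
```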
We first train and test our navigation policy in the physics-based simulator MuJoCo, using a full-body bipedal robot with full-order dynamics. Simulation comparisons demonstrate the effectiveness of our social navigation pipeline for bipedal robots.
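A rollout-collection sketch under stated assumptions is shown below: `DigitSocialNavEnv` is a hypothetical stand-in for the MuJoCo simulation of Digit with full-order dynamics (its reset/step interface is assumed), and `policy` follows the `ActorCritic` sketch above; any standard actor-critic algorithm can consume the collected transitions.

```python
import torch

def rollout(env, policy, horizon=500):
    # env: hypothetical DigitSocialNavEnv-like object; obs holds stacked LGMs,
    # the last command, and the goal position in the robot frame (assumed layout).
    obs = env.reset()
    episode = []
    for _ in range(horizon):
        with torch.no_grad():
            action, _ = policy(obs["lgm"], obs["last_cmd"], obs["goal"])
        # The ROM action is tracked inside the simulator by the ALIP planner
        # and the passivity-based full-body controller.
        obs, reward, done, info = env.step(action)
        episode.append((obs, action, reward, done))
        if done:
            break
    return episode
```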
We then deploy our policy in more complex simulation scenarios, further showing that the navigation policy enables the bipedal robot to adapt to complex, interactive scenarios that incorporate pedestrian emotions.
We transfer our simulated navigation policy to real-world scenarios with diverse pedestrian motion patterns. The robot must avoid static obstacles while interacting with pedestrians. We include several representative motion patterns: pedestrians cross in front of the robot, walk and suddenly stop in front of the robot, group together to cross in front of the robot, and interact randomly with the robot and with one another. Furthermore, we assign pedestrian emotions at random. Our social navigation policy adapts its behavior to avoid collisions and reduce intrusions into pedestrians' discomfort zones, indicating the sim-to-real transferability of our navigation policy. In addition, we integrate robot localization, pedestrian detection, and emotion recognition into the pipeline to further validate the practicality of our emotion-aware social navigation. These modules rely on an on-board stereo camera together with the corresponding SLAM and perception algorithms. Pedestrians switch between standing still and moving, and their emotions alternate among happy, neutral, and negative. To probe the limits further, we increase the environment complexity: for example, pedestrians group together and remain static to block the robot, and we raise the pedestrian density to make the scenario more crowded.
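For the on-board integration step, the sketch below illustrates one way the recognized emotions could be associated with tracked pedestrians before rasterization into LGMs; the `Pedestrian` container, the nearest-neighbor association, and the distance threshold are assumptions for illustration, not the robot's exact fusion logic.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Pedestrian:
    xy: np.ndarray          # tracked position in the robot frame (m)
    emotion: str = "neutral"

def fuse_emotions(pedestrians, face_xy, face_emotions, max_dist=0.8):
    """Assign each tracked pedestrian the emotion of the nearest detected face."""
    for ped in pedestrians:
        if len(face_xy) == 0:
            continue
        d = np.linalg.norm(np.asarray(face_xy) - ped.xy, axis=1)
        if d.min() < max_dist:   # only associate sufficiently close detections
            ped.emotion = face_emotions[int(d.argmin())]
    return pedestrians
```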
Finally, we deploy our navigation policy in more realistic environments, such as a campus after classes, where pedestrians are denser and interactions are more complicated. The bipedal robot still accomplishes the emotion-aware social navigation task.
@article{zhu2025emobipednav,
  title={EmoBipedNav: Emotion-aware Social Navigation for Bipedal Robots with Deep Reinforcement Learning},
  author={Zhu, Wei and Raju, Abirath and Shamsah, Abdulaziz and Wu, Anqi and Hutchinson, Seth and Zhao, Ye},
  journal={arXiv preprint arXiv:2503.12538},
  year={2025},
}