Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction

Hou, Yuanbo; Ren, Qiaoqiao; Wang, Wenwu; Botteldooren, Dick

doi:10.1109/icassp49660.2025.10890031


cris.virtualsource.department	952936d6-a5d1-4952-ab8c-7ba5c377af16
cris.virtualsource.orcid	952936d6-a5d1-4952-ab8c-7ba5c377af16
dc.contributor.author	Hou, Yuanbo
dc.contributor.author	Ren, Qiaoqiao
dc.contributor.author	Wang, Wenwu
dc.contributor.author	Botteldooren, Dick
dc.date.accessioned	2026-06-17T10:16:16Z
dc.date.available	2026-06-17T10:16:16Z
dc.date.createdwos	2026-01-06
dc.date.issued	2025
dc.description.abstract	Emotion recognition and touch gesture decoding are crucial for advancing human-robot interaction (HRI), especially in social environments where emotional cues and tactile perception play important roles. However, many humanoid robots, such as Pepper, Nao, and Furhat, lack full-body tactile skin, limiting their ability to engage in touch-based emotional and gesture interactions. In addition, vision-based emotion recognition methods usually face strict GDPR compliance challenges due to the need to collect personal facial data. To address these limitations and avoid privacy issues, this paper studies the potential of using the sounds produced by touch during HRI to recognise tactile gestures and classify emotions along the arousal and valence dimensions. Using a dataset of tactile gestures and emotional interactions from 28 participants with the humanoid robot Pepper, we design an audio-only lightweight touch gesture and emotion recognition model with only 0.24M parameters, 0.94MB model size, and 0.7G FLOPs. Experimental results show that the proposed model effectively recognises the arousal and valence states of different emotions, as well as various tactile gestures, when the input audio length varies. The proposed model has low latency and achieves results similar to those of the well-known pretrained audio neural networks (PANNs), but with far fewer FLOPs and parameters and a much smaller model size.
dc.description.wosFundingText	This research received funding from the Flemish Government under the "Onderzoeksprogramma Artificiele Intelligentie (AI) Vlaanderen" programme.
dc.identifier.doi	10.1109/icassp49660.2025.10890031
dc.identifier.isbn	979-8-3503-6875-8
dc.identifier.issn	1520-6149
dc.identifier.uri	https://imec-publications.be/handle/20.500.12860/59734
dc.language.iso	eng
dc.provenance.editstepuser	greet.vanhoof@imec.be
dc.publisher	IEEE
dc.source.conference	IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
dc.source.conferencedate	2025-04-06
dc.source.conferencelocation	Hyderabad
dc.source.journal	2025 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
dc.source.numberofpages	5
dc.subject.keywords	CIRCUMPLEX MODEL
dc.subject.keywords	NEURAL-NETWORKS
dc.subject.keywords	PERCEPTION
dc.subject.keywords	VALENCE
dc.subject.keywords	AROUSAL
dc.title	Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction
dc.type	Proceedings paper
dspace.entity.type	Publication
imec.internal.crawledAt	2026-04-07
imec.internal.source	crawler
imec.internal.wosCreatedAt	2026-04-07
Files	Original bundle Name: Sound-Based_Recognition_of_Touch_Gestures_and_Emotions_for_Enhanced_Human-Robot_Interaction.pdf Size: 8.78 MB Format: Adobe Portable Document Format Description: Published
Publication available in collections:	Conference contributions


Date