Publication:
I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots
| cris.virtual.department | #PLACEHOLDER_PARENT_METADATA_VALUE# | |
| cris.virtual.department | #PLACEHOLDER_PARENT_METADATA_VALUE# | |
| cris.virtual.orcid | 0000-0001-6301-0028 | |
| cris.virtual.orcid | 0000-0001-5207-7745 | |
| cris.virtualsource.department | ab1b156b-2cca-4ddc-bdb9-155273f95966 | |
| cris.virtualsource.department | 6c1aac4b-593e-4f80-9ecc-911fd20f3c31 | |
| cris.virtualsource.orcid | ab1b156b-2cca-4ddc-bdb9-155273f95966 | |
| cris.virtualsource.orcid | 6c1aac4b-593e-4f80-9ecc-911fd20f3c31 | |
| dc.contributor.author | Abbo, Giulio Antonio | |
| dc.contributor.author | Belpaeme, Tony | |
| dc.date.accessioned | 2026-04-14T10:36:26Z | |
| dc.date.available | 2026-04-14T10:36:26Z | |
| dc.date.createdwos | 2025-10-30 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | In the rapidly evolving landscape of human-robot interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents a ready-to-use implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4o mini) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. The system can be customised and is available as a stand-alone application, a Furhat robot implementation, and a ROS2 package. | |
| dc.description.wosFundingText | Funded by Horizon Europe VALAWAI (grant agreement 101070930). | |
| dc.identifier.doi | 10.1109/hri61500.2025.10973830 | |
| dc.identifier.isbn | 979-8-3503-7894-8 | |
| dc.identifier.issn | 2167-2121 | |
| dc.identifier.uri | https://imec-publications.be/handle/20.500.12860/59086 | |
| dc.language.iso | eng | |
| dc.provenance.editstepuser | greet.vanhoof@imec.be | |
| dc.publisher | IEEE | |
| dc.source.beginpage | 1176 | |
| dc.source.conference | 2025 20th ACM/IEEE International Conference on Human-Robot Interaction, HRI | |
| dc.source.conferencedate | 2025-03-04 | |
| dc.source.conferencelocation | Melbourne, Australia | |
| dc.source.endpage | 1180 | |
| dc.source.journal | 2025 20th ACM/IEEE International Conference on Human-Robot Interaction, HRI | |
| dc.source.numberofpages | 5 | |
| dc.title | I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots | |
| dc.type | Proceedings paper | |
| dspace.entity.type | Publication | |
| imec.internal.crawledAt | 2026-04-07 | |
| imec.internal.source | crawler | |
| imec.internal.wosCreatedAt | 2026-04-07 | |