Publication:
I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots
| cris.virtual.department | #PLACEHOLDER_PARENT_METADATA_VALUE# | |
| cris.virtual.department | #PLACEHOLDER_PARENT_METADATA_VALUE# | |
| cris.virtual.orcid | 0000-0001-6301-0028 | |
| cris.virtual.orcid | 0000-0001-5207-7745 | |
| cris.virtualsource.department | ab1b156b-2cca-4ddc-bdb9-155273f95966 | |
| cris.virtualsource.department | 6c1aac4b-593e-4f80-9ecc-911fd20f3c31 | |
| cris.virtualsource.orcid | ab1b156b-2cca-4ddc-bdb9-155273f95966 | |
| cris.virtualsource.orcid | 6c1aac4b-593e-4f80-9ecc-911fd20f3c31 | |
| dc.contributor.author | Abbo, Giulio Antonio | |
| dc.contributor.author | Belpaeme, Tony | |
| dc.date.accessioned | 2026-04-14T10:36:26Z | |
| dc.date.available | 2026-04-14T10:36:26Z | |
| dc.date.createdwos | 2025-10-30 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | In the rapidly evolving landscape of human-robot interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents a ready-to-use implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4o mini) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. The system can be customised and is available as a stand-alone application, a Furhat robot implementation, and a ROS2 package. | |
| dc.description.wosFundingText | Funded by Horizon Europe VALAWAI (grant agreement 101070930). | |
| dc.identifier.doi | 10.1109/hri61500.2025.10973830 | |
| dc.identifier.isbn | 979-8-3503-7894-8 | |
| dc.identifier.issn | 2167-2121 | |
| dc.identifier.uri | https://imec-publications.be/handle/20.500.12860/59086 | |
| dc.language.iso | eng | |
| dc.provenance.editstepuser | greet.vanhoof@imec.be | |
| dc.publisher | IEEE | |
| dc.source.beginpage | 1176 | |
| dc.source.conference | 2025 20th ACM/IEEE International Conference on Human-Robot Interaction, HRI | |
| dc.source.conferencedate | 2025-03-04 | |
| dc.source.conferencelocation | Melbourne, Australia | |
| dc.source.endpage | 1180 | |
| dc.source.journal | 2025 20th ACM/IEEE International Conference on Human-Robot Interaction, HRI | |
| dc.source.numberofpages | 5 | |
| dc.title | I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots | |
| dc.type | Proceedings paper | |
| dspace.entity.type | Publication | |
| imec.internal.crawledAt | 2026-04-07 | |
| imec.internal.source | crawler | |
| imec.internal.wosCreatedAt | 2026-04-07 | |