Publication:
I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots
| dc.contributor.author | Abbo, Giulio Antonio | |
| dc.contributor.author | Belpaeme, Tony | |
| dc.date.accessioned | 2026-03-24T10:51:51Z | |
| dc.date.available | 2026-03-24T10:51:51Z | |
| dc.date.createdwos | 2025-10-29 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | In the rapidly evolving landscape of human-robot interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents a ready-to-use implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4o mini) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. The system can be customised and is available as a stand-alone application, a Furhat robot implementation, and a ROS2 package. | |
| dc.description.wosFundingText | Funded by Horizon Europe VALAWAI (grant agreement 101070930). | |
| dc.identifier.doi | 10.1109/HRI61500.2025.10973830 | |
| dc.identifier.isbn | 979-8-3503-7894-8 | |
| dc.identifier.issn | 2167-2121 | |
| dc.identifier.uri | https://imec-publications.be/handle/20.500.12860/58926 | |
| dc.language.iso | eng | |
| dc.provenance.editstepuser | greet.vanhoof@imec.be | |
| dc.publisher | IEEE | |
| dc.source.beginpage | 1176 | |
| dc.source.conference | 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI) | |
| dc.source.conferencedate | 2025-03-04 | |
| dc.source.conferencelocation | Melbourne | |
| dc.source.endpage | 1180 | |
| dc.source.journal | 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI | |
| dc.source.numberofpages | 5 | |
| dc.title | I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots | |
| dc.type | Proceedings paper | |
| dspace.entity.type | Publication | |
| imec.internal.crawledAt | 2025-10-22 | |
| imec.internal.source | crawler | |