2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI)
Abstract
Automatic Speech Recognition (ASR) technology has been reported to reach near-human performance in recent years, yet it continues to struggle with atypical speakers, particularly second-language learners. This limitation has hindered progress in using social robots for second-language education, a field with significant promise. Recent advances in Large Language Models (LLMs), which demonstrate capabilities in context understanding, common-sense reasoning, and pragmatics, offer a potential solution by compensating for transcription errors introduced by ASR. This study examines whether ASR combined with an LLM can sustain a flowing conversation. Specifically, we look at its application to Dutch-speaking students learning French as a second language. Through task-based interactions, in which successful task completion depends on accurate interpretation of user speech, the study evaluates the impact of LLMs on conversational outcomes. Results confirm that ASR performance degrades significantly both for speakers with limited proficiency and for a non-English language. Nonetheless, LLMs demonstrate the ability to interpret context and sustain meaningful conversations despite suboptimal ASR output, highlighting a promising path forward for integrating these technologies in second-language education.