Child Speech Recognition in Human-Robot Interaction: Problem Solved?

Janssens, Ruben; Verhelst, Eva; Abbo, Giulio Antonio; Ren, Qiaoqiao; Pinto Bernal, Maria Jose; Belpaeme, Tony

doi:10.1007/978-981-96-3519-1_43


dc.contributor.author	Janssens, Ruben
dc.contributor.author	Verhelst, Eva
dc.contributor.author	Abbo, Giulio Antonio
dc.contributor.author	Ren, Qiaoqiao
dc.contributor.author	Pinto Bernal, Maria Jose
dc.contributor.author	Belpaeme, Tony
dc.contributor.imecauthor	Janssens, Ruben
dc.contributor.imecauthor	Verhelst, Eva
dc.contributor.imecauthor	Abbo, Giulio Antonio
dc.contributor.imecauthor	Ren, Qiaoqiao
dc.contributor.imecauthor	Bernal, Maria Jose Pinto
dc.contributor.imecauthor	Belpaeme, Tony
dc.contributor.orcidimec	Janssens, Ruben::0000-0002-1790-9531
dc.contributor.orcidimec	Verhelst, Eva::0009-0009-4734-1551
dc.contributor.orcidimec	Abbo, Giulio Antonio::0000-0001-6301-0028
dc.contributor.orcidimec	Belpaeme, Tony::0000-0001-5207-7745
dc.date.accessioned	2025-08-23T03:59:02Z
dc.date.available	2025-08-23T03:59:02Z
dc.date.issued	2025
dc.description.abstract	Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children’s speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. Performance improves even more in highly structured interactions when priming models with specific phrases. While transcription is not perfect yet, the best model recognises 60.3% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions.
dc.description.wosFundingText	This research received funding from imec (Smart Education), the Flemish Government (AI Research Program) and the Horizon Europe VALAWAI project (grant agreement number 101070930). We are indebted to the authors of [9] and [8] for making the recordings and transcriptions available.
dc.identifier.doi	10.1007/978-981-96-3519-1_43
dc.identifier.eisbn	978-981-96-3519-1
dc.identifier.isbn	978-981-96-3518-4
dc.identifier.issn	2945-9133
dc.identifier.uri	https://imec-publications.be/handle/20.500.12860/46098
dc.publisher	SPRINGER-VERLAG SINGAPORE PTE LTD
dc.source.beginpage	476
dc.source.conference	16th International Conference on Social Robotics-ICSR-Empowering Humanity: The Role of Social and Collaborative Robotics in Shaping Our Future
dc.source.conferencedate	2024-10-24
dc.source.conferencelocation	Odense
dc.source.endpage	486
dc.source.journal	Social Robotics
dc.source.numberofpages	11
dc.title	Child Speech Recognition in Human-Robot Interaction: Problem Solved?
dc.type	Proceedings paper
dspace.entity.type	Publication
Publication available in collections:	Conference contributions
