Publication:

Exploiting speaker embeddings for improved microphone clustering and speech separation in ad-hoc microphone arrays

dc.contributor.authorKindt, Stijn
dc.contributor.authorThienpondt, Jenthe
dc.contributor.authorMadhu, Nilesh
dc.date.accessioned2026-03-16T12:07:43Z
dc.date.available2026-03-16T12:07:43Z
dc.date.createdwos2026-02-21
dc.date.issued2023-06-04
dc.description.abstractFor separating sources captured by ad-hoc distributed microphones, a key first step is assigning the microphones to the appropriate source-dominated clusters. The features used for such (blind) clustering are based on a fixed-length embedding of the audio signals in a high-dimensional latent space. In previous work, the embedding was hand-engineered from the Mel frequency cepstral coefficients and their modulation spectra. This paper argues that embedding frameworks designed explicitly to reliably discriminate between speakers would produce more appropriate features. We propose features generated by the state-of-the-art ECAPA-TDNN speaker verification model for the clustering. We benchmark these features in terms of the subsequent signal enhancement as well as the quality of the clustering, for which we further introduce three intuitive metrics. Results indicate that, in contrast to the hand-engineered features, the ECAPA-TDNN-based features lead to more logical clusters and better performance in the subsequent enhancement stages, thus validating our hypothesis.
dc.description.wosFundingTextThis work is supported by the Research Foundation - Flanders (FWO) under grant number G081420N and imec.ICON: BLE2AV (support from VLAIO). Partners: Imec, Televic, Cochlear, and Qorvo.
dc.identifier.doi10.1109/icassp49357.2023.10094862
dc.identifier.issn1520-6149
dc.identifier.urihttps://imec-publications.be/handle/20.500.12860/58834
dc.language.isoeng
dc.publisherIEEE
dc.source.beginpage1
dc.source.conferenceIEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP
dc.source.conferencedate2023-06-04
dc.source.conferencelocationRhodes, Greece
dc.source.endpage5
dc.source.journalIEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2023
dc.source.numberofpages5
dc.titleExploiting speaker embeddings for improved microphone clustering and speech separation in ad-hoc microphone arrays
dc.typeProceedings paper
dspace.entity.typePublication
Files

Original bundle

Name: DS631_acc.pdf
Size: 511.9 KB
Format: Adobe Portable Document Format
Description: Accepted