Publication:

Exploiting speaker embeddings for improved microphone clustering and speech separation in ad-hoc microphone arrays

 
dc.contributor.author: Kindt, Stijn
dc.contributor.author: Thienpondt, Jenthe
dc.contributor.author: Madhu, Nilesh
dc.date.accessioned: 2026-03-16T12:07:43Z
dc.date.available: 2026-03-16T12:07:43Z
dc.date.createdwos: 2026-02-21
dc.date.issued: 2023-06-04
dc.description.abstract: For separating sources captured by ad hoc distributed microphones, a key first step is assigning the microphones to the appropriate source-dominated clusters. The features used for such (blind) clustering are based on a fixed-length embedding of the audio signals in a high-dimensional latent space. In previous work, the embedding was hand-engineered from the Mel-frequency cepstral coefficients and their modulation spectra. This paper argues that embedding frameworks designed explicitly for the purpose of reliably discriminating between speakers would produce more appropriate features. We propose features generated by the state-of-the-art ECAPA-TDNN speaker verification model for the clustering. We benchmark these features in terms of both the subsequent signal enhancement and the quality of the clustering, for which we introduce three intuitive metrics. Results indicate that, in contrast to the hand-engineered features, the ECAPA-TDNN-based features lead to more logical clusters and better performance in the subsequent enhancement stages, thus validating our hypothesis.
dc.description.wosFundingText: This work is supported by the Research Foundation - Flanders (FWO) under grant number G081420N and imec.ICON: BLE2AV (support from VLAIO). Partners: Imec, Televic, Cochlear, and Qorvo.
dc.identifier.doi: 10.1109/icassp49357.2023.10094862
dc.identifier.issn: 1520-6149
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/58834
dc.language.iso: eng
dc.provenance.editstepuser: greet.vanhoof@imec.be
dc.publisher: IEEE
dc.source.beginpage: 1
dc.source.conference: IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP
dc.source.conferencedate: 2023-06-04
dc.source.conferencelocation: Rhodes, Greece
dc.source.endpage: 5
dc.source.journal: IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2023
dc.source.numberofpages: 5
dc.title: Exploiting speaker embeddings for improved microphone clustering and speech separation in ad-hoc microphone arrays
dc.type: Proceedings paper
dspace.entity.type: Publication
imec.internal.crawledAt: 2026-02-23
imec.internal.source: crawler
Files

Original bundle

Name: DS631_acc.pdf
Size: 511.9 KB
Format: Adobe Portable Document Format
Description: Accepted