Publication:

IMPROVED DEEP SPEAKER LOCALIZATION AND TRACKING: REVISED TRAINING PARADIGM AND CONTROLLED LATENCY

 
dc.contributor.author: Bohlender, Alexander
dc.contributor.author: Roelens, Liesbeth
dc.contributor.author: Madhu, Nilesh
dc.date.accessioned: 2026-03-31T07:31:24Z
dc.date.available: 2026-03-31T07:31:24Z
dc.date.createdwos: 2026-02-21
dc.date.issued: 2023
dc.description.abstract: Even without a separate tracking algorithm, the directions of arrival (DOAs) of moving talkers can be estimated with a deep neural network (DNN) when the movement trajectories used for training allow the generalization to real signals. Previously, we proposed a framework for generating training data with time-variant source activity and sudden DOA changes. Slowly moving sources could be seen as a special case thereof, but were not explicitly modeled. In this paper, we extend this framework by using small jumps between neighboring discrete DOAs to simulate gradual movements. Further, we investigate the benefit of a latency controlled bidirectional recurrent layer in the DNN architecture, such that the required strictly limited context of future frames may still be acceptable for real-time applications. Experiments with real recordings show that the revised data generation leads to more continuous DOA paths, whereas the future context enables a quicker detection of speech onsets and offsets.
dc.description.wosFundingText: This work is supported by the Research Foundation - Flanders (FWO) under grant numbers 11G0721N and G081420N.
dc.identifier.doi: 10.1109/icassp49357.2023.10095472
dc.identifier.issn: 1520-6149
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/58976
dc.language.iso: eng
dc.provenance.editstepuser: greet.vanhoof@imec.be
dc.publisher: IEEE
dc.source.conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
dc.source.conferencedate: 2023-06-04
dc.source.conferencelocation: Rhodos
dc.source.journal: ICASSP 2023 - 2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP
dc.source.numberofpages: 5
dc.subject.keywords: DOA ESTIMATION
dc.title: IMPROVED DEEP SPEAKER LOCALIZATION AND TRACKING: REVISED TRAINING PARADIGM AND CONTROLLED LATENCY
dc.type: Proceedings paper
dspace.entity.type: Publication
imec.internal.crawledAt: 2026-02-23
imec.internal.source: crawler