Publication:

Multi-Agent Counterfactual Communication Using Difference Rewards Policy Gradients

 
cris.virtual.orcid0000-0002-6742-6722
cris.virtual.orcid0000-0001-9355-6566
cris.virtual.orcid0000-0002-9664-9925
cris.virtual.orcid0000-0002-4812-4841
cris.virtual.orcid0000-0002-2969-3133
cris.virtual.orcid0000-0001-8029-4720
dc.contributor.authorVanneste, Simon
dc.contributor.authorVanneste, Astrid
dc.contributor.authorDe Schepper, Tom
dc.contributor.authorMercelis, Siegfried
dc.contributor.authorHellinckx, Peter
dc.contributor.authorMets, Kevin
dc.date.accessioned2026-06-10T10:14:16Z
dc.date.available2026-06-10T10:14:16Z
dc.date.createdwos2025-12-10
dc.date.issued2025
dc.description.abstractCommunication learning while learning a behaviour policy is a challenging problem within the multi-agent reinforcement learning domain. In this work, we combine the MACC (Multi-Agent Counterfactual Communication) method with the DR.PG (Difference Reward Policy Gradient) method and propose the novel DR.MACC (Difference Reward Multi-Agent Counterfactual Communication) method. The DR.MACC method enables us to create an agent-specific difference return for the action and communication policy of the agents. This policy-specific difference return minimizes the credit-assignment problem compared to using the team reward directly. The DR.MACC method does not require us to learn a joint Q-function, like the MACC method, but instead operates using the environment’s reward function. Alternatively, when the reward function is unavailable, we can learn an approximation of the reward function in the DRR.MACC method. Here, the agent’s environment interactions are used to train the approximation of the reward function using supervised learning. In the experiments, we compare the novel DR.MACC method against the MACC method with an individual Q-function and a joint Q-function. The results show that the DR.MACC method can outperform both MACC variants in the different environment configurations.
dc.description.wosFundingTextSimon Vanneste and Astrid Vanneste are supported by the Research Foundation Flanders (FWO) under Grant Number 1S94120N and Grant Number 1S12121N respectively.
dc.identifier.doi10.1007/978-3-031-74650-5_5
dc.identifier.isbn978-3-031-74649-9
dc.identifier.issn1865-0929
dc.identifier.urihttps://imec-publications.be/handle/20.500.12860/59655
dc.language.isoeng
dc.provenance.editstepusergreet.vanhoof@imec.be
dc.publisherSPRINGER INTERNATIONAL PUBLISHING AG
dc.source.beginpage82
dc.source.conferenceArtificial Intelligence and Machine Learning. 35th Benelux Conference, BNAIC/Benelearn
dc.source.conferencedate2023-11-08
dc.source.conferencelocationDelft
dc.source.endpage100
dc.source.journalARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, BNAIC/BENELEARN 2023
dc.source.numberofpages19
dc.titleMulti-Agent Counterfactual Communication Using Difference Rewards Policy Gradients
dc.typeProceedings paper
dspace.entity.typePublication
imec.internal.crawledAt2026-04-07
imec.internal.sourcecrawler
imec.internal.wosCreatedAt2026-04-07