Multi-Agent Counterfactual Communication Using Difference Rewards Policy Gradients

Vanneste, Simon; Vanneste, Astrid; De Schepper, Tom; Mercelis, Siegfried; Hellinckx, Peter; Mets, Kevin

doi:10.1007/978-3-031-74650-5_5


cris.virtual.orcid	0000-0002-6742-6722
cris.virtual.orcid	0000-0001-9355-6566
cris.virtual.orcid	0000-0002-9664-9925
cris.virtual.orcid	0000-0002-4812-4841
cris.virtual.orcid	0000-0002-2969-3133
cris.virtual.orcid	0000-0001-8029-4720
cris.virtualsource.department	19c9b1b5-069a-4cc3-a0a1-2108f2bd7c42
cris.virtualsource.department	1cf77b59-f7f6-4d1d-af45-e08f88df7d20
cris.virtualsource.department	803fc37f-b2c8-4b1c-9cc7-7cf9460b5a57
cris.virtualsource.department	c51c977b-dc5a-451e-ac25-4b9f2b738719
cris.virtualsource.department	5f457973-5b9f-4593-8a29-1eeb47f32775
cris.virtualsource.department	a09c2484-580c-4e75-992d-302438c7c31d
cris.virtualsource.orcid	19c9b1b5-069a-4cc3-a0a1-2108f2bd7c42
cris.virtualsource.orcid	1cf77b59-f7f6-4d1d-af45-e08f88df7d20
cris.virtualsource.orcid	803fc37f-b2c8-4b1c-9cc7-7cf9460b5a57
cris.virtualsource.orcid	c51c977b-dc5a-451e-ac25-4b9f2b738719
cris.virtualsource.orcid	5f457973-5b9f-4593-8a29-1eeb47f32775
cris.virtualsource.orcid	a09c2484-580c-4e75-992d-302438c7c31d
dc.contributor.author	Vanneste, Simon
dc.contributor.author	Vanneste, Astrid
dc.contributor.author	De Schepper, Tom
dc.contributor.author	Mercelis, Siegfried
dc.contributor.author	Hellinckx, Peter
dc.contributor.author	Mets, Kevin
dc.date.accessioned	2026-06-10T10:14:16Z
dc.date.available	2026-06-10T10:14:16Z
dc.date.createdwos	2025-12-10
dc.date.issued	2025
dc.description.abstract	Learning to communicate while simultaneously learning a behaviour policy is a challenging problem in multi-agent reinforcement learning. In this work, we combine the MACC (Multi-Agent Counterfactual Communication) method with the DR.PG (Difference Reward Policy Gradient) method and propose the novel DR.MACC (Difference Reward Multi-Agent Counterfactual Communication) method. The DR.MACC method enables us to create an agent-specific difference return for both the action policy and the communication policy of each agent. This policy-specific difference return reduces the credit-assignment problem compared to using the team reward directly. Unlike the MACC method, the DR.MACC method does not require us to learn a joint Q-function; instead, it operates directly on the environment’s reward function. Alternatively, when the reward function is unavailable, the DRR.MACC method learns an approximation of the reward function, trained with supervised learning on the agents’ environment interactions. In the experiments, we compare the novel DR.MACC method against the MACC method with an individual Q-function and with a joint Q-function. The results show that the DR.MACC method can outperform both MACC variants in the different environment configurations.
dc.description.wosFundingText	Simon Vanneste and Astrid Vanneste are supported by the Research Foundation Flanders (FWO) under Grant Number 1S94120N and Grant Number 1S12121N respectively.
dc.identifier.doi	10.1007/978-3-031-74650-5_5
dc.identifier.isbn	978-3-031-74649-9
dc.identifier.issn	1865-0929
dc.identifier.uri	https://imec-publications.be/handle/20.500.12860/59655
dc.language.iso	eng
dc.provenance.editstepuser	greet.vanhoof@imec.be
dc.publisher	SPRINGER INTERNATIONAL PUBLISHING AG
dc.source.beginpage	82
dc.source.conference	Artificial Intelligence and Machine Learning. 35th Benelux Conference, BNAIC/Benelearn
dc.source.conferencedate	2023-11-08
dc.source.conferencelocation	Delft
dc.source.endpage	100
dc.source.journal	ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, BNAIC/BENELEARN 2023
dc.source.numberofpages	19
dc.title	Multi-Agent Counterfactual Communication Using Difference Rewards Policy Gradients
dc.type	Proceedings paper
dspace.entity.type	Publication
imec.internal.crawledAt	2026-04-07
imec.internal.source	crawler
imec.internal.wosCreatedAt	2026-04-07
Publication available in collections:	Conference contributions
