Publication:
Multi-Agent Counterfactual Communication Using Difference Rewards Policy Gradients
| cris.virtual.orcid | 0000-0002-6742-6722 | |
| cris.virtual.orcid | 0000-0001-9355-6566 | |
| cris.virtual.orcid | 0000-0002-9664-9925 | |
| cris.virtual.orcid | 0000-0002-4812-4841 | |
| cris.virtual.orcid | 0000-0002-2969-3133 | |
| cris.virtual.orcid | 0000-0001-8029-4720 | |
| cris.virtualsource.department | 19c9b1b5-069a-4cc3-a0a1-2108f2bd7c42 | |
| cris.virtualsource.department | 1cf77b59-f7f6-4d1d-af45-e08f88df7d20 | |
| cris.virtualsource.department | 803fc37f-b2c8-4b1c-9cc7-7cf9460b5a57 | |
| cris.virtualsource.department | c51c977b-dc5a-451e-ac25-4b9f2b738719 | |
| cris.virtualsource.department | 5f457973-5b9f-4593-8a29-1eeb47f32775 | |
| cris.virtualsource.department | a09c2484-580c-4e75-992d-302438c7c31d | |
| cris.virtualsource.orcid | 19c9b1b5-069a-4cc3-a0a1-2108f2bd7c42 | |
| cris.virtualsource.orcid | 1cf77b59-f7f6-4d1d-af45-e08f88df7d20 | |
| cris.virtualsource.orcid | 803fc37f-b2c8-4b1c-9cc7-7cf9460b5a57 | |
| cris.virtualsource.orcid | c51c977b-dc5a-451e-ac25-4b9f2b738719 | |
| cris.virtualsource.orcid | 5f457973-5b9f-4593-8a29-1eeb47f32775 | |
| cris.virtualsource.orcid | a09c2484-580c-4e75-992d-302438c7c31d | |
| dc.contributor.author | Vanneste, Simon | |
| dc.contributor.author | Vanneste, Astrid | |
| dc.contributor.author | De Schepper, Tom | |
| dc.contributor.author | Mercelis, Siegfried | |
| dc.contributor.author | Hellinckx, Peter | |
| dc.contributor.author | Mets, Kevin | |
| dc.date.accessioned | 2026-06-10T10:14:16Z | |
| dc.date.available | 2026-06-10T10:14:16Z | |
| dc.date.createdwos | 2025-12-10 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Learning to communicate while simultaneously learning a behaviour policy is a challenging problem in multi-agent reinforcement learning. In this work, we combine the MACC (Multi-Agent Counterfactual Communication) method with the DR.PG (Difference Reward Policy Gradient) method and propose the novel DR.MACC (Difference Reward Multi-Agent Counterfactual Communication) method. The DR.MACC method enables us to create an agent-specific difference return for the action and communication policy of the agents. This policy-specific difference return reduces the credit-assignment problem compared to using the team reward directly. Unlike the MACC method, the DR.MACC method does not require us to learn a joint Q-function but instead operates using the environment’s reward function. Alternatively, when the reward function is unavailable, we can learn an approximation of the reward function in the DRR.MACC method. Here, the agent’s environment interactions are used to train the approximation of the reward function using supervised learning. In the experiments, we compare the novel DR.MACC method against the MACC method with an individual Q-function and a joint Q-function. The results show that the DR.MACC method can outperform both MACC variants in the different environment configurations. | |
| dc.description.wosFundingText | Simon Vanneste and Astrid Vanneste are supported by the Research Foundation Flanders (FWO) under Grant Number 1S94120N and Grant Number 1S12121N respectively. | |
| dc.identifier.doi | 10.1007/978-3-031-74650-5_5 | |
| dc.identifier.isbn | 978-3-031-74649-9 | |
| dc.identifier.issn | 1865-0929 | |
| dc.identifier.uri | https://imec-publications.be/handle/20.500.12860/59655 | |
| dc.language.iso | eng | |
| dc.provenance.editstepuser | greet.vanhoof@imec.be | |
| dc.publisher | SPRINGER INTERNATIONAL PUBLISHING AG | |
| dc.source.beginpage | 82 | |
| dc.source.conference | Artificial Intelligence and Machine Learning. 35th Benelux Conference, BNAIC/Benelearn | |
| dc.source.conferencedate | 2023-11-08 | |
| dc.source.conferencelocation | Delft | |
| dc.source.endpage | 100 | |
| dc.source.journal | ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, BNAIC/BENELEARN 2023 | |
| dc.source.numberofpages | 19 | |
| dc.title | Multi-Agent Counterfactual Communication Using Difference Rewards Policy Gradients | |
| dc.type | Proceedings paper | |
| dspace.entity.type | Publication | |
| imec.internal.crawledAt | 2026-04-07 | |
| imec.internal.source | crawler | |
| imec.internal.wosCreatedAt | 2026-04-07 | |