Publication:

Improving the learning process of deep reinforcement learning agents operating in collective heating environments

 
dc.contributor.author: Jacobs, Stef
dc.contributor.author: Ghane, Sara
dc.contributor.author: Houben, Pieter Jan
dc.contributor.author: Kabbara, Zakarya
dc.contributor.author: Huybrechts, Thomas
dc.contributor.author: Hellinckx, Peter
dc.contributor.author: Verhaert, Ivan
dc.contributor.imecauthor: Ghane, Sara
dc.contributor.imecauthor: Huybrechts, Thomas
dc.contributor.orcidimec: Ghane, Sara::0000-0002-0159-175X
dc.contributor.orcidimec: Huybrechts, Thomas::0000-0002-5611-6331
dc.date.accessioned: 2025-02-25T22:12:55Z
dc.date.available: 2025-02-25T22:12:55Z
dc.date.issued: 2025
dc.description.abstract: Deep reinforcement learning (DRL) can be used to optimise the performance of Collective Heating Systems (CHS) by reducing operational costs while ensuring thermal comfort. However, heating systems often respond slowly to control inputs due to thermal inertia, which delays the effects of actions such as adapting temperature set points. This delayed feedback complicates the learning process for DRL agents, as it becomes more difficult to associate specific control actions with their outcomes. To address this challenge, this study evaluates four hyperparameter schemes during training. The focus is on schemes with a varying learning rate (the rate at which weights in neural networks are adapted) and/or discount factor (the importance the DRL agent attaches to future rewards). In this respect, we introduce the GALER approach, which combines a progressive increase of the discount factor with a reduction of the learning rate throughout the training process. The effectiveness of the four learning schemes is evaluated using the actor-critic Proximal Policy Optimization (PPO) algorithm for three types of CHS, with a multi-objective reward function balancing thermal comfort against energy use or operational costs. The results demonstrate that energy-based reward functions allow only limited optimisation, while the GALER scheme yields the highest potential for price-based optimisation across all considered concepts, achieving a 3%–15% performance improvement over the other successful training schemes. DRL agents trained with GALER schemes strategically anticipate high-price periods by lowering the supply temperature beforehand, and vice versa. This research highlights the advantage of varying both the learning rate and the discount factor when training DRL agents to operate in complex multi-objective environments with slow responsiveness.
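The GALER approach described in the abstract pairs a progressively increasing discount factor with a decaying learning rate over the course of training. A minimal sketch of such a schedule is shown below; the linear interpolation, endpoint values, and the function name `galer_schedule` are illustrative assumptions, not the paper's exact settings.

```python
def galer_schedule(step, total_steps,
                   lr_start=3e-4, lr_end=3e-5,
                   gamma_start=0.90, gamma_end=0.99):
    """Return (learning_rate, discount_factor) for a given training step.

    Sketch of a GALER-style schedule: the learning rate decreases while
    the discount factor increases as training progresses. Endpoints and
    the linear shape are assumptions for illustration only.
    """
    frac = min(step / total_steps, 1.0)  # fraction of training completed
    lr = lr_start + frac * (lr_end - lr_start)            # decays over time
    gamma = gamma_start + frac * (gamma_end - gamma_start)  # grows over time
    return lr, gamma

# Early in training: fast weight updates, short-sighted agent.
lr0, gamma0 = galer_schedule(0, 100_000)
# Late in training: fine-grained updates, far-sighted agent.
lr1, gamma1 = galer_schedule(100_000, 100_000)
```

In a PPO training loop, the returned values would be fed to the optimiser and the return computation at each update, so the agent first learns short-horizon responses and only later weighs delayed effects such as thermal inertia.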
dc.description.wosFundingText: This research was supported by a PhD fellowship of the Research Foundation Flanders (FWO) [1S08624N].
dc.identifier.doi: 10.1016/j.apenergy.2025.125420
dc.identifier.issn: 0306-2619
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/45259
dc.publisher: ELSEVIER SCI LTD
dc.source.beginpage: 125420
dc.source.issue: 15 April
dc.source.journal: APPLIED ENERGY
dc.source.numberofpages: 15
dc.source.volume: 384
dc.subject.keywords: BUILDING ENERGY
dc.subject.keywords: SYSTEMS
dc.title: Improving the learning process of deep reinforcement learning agents operating in collective heating environments
dc.type: Journal article
dspace.entity.type: Publication