Publication:

Evaluating the Network Effects of Orchestration Strategies for AI Workloads in Modern Data Centers

 
dc.contributor.author: Santos, Jose
dc.contributor.author: Maniotis, Pavlos
dc.contributor.author: Wang, Chen
dc.contributor.author: Tantawi, Asser
dc.contributor.author: Tardieu, Olivier
dc.contributor.author: Wauters, Tim
dc.contributor.author: De Turck, Filip
dc.date.accessioned: 2026-04-20T08:14:17Z
dc.date.available: 2026-04-20T08:14:17Z
dc.date.createdwos: 2025-10-22
dc.date.issued: 2025
dc.description.abstract: The exponential growth in Artificial Intelligence (AI) adoption presents unique challenges and opportunities for deploying AI workloads in modern Data Center (DC) networks, particularly in terms of performance, scalability, and reliability. AI workloads such as inference and distributed training impose different network demands: inference is primarily compute-bound and typically requires low network latency, while distributed training is network-bound and requires high bandwidth, placing significant strain on the network. This paper focuses on the network requirements of widely known AI communication patterns and studies their impact on modern DC architectures by analyzing the effects of different orchestration strategies, specifically packing and spreading, on throughput, response time, and network congestion. The results show that packing strategies generally deliver higher performance for most of the covered AI collectives. However, spreading strategies can be beneficial in certain scenarios, such as when larger workloads span a higher number of racks, as they can help mitigate network congestion between the switches of leaf-spine network configurations. This paper offers valuable insights into optimizing the orchestration of popular AI collectives in data center networks, presenting informed strategies to improve performance in response to growing AI demands, with findings demonstrating completion time reductions of up to 30%.
dc.description.wosFundingText: Jose Santos is funded by the Research Foundation Flanders (FWO), grant number 1299323N. This work was performed during an internship at IBM Research, Yorktown Heights, NY, USA with financial support from the Research Foundation Flanders (FWO).
dc.identifier.doi: 10.1109/netsoft64993.2025.11080575
dc.identifier.isbn: 979-8-3315-4346-4
dc.identifier.issn: 2693-9770
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/59115
dc.language.iso: eng
dc.publisher: IEEE
dc.source.beginpage: 285
dc.source.conference: IEEE 11th International Conference on Network Softwarization, NETSOFT
dc.source.conferencedate: 2025-06-23
dc.source.conferencelocation: Budapest, Hungary
dc.source.endpage: 293
dc.source.journal: IEEE 11th International Conference on Network Softwarization, NETSOFT
dc.source.numberofpages: 9
dc.title: Evaluating the Network Effects of Orchestration Strategies for AI Workloads in Modern Data Centers
dc.type: Proceedings paper
dspace.entity.type: Publication
Files

Original bundle
Name: 8841_acc.pdf
Size: 523.35 KB
Format: Adobe Portable Document Format
Description: Accepted