Publication:

Evaluating the Network Effects of Orchestration Strategies for AI Workloads in Modern Data Centers

 
cris.virtual.orcid0000-0003-2618-3311
cris.virtual.orcid0000-0003-4824-1199
cris.virtual.orcid0000-0002-6276-2057
cris.virtual.orcid0009-0005-3140-3309
dc.contributor.authorPereira dos Santos, José Pedro
dc.contributor.authorManiotis, Pavlos
dc.contributor.authorWang, Chen
dc.contributor.authorTantawi, Asser
dc.contributor.authorTardieu, Olivier
dc.contributor.authorWauters, Tim
dc.contributor.authorDe Turck, Filip
dc.date.accessioned2026-03-19T16:00:38Z
dc.date.available2026-03-19T16:00:38Z
dc.date.createdwos2025-10-22
dc.date.issued2025
dc.description.abstractThe exponential growth in Artificial Intelligence (AI) adoption presents unique challenges and opportunities for deploying AI workloads in modern Data Center (DC) networks, particularly in terms of performance, scalability, and reliability. AI workloads, such as inference and distributed training, impose different network demands: inference is primarily compute-bound and typically requires low network latency, while distributed training is network-bound and requires high bandwidth, placing significant strain on the network. This paper focuses on the network requirements of widely known AI communication patterns and studies their impact on modern DC architectures by analyzing the effects of different orchestration strategies, specifically packing and spreading, on throughput, response time, and network congestion. The results show that packing strategies generally deliver higher performance for most covered AI collectives. However, spreading strategies can be beneficial in certain scenarios, such as when larger workloads span a higher number of racks, as they can help mitigate network congestion between the switches of leaf-spine network configurations. This paper offers valuable insights into optimizing the orchestration of popular AI collectives in data center networks, presenting informed strategies to improve performance in response to growing AI demands, with findings demonstrating completion time reductions of up to 30%.
dc.description.wosFundingTextJose Santos is funded by the Research Foundation Flanders (FWO), grant number 1299323N. This work was performed during an internship at IBM Research, Yorktown Heights, NY, USA with financial support from the Research Foundation Flanders (FWO).
dc.identifier.doi10.1109/NETSOFT64993.2025.11080575
dc.identifier.isbn979-8-3315-4346-4
dc.identifier.issn2693-9770
dc.identifier.urihttps://imec-publications.be/handle/20.500.12860/58894
dc.language.isoeng
dc.provenance.editstepusergreet.vanhoof@imec.be
dc.publisherIEEE
dc.source.beginpage285
dc.source.conferenceIEEE 11th International Conference on Network Softwarization (NetSoft)
dc.source.conferencedate2025-06-23
dc.source.conferencelocationBudapest
dc.source.endpage293
dc.source.journal2025 IEEE 11TH INTERNATIONAL CONFERENCE ON NETWORK SOFTWARIZATION, NETSOFT
dc.source.numberofpages9
dc.titleEvaluating the Network Effects of Orchestration Strategies for AI Workloads in Modern Data Centers
dc.typeProceedings paper
dspace.entity.typePublication
imec.internal.crawledAt2025-10-22
imec.internal.sourcecrawler