Publication:

Quantifying the Impact of Job Placement and Routing on Network Efficiency in AI Clusters

cris.virtual.orcid0000-0001-5817-7886
cris.virtual.orcid0000-0002-1428-0301
cris.virtual.orcid0000-0003-4408-6523
cris.virtual.orcid0009-0009-7048-1923
dc.contributor.authorVan Poucke, Dante
dc.contributor.authorColle, Didier
dc.contributor.authorPickavet, Mario
dc.contributor.authorTavernier, Wouter
dc.date.accessioned2026-04-13T12:47:12Z
dc.date.available2026-04-13T12:47:12Z
dc.date.createdwos2025-12-04
dc.date.issued2025
dc.description.abstractHigh-performance computing (HPC) clusters are essential for training large-scale AI models, yet they often suffer from severe underutilization due to network bottlenecks. This paper investigates the critical role of job placement in multi-tenant AI clusters and its impact on network performance. We propose a flow-level system model that jointly considers job placement, network topology, and routing strategy to evaluate link loads and congestion. By analyzing optimal, random and state-of-the-art placement strategies across modern network topologies, we demonstrate that placement decisions significantly influence network efficiency. Our results show that job placement cannot be ignored even under optimal routing, and that existing placement strategies are dependent on the routing strategy. This work underscores the importance of prioritizing job placement, as suboptimal placements can lead to significant performance degradation in AI workloads on HPC infrastructure.
dc.description.wosFundingTextThis work was supported by imec's AAA funding project (Net4HPC). Part of this research was funded by the Research Fund of Ghent University: "The design of deterministic and reliable communication networks" (bof/baf/4y/2024/01/887) and "Optimalisatie van netwerkmiddelen voor HPC workloads" (bof/baf/2y/2024/01/035).
dc.identifier.doi10.1145/3748273.3749208
dc.identifier.urihttps://imec-publications.be/handle/20.500.12860/59066
dc.language.isoeng
dc.provenance.editstepusergreet.vanhoof@imec.be
dc.publisherASSOC COMPUTING MACHINERY
dc.source.beginpage74
dc.source.conference2nd Workshop on Networks for AI Computing, SIGCOMM 2025
dc.source.conferencedate2025-09-08
dc.source.conferencelocationCoimbra, Portugal
dc.source.endpage80
dc.source.journal2nd Workshop on Networks for AI Computing, SIGCOMM 2025
dc.source.numberofpages7
dc.titleQuantifying the Impact of Job Placement and Routing on Network Efficiency in AI Clusters
dc.typeProceedings paper
dspace.entity.typePublication
imec.internal.crawledAt2026-04-07
imec.internal.sourcecrawler
imec.internal.wosCreatedAt2026-04-07
Files

Original bundle

Name: 8890_acc.pdf
Size: 754.35 KB
Format: Adobe Portable Document Format
Description: Accepted