Publication:
Quantifying the Impact of Job Placement and Routing on Network Efficiency in AI Clusters
| cris.virtual.orcid | 0000-0001-5817-7886 | |
| cris.virtual.orcid | 0000-0002-1428-0301 | |
| cris.virtual.orcid | 0000-0003-4408-6523 | |
| cris.virtual.orcid | 0009-0009-7048-1923 | |
| cris.virtualsource.department | c914e7c0-7efb-4c2b-87b4-ae881ddf37db | |
| cris.virtualsource.department | 891de1ef-83e1-4ca0-ae39-c3daab198fe5 | |
| cris.virtualsource.department | 48554e7b-ff43-44b9-9f84-0dcbd96416d7 | |
| cris.virtualsource.department | c7e476c0-f11d-4b53-ba1c-5af4b5c7ee5f | |
| cris.virtualsource.orcid | c914e7c0-7efb-4c2b-87b4-ae881ddf37db | |
| cris.virtualsource.orcid | 891de1ef-83e1-4ca0-ae39-c3daab198fe5 | |
| cris.virtualsource.orcid | 48554e7b-ff43-44b9-9f84-0dcbd96416d7 | |
| cris.virtualsource.orcid | c7e476c0-f11d-4b53-ba1c-5af4b5c7ee5f | |
| dc.contributor.author | Van Poucke, Dante | |
| dc.contributor.author | Colle, Didier | |
| dc.contributor.author | Pickavet, Mario | |
| dc.contributor.author | Tavernier, Wouter | |
| dc.date.accessioned | 2026-04-13T12:47:12Z | |
| dc.date.available | 2026-04-13T12:47:12Z | |
| dc.date.createdwos | 2025-12-04 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | High-performance computing (HPC) clusters are essential for training large-scale AI models, yet they often suffer from severe underutilization due to network bottlenecks. This paper investigates the critical role of job placement in multi-tenant AI clusters and its impact on network performance. We propose a flow-level system model that jointly considers job placement, network topology, and routing strategy to evaluate link loads and congestion. By analyzing optimal, random and state-of-the-art placement strategies across modern network topologies, we demonstrate that placement decisions significantly influence network efficiency. Our results show that job placement cannot be ignored even under optimal routing, and that existing placement strategies are dependent on the routing strategy. This work underscores the importance of prioritizing job placement, as suboptimal placements can lead to significant performance degradation in AI workloads on HPC infrastructure. | |
| dc.description.wosFundingText | This work was supported by imec's AAA funding project (Net4HPC). Part of this research was funded by the Research Fund of Ghent University: "The design of deterministic and reliable communication networks" (bof/baf/4y/2024/01/887) and "Optimalisatie van netwerkmiddelen voor HPC workloads" (bof/baf/2y/2024/01/035). | |
| dc.identifier.doi | 10.1145/3748273.3749208 | |
| dc.identifier.uri | https://imec-publications.be/handle/20.500.12860/59066 | |
| dc.language.iso | eng | |
| dc.provenance.editstepuser | greet.vanhoof@imec.be | |
| dc.publisher | Association for Computing Machinery | |
| dc.source.beginpage | 74 | |
| dc.source.conference | 2nd Workshop on Networks for AI Computing, SIGCOMM 2025 | |
| dc.source.conferencedate | 2025-09-08 | |
| dc.source.conferencelocation | Coimbra, Portugal | |
| dc.source.endpage | 80 | |
| dc.source.journal | 2nd Workshop on Networks for AI Computing, SIGCOMM 2025 | |
| dc.source.numberofpages | 7 | |
| dc.title | Quantifying the Impact of Job Placement and Routing on Network Efficiency in AI Clusters | |
| dc.type | Proceedings paper | |
| dspace.entity.type | Publication | |
| imec.internal.crawledAt | 2026-04-07 | |
| imec.internal.source | crawler | |
| imec.internal.wosCreatedAt | 2026-04-07 | |