Cooperative Warp Execution in Tensor Core for RISC-V GPGPU

Nada, Abubakr; Sarda, Giuseppe Maria; Lenormand, Erwan

doi:10.1109/HPCA61900.2025.00107

dc.contributor.author	Nada, Abubakr
dc.contributor.author	Sarda, Giuseppe Maria
dc.contributor.author	Lenormand, Erwan
dc.contributor.imecauthor	Nada, Abubakr
dc.contributor.imecauthor	Sarda, Giuseppe Maria
dc.contributor.imecauthor	Lenormand, Erwan
dc.contributor.orcidimec	Nada, Abubakr::0009-0001-4019-9275
dc.contributor.orcidimec	Lenormand, Erwan::0000-0002-7383-6285
dc.date.accessioned	2025-08-15T03:57:02Z
dc.date.available	2025-08-15T03:57:02Z
dc.date.issued	2025
dc.description.abstract	The rise of Deep Neural Networks (DNNs) has amplified the demand for efficient computation, with General Matrix Multiply (GEMM) operations at their core. While ASICs are efficient but inflexible, GPUs, especially NVIDIA GPUs, equipped with tensor cores, provide a flexible yet high-performance solution for GEMM-based workloads. Previous research and optimizations have largely centered on NVIDIA’s architecture and programming model, which, while effective, can obscure the rationale behind certain design decisions and limit flexibility for further improvements in tensor core designs. In this paper, we present the design and integration of a tensor core into the open-source RISC-V Vortex GPGPU platform, along with a suite of intrinsics designed for GEMM kernel generation. The analysis conducted on the integration elucidates the connections between GPU system architectural parameters and tensor core configuration. We find that the tensor core is severely under-utilized in many cases and that increased compute capacity does not always imply better performance. Hence, we propose a novel technique, cooperative warp execution in tensor core, which leverages hardware-supported warp cooperation within the tensor core to reduce memory requirements for GEMM operations and boost performance over the baseline tensor core implementation by up to 3x.
dc.description.wosFundingText	We would like to thank the reviewers who provided us with valuable feedback and our shepherd for their guidance throughout the review period. This project was partially funded by the Flanders AI Research Program.
dc.identifier.doi	10.1109/HPCA61900.2025.00107
dc.identifier.eisbn	979-8-3315-0647-6
dc.identifier.isbn	979-8-3315-0648-3
dc.identifier.issn	1530-0897
dc.identifier.uri	https://imec-publications.be/handle/20.500.12860/46074
dc.publisher	IEEE COMPUTER SOC
dc.source.beginpage	1422
dc.source.conference	2025 International Symposium on High Performance Computer Architecture-HPCA-Annual
dc.source.conferencedate	2025-03-01
dc.source.conferencelocation	Las Vegas
dc.source.endpage	1436
dc.source.journal	2025 IEEE International Symposium on High Performance Computer Architecture (HPCA)
dc.source.numberofpages	15
dc.title	Cooperative Warp Execution in Tensor Core for RISC-V GPGPU
dc.type	Proceedings paper
dspace.entity.type	Publication
Files	Cooperative_Warp_Execution_in_Tensor_Core_for_RISC-V_GPGPU.pdf (793.11 KB, Adobe Portable Document Format, published version)
Publication available in collections:	Conference contributions
