Publication:

Adaptive block-scaled GeMMs on vector processors for DNN training at the edge

 
cris.virtual.orcid: 0000-0003-3495-9263
cris.virtual.orcid: 0000-0002-3599-8515
cris.virtual.orcid: 0000-0003-0181-8069
cris.virtual.orcid: 0000-0002-1592-755X
cris.virtual.orcid: 0000-0001-6561-8934
dc.contributor.author: Satya Murthy, Nitish
dc.contributor.author: Laubeuf, Nathan
dc.contributor.author: Bhattacharjee, Debjyoti
dc.contributor.author: Catthoor, Francky
dc.contributor.author: Verhelst, Marian
dc.date.accessioned: 2026-01-14T10:57:32Z
dc.date.available: 2026-01-14T10:57:32Z
dc.date.issued: 2024
dc.description.abstract: Reduced-precision datatypes have become essential to the efficient training and deployment of Deep Neural Networks (DNNs). A recent development in the field is the emergence of block-scaled datatypes: tensor representation formats derived from floating point that share a common exponent across multiple elements. While these formats are being broadly adopted and optimized for by DNN-specific inference accelerators, their potential benefits for training workloads on general-purpose (GP) vector processors have yet to be thoroughly explored. This work proposes a benchmarked implementation of block-scaled general matrix multiplications (GeMMs) for DNN training at the edge using commercially available vector instruction sets (ARM SVE). Using this implementation, we highlight an accuracy-speed trade-off governed by the shape of the shared-exponent blocks: vectors or squares. We exploit this result to optimize the training of fully connected networks by dynamically adapting the shared-exponent block shapes during training. This strategy yields on average around 1.95x faster training with a 2x lower memory footprint compared to standard IEEE 32-bit floating point (FP32), while achieving similar accuracy.
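Illustrative note: the block-scaled representation the abstract describes (one shared exponent across a block of elements, with vector-shaped or square-shaped blocks) can be sketched in a few lines of NumPy. This is a minimal emulation for intuition only, not the paper's SVE implementation; the function name block_scale, the 8-bit element width, and the headroom rule are assumptions of this sketch, not details from the record.

    import numpy as np

    def block_scale(x, block_shape, elem_bits=8):
        """Emulate block-scaled quantization: one shared power-of-two
        exponent per block, low-precision mantissas per element.
        (Hypothetical sketch; not the authors' SVE kernel.)"""
        rows, cols = x.shape
        br, bc = block_shape
        assert rows % br == 0 and cols % bc == 0, "tensor must tile evenly"
        out = np.empty_like(x)
        qmax = 2 ** (elem_bits - 1) - 1  # e.g. 127 for 8-bit signed mantissas
        for i in range(0, rows, br):
            for j in range(0, cols, bc):
                blk = x[i:i + br, j:j + bc]
                max_abs = np.max(np.abs(blk))
                if max_abs == 0.0:
                    out[i:i + br, j:j + bc] = 0.0
                    continue
                # Shared exponent = exponent of the largest magnitude in the block.
                shared_exp = np.floor(np.log2(max_abs))
                # Step size leaves headroom so every element fits in elem_bits.
                step = 2.0 ** (shared_exp - (elem_bits - 2))
                q = np.clip(np.round(blk / step), -qmax - 1, qmax)
                out[i:i + br, j:j + bc] = q * step
        return out

    # Same element count per block, different shapes: vector (1x16) blocks
    # track row-wise dynamic range; square (4x4) blocks track local 2-D range.
    a = np.random.randn(64, 64).astype(np.float32)
    a_vec = block_scale(a, (1, 16))
    a_sq = block_scale(a, (4, 4))

The shape choice drives the accuracy-speed trade-off highlighted in the abstract: vector blocks map naturally onto vector-lane layouts, while square blocks can follow local dynamic range more closely.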
dc.identifier: 10.1109/VLSI-SOC62099.2024.10767806
dc.identifier.doi: 10.1109/VLSI-SOC62099.2024.10767806
dc.identifier.isbn: 979-8-3315-3967-2
dc.identifier.issn: 2324-8440
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/58644
dc.language.iso: en
dc.provenance.editstepuser: greet.vanhoof@imec.be
dc.publisher: IEEE
dc.relation.ispartof: 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC)
dc.relation.ispartofseries: 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC)
dc.source.beginpage: N/A
dc.source.conference: IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC)
dc.source.conferencedate: 2024-10-06
dc.source.conferencelocation: Tanger
dc.source.journal: IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC)
dc.subject: DNN training
dc.subject: Vector processors
dc.subject: Block-scaled datatypes
dc.subject: ARM SVE ISA
dc.subject: Science & Technology
dc.subject: Technology
dc.title: Adaptive block-scaled GeMMs on vector processors for DNN training at the edge
dc.type: Proceedings paper
dspace.entity.type: Publication
oaire.citation.edition: WOS.ISTP
person.identifier.rid: E-5739-2011
person.identifier.rid: W-6287-2019