Publication:
Adaptive block-scaled GeMMs on vector processors for DNN training at the edge
| cris.virtual.orcid | 0000-0003-3495-9263 | |
| cris.virtual.orcid | 0000-0002-3599-8515 | |
| cris.virtual.orcid | 0000-0003-0181-8069 | |
| cris.virtual.orcid | 0000-0002-1592-755X | |
| cris.virtual.orcid | 0000-0001-6561-8934 | |
| dc.contributor.author | Satya Murthy, Nitish | |
| dc.contributor.author | Laubeuf, Nathan | |
| dc.contributor.author | Bhattacharjee, Debjyoti | |
| dc.contributor.author | Catthoor, Francky | |
| dc.contributor.author | Verhelst, Marian | |
| dc.date.accessioned | 2026-01-14T10:57:32Z | |
| dc.date.available | 2026-01-14T10:57:32Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | Reduced-precision datatypes have become essential to the efficient training and deployment of Deep Neural Networks (DNNs). A recent development in the field has been the emergence of block-scaled datatypes: tensor representation formats derived from floating point that share a common exponent across multiple elements. While these formats are being broadly adopted and optimized for by DNN-specific inference accelerators, their potential benefits for training workloads on general-purpose (GP) vector processors have yet to be thoroughly explored. This work proposes a benchmarked implementation of block-scaled general matrix multiplications (GeMM) for DNN training at the edge using commercially available vector instruction sets (ARM SVE). Using this implementation, we highlight an accuracy-speed trade-off involving the shape of shared-exponent blocks (vectors or squares). We exploit this result to optimize the training of fully connected networks by dynamically adapting the shared-exponent block shapes during training. This strategy yields on average 1.95x faster training with 2x lower memory footprint compared to standard IEEE 32-bit floating point (FP32), while achieving similar accuracy. | |
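
The abstract's central idea, a single exponent shared by every element of a block, with blocks shaped as vectors or squares, can be sketched in a few lines. The NumPy snippet below is an illustrative sketch only, not the paper's ARM SVE implementation; the function name `block_scale`, the mantissa width, and the block sizes are hypothetical assumptions.

```python
# Minimal sketch of block-scaled quantization: each tile of the tensor
# shares one power-of-two exponent, and the tile shape can be a vector
# (1 x K) or a square (K x K). Names and parameters are illustrative.
import numpy as np

def block_scale(x: np.ndarray, block_shape: tuple[int, int], mant_bits: int = 7):
    """Quantize `x` so every `block_shape` tile shares a single
    power-of-two exponent; element mantissas keep `mant_bits` bits."""
    r, c = block_shape
    rows, cols = x.shape
    assert rows % r == 0 and cols % c == 0, "toy sketch: tiles must divide shape"
    out = np.empty_like(x, dtype=np.float32)
    for i in range(0, rows, r):
        for j in range(0, cols, c):
            tile = x[i:i + r, j:j + c]
            # Shared exponent: smallest power of two covering the tile's range.
            max_abs = np.max(np.abs(tile))
            exp = np.floor(np.log2(max_abs)) if max_abs > 0 else 0.0
            scale = 2.0 ** (exp - mant_bits)  # LSB weight for this tile
            # Round every element's mantissa to the shared scale.
            out[i:i + r, j:j + c] = np.round(tile / scale) * scale
    return out

# Vector-shaped blocks (one exponent per 1x32 segment) track dynamic range
# more finely; square blocks (32x32) amortize one exponent over more
# elements, trading accuracy for speed -- the trade-off the paper adapts
# dynamically during training.
x = np.random.randn(64, 64).astype(np.float32)
x_vec = block_scale(x, (1, 32))
x_sq = block_scale(x, (32, 32))
```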
| dc.identifier | 10.1109/VLSI-SOC62099.2024.10767806 | |
| dc.identifier.doi | 10.1109/VLSI-SOC62099.2024.10767806 | |
| dc.identifier.isbn | 979-8-3315-3967-2 | |
| dc.identifier.issn | 2324-8440 | |
| dc.identifier.uri | https://imec-publications.be/handle/20.500.12860/58644 | |
| dc.language.iso | en | |
| dc.publisher | IEEE | |
| dc.relation.ispartof | 2024 IFIP/IEEE 32ND INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION, VLSI-SOC | |
| dc.relation.ispartofseries | 2024 IFIP/IEEE 32ND INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION, VLSI-SOC | |
| dc.source.conference | IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC) | |
| dc.source.conferencedate | 2024-10-06 | |
| dc.source.conferencelocation | Tangier | |
| dc.source.journal | IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC) | |
| dc.subject | DNN training | |
| dc.subject | Vector processors | |
| dc.subject | Block-scaled datatypes | |
| dc.subject | ARM SVE ISA | |
| dc.subject | Science & Technology | |
| dc.subject | Technology | |
| dc.title | Adaptive block-scaled GeMMs on vector processors for DNN training at the edge | |
| dc.type | Proceedings paper | |
| dspace.entity.type | Publication | |
| oaire.citation.edition | WOS.ISTP | |
| person.identifier.rid | E-5739-2011 | |
| person.identifier.rid | W-6287-2019 | |