Approximating vision transformers for edge: variational inference and mixed-precision for multi-modal data

Katare, Dewant; Leroux, Sam; Janssen, Marijn; Ding, Aaron Yi

doi:10.1007/s00607-025-01427-w


dc.contributor.author	Katare, Dewant
dc.contributor.author	Leroux, Sam
dc.contributor.author	Janssen, Marijn
dc.contributor.author	Ding, Aaron Yi
dc.contributor.imecauthor	Leroux, Sam
dc.contributor.orcidimec	Leroux, Sam::0000-0003-3792-5026
dc.date.accessioned	2025-02-20T08:42:28Z
dc.date.available	2025-02-19T21:37:37Z
dc.date.available	2025-02-20T08:42:28Z
dc.date.issued	2025
dc.description.abstract	Vision transformer (ViT) models have shown higher accuracy, robustness, and the ability to process large volumes of data, creating new baselines and references for perception tasks. However, these advantages require large memory and high-performance processors and computing units, which makes model adaptation and deployment challenging in resource-constrained environments such as memory-restricted and battery-powered edge devices. This paper addresses these deployment challenges by proposing VI-ViT, a model approximation approach for edge deployment that uses variational inference with mixed precision to process multiple modalities, such as point clouds and images. Our experimental evaluation on the nuScenes and Waymo datasets shows up to 37% and 31% reduction in model parameters and FLOPs while maintaining a mean average precision of 70.5, compared to 74.8 for the baseline model. This work presents a practical deployment approach for approximating and optimizing vision transformers for edge AI applications by balancing model metrics such as parameters, FLOPs, latency, energy consumption, and accuracy, and it can easily be adapted to other transformer models and datasets.
dc.description.wosFundingText	This work was supported by the European Union's Horizon 2020 Research and Innovation Programme, under the Marie Skłodowska-Curie grant agreement No. 956090 (APROPOS), and from the Flemish Government under the "Onderzoeksprogramma Artificiele Intelligentie (AI) Vlaanderen" programme.
dc.identifier.doi	10.1007/s00607-025-01427-w
dc.identifier.issn	0010-485X
dc.identifier.uri	https://imec-publications.be/handle/20.500.12860/45236
dc.publisher	SPRINGER WIEN
dc.source.beginpage	71
dc.source.issue	3
dc.source.journal	COMPUTING
dc.source.numberofpages	25
dc.source.volume	107
dc.title	Approximating vision transformers for edge: variational inference and mixed-precision for multi-modal data
dc.type	Journal article
dspace.entity.type	Publication
Files	Original bundle Name: 8756.pdf Size: 2.62 MB Format: Adobe Portable Document Format Description: Published
Publication available in collections:	Articles

Date