Publication:

Approximating vision transformers for edge: variational inference and mixed-precision for multi-modal data

 
cris.virtual.orcid0000-0003-3792-5026
cris.virtualsource.department7db3840f-300f-4cd3-9f87-715eac1a46ae
cris.virtualsource.orcid7db3840f-300f-4cd3-9f87-715eac1a46ae
dc.contributor.authorKatare, Dewant
dc.contributor.authorLeroux, Sam
dc.contributor.authorJanssen, Marijn
dc.contributor.authorDing, Aaron Yi
dc.contributor.imecauthorLeroux, Sam
dc.contributor.orcidimecLeroux, Sam::0000-0003-3792-5026
dc.date.accessioned2025-02-20T08:42:28Z
dc.date.available2025-02-19T21:37:37Z
dc.date.available2025-02-20T08:42:28Z
dc.date.issued2025
dc.description.abstractVision transformer (ViT) models have shown higher accuracy, robustness, and large-volume data processing ability, creating new baselines and references for perception tasks. However, these advantages require large memory and high-performance processors and computing units, which makes model adaptation and deployment challenging in resource-constrained environments such as memory-restricted and battery-powered edge devices. This paper addresses these deployment challenges by proposing VI-ViT, a model approximation approach for edge deployment that uses variational inference with mixed precision to process multiple modalities, such as point clouds and images. Our experimental evaluation on the nuScenes and Waymo datasets shows up to 37% and 31% reductions in model parameters and FLOPs, respectively, while maintaining a mean average precision of 70.5 compared to 74.8 for the baseline model. This work presents a practical deployment approach for approximating and optimizing vision transformers for edge AI applications by balancing model metrics such as parameters, FLOPs, latency, energy consumption, and accuracy; the approach can easily be adapted to other transformer models and datasets.
dc.description.wosFundingTextThis work was supported by the European Union's Horizon 2020 Research and Innovation Programme, under the Marie Skłodowska-Curie grant agreement No. 956090 (APROPOS), and by the Flemish Government under the "Onderzoeksprogramma Artificiele Intelligentie (AI) Vlaanderen" programme.
dc.identifier.doi10.1007/s00607-025-01427-w
dc.identifier.issn0010-485X
dc.identifier.urihttps://imec-publications.be/handle/20.500.12860/45236
dc.publisherSPRINGER WIEN
dc.source.beginpage71
dc.source.issue3
dc.source.journalCOMPUTING
dc.source.numberofpages25
dc.source.volume107
dc.titleApproximating vision transformers for edge: variational inference and mixed-precision for multi-modal data
dc.typeJournal article
dspace.entity.typePublication
Files

Original bundle

Name:
8756.pdf
Size:
2.62 MB
Format:
Adobe Portable Document Format
Description:
Published
Publication available in collections: