Publication:

Chameleon: A Multimodal Learning Framework Robust to Missing Modalities

 
dc.contributor.author: Liaqat, Muhammad Irzam
dc.contributor.author: Nawaz, Shah
dc.contributor.author: Zaheer, Muhammad Zaigham
dc.contributor.author: Saeed, Muhammad Saad
dc.contributor.author: Sajjad, Hassan
dc.contributor.author: De Schepper, Tom
dc.contributor.author: Nandakumar, Karthik
dc.contributor.author: Khan, Muhammad Haris
dc.contributor.author: Gallo, Ignazio
dc.contributor.author: Schedl, Markus
dc.contributor.imecauthor: De Schepper, Tom
dc.contributor.orcidimec: De Schepper, Tom::0000-0002-2969-3133
dc.date.accessioned: 2025-06-06T04:50:09Z
dc.date.available: 2025-06-06T04:50:09Z
dc.date.issued: 2025
dc.description.abstract: Multimodal learning has demonstrated remarkable performance improvements over unimodal architectures. However, multimodal learning methods often exhibit degraded performance if one or more modalities are missing. This may be attributed to the commonly used multi-branch design containing modality-specific components, which makes such approaches reliant on the availability of a complete set of modalities. In this work, we propose a robust multimodal learning framework, Chameleon, that adapts a common-space visual learning network to align all input modalities. To enable this, we unify the input modalities into a single format by encoding any non-visual modality into a visual representation, thus making the framework robust to missing modalities. Extensive experiments are performed on multimodal classification tasks using four textual-visual (Hateful Memes, UPMC Food-101, MM-IMDb, and Ferramenta) and two audio-visual (avMNIST, VoxCeleb) datasets. Chameleon not only achieves superior performance when all modalities are present at train/test time but also demonstrates notable resilience in the case of missing modalities.
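The abstract's central idea, encoding non-visual modalities into visual representations so that a single shared visual network can consume every input, can be illustrated with a toy sketch. This is an illustrative assumption, not the paper's actual encoding: the function name `text_to_visual` and the byte-to-pixel mapping are hypothetical stand-ins.

```python
# Toy sketch (hypothetical, not Chameleon's actual encoder): map a text
# string onto a fixed-size grayscale "image" so that a visual network
# could process it like any other image input. A missing modality then
# becomes an absent image rather than a broken modality-specific branch.

def text_to_visual(text: str, size: int = 32) -> list[list[float]]:
    """Encode text as a size x size grid of values in [0, 1] by writing
    UTF-8 byte values into a flat pixel buffer, then reshaping."""
    pixels = [0.0] * (size * size)
    data = text.encode("utf-8")[: size * size]  # truncate if too long
    for i, byte in enumerate(data):
        pixels[i] = byte / 255.0  # normalize byte value to [0, 1]
    # Reshape the flat buffer into rows, i.e. an image-shaped grid.
    return [pixels[r * size : (r + 1) * size] for r in range(size)]


grid = text_to_visual("multimodal learning")
print(len(grid), len(grid[0]))  # a 32 x 32 image-shaped encoding
```

The point of the sketch is only the unification step: once every modality arrives in the same visual format, the downstream network needs no modality-specific branches, which is what makes missing modalities less disruptive.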
dc.description.wosFundingText: This research was funded in whole or in part by the Austrian Science Fund (FWF): https://doi.org/10.55776/COE12, https://doi.org/10.55776/DFH23, https://doi.org/10.55776/P36413.
dc.identifier.doi: 10.1007/s13735-025-00370-y
dc.identifier.issn: 2192-6611
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/45761
dc.publisher: SPRINGER
dc.source.beginpage: 21
dc.source.issue: 2
dc.source.journal: INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL
dc.source.numberofpages: 14
dc.source.volume: 14
dc.title: Chameleon: A Multimodal Learning Framework Robust to Missing Modalities
dc.type: Journal article
dspace.entity.type: Publication
Files

Original bundle

Name: s13735-025-00370-y.pdf
Size: 3.96 MB
Format: Adobe Portable Document Format
Description: Published