Title: Calibrated Multi-probabilistic Prediction as a Defense Against Adversarial Attacks
Authors: Peck, Jonathan; Goossens, Bart; Saeys, Yvan
Date issued: 2020
Type: Proceedings paper
Language: English
ISBN: 978-3-030-65153-4
ISSN: 1865-0929
DOI: 10.1007/978-3-030-65154-1_6
Web of Science ID: WOS:001604138300006
URI: https://imec-publications.be/handle/20.500.12860/59411

Abstract: Machine learning (ML) classifiers, in particular deep neural networks, are surprisingly vulnerable to so-called adversarial examples. These are small modifications of natural inputs which drastically alter the output of the model even though no relevant features appear to have been modified. One explanation that has been offered for this phenomenon is the calibration hypothesis, which states that the probabilistic predictions of typical ML models are miscalibrated; as a result, classifiers can often be very confident in completely erroneous predictions. Based on this idea, we propose the MultIVAP algorithm for defending arbitrary ML models against adversarial examples. Our method is inspired by the inductive Venn-ABERS predictor (IVAP) technique from the field of conformal prediction. The IVAP enjoys the theoretical guarantee that its predictions will be perfectly calibrated, thus addressing the problem of miscalibration. Experimental results on five image classification tasks demonstrate empirically that the MultIVAP has a reasonably small computational overhead and provides significantly higher adversarial robustness without sacrificing accuracy on clean data. This increase in robustness is observed both against defense-oblivious attacks and against a defense-aware white-box attack designed specifically for the MultIVAP.
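
For context on the IVAP technique the abstract builds on: given a scoring classifier and a held-out calibration set, a binary IVAP produces a multiprobability pair (p0, p1) by fitting isotonic regression twice, once with the test point hypothetically labeled 0 and once labeled 1. Below is a minimal, brute-force sketch in Python using scikit-learn; it is not the paper's MultIVAP algorithm, and the function name ivap_predict, the refit-per-test-point strategy, and the toy data are illustrative assumptions (practical IVAP implementations precompute the isotonic fits rather than refitting for every test point).

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    def ivap_predict(cal_scores, cal_labels, test_score):
        """Brute-force inductive Venn-ABERS prediction for one test score.

        For each hypothetical label y in {0, 1}, fit isotonic regression
        on the calibration scores augmented with (test_score, y) and read
        off the fitted value at test_score. The pair (p0, p1) is the
        multiprobability prediction; a single probability is obtained via
        the standard merging rule p = p1 / (1 - p0 + p1).
        """
        probs = []
        for hypothetical_label in (0, 1):
            scores = np.append(cal_scores, test_score)
            labels = np.append(cal_labels, hypothetical_label)
            iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
            iso.fit(scores, labels)
            probs.append(float(iso.predict([test_score])[0]))
        p0, p1 = probs
        return p0, p1, p1 / (1.0 - p0 + p1)

    # Toy usage: calibration scores from some (hypothetical) underlying classifier.
    rng = np.random.default_rng(0)
    cal_scores = rng.uniform(0.0, 1.0, size=200)
    cal_labels = (rng.uniform(0.0, 1.0, size=200) < cal_scores).astype(int)
    p0, p1, p = ivap_predict(cal_scores, cal_labels, test_score=0.8)
    print(f"multiprobability pair: ({p0:.3f}, {p1:.3f}), merged p = {p:.3f}")

Consistent with the abstract, the point of such a calibrated (multi)probabilistic layer is to replace the raw, often overconfident scores of the underlying model, which is the miscalibration that the calibration hypothesis holds responsible for adversarial vulnerability.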