Authors: De Geest, Jelle; De Smet, Patrick; Bonetto, Lucio; Lambert, Peter; Van Wallendael, Glenn; Mareen, Hannes
Dates: 2026-04-15; 2026-04-15; 2026
ISSN: 2162-2248
URL: https://imec-publications.be/handle/20.500.12860/59099

Abstract: The widespread sharing of images has made it difficult to control the spread of harmful content on consumer devices, particularly child sexual abuse material. Perceptual hashing offers a solution by enabling the fast detection of blacklisted images through compact representations of visual content. However, automatically detecting (near-)duplicate images in overwhelming volumes of data is challenging due to the limitations of traditional perceptual hashing methods. For example, existing methods can fail to detect images with minor modifications, particularly spatial modifications. In addition, they were often designed to find images derived from the same original image, and hence cannot recognize visually similar images that originate from a different acquisition. This study explores the use of Vision Transformers (ViTs), specifically the Contrastive Language–Image Pretraining (CLIP) model, to enhance perceptual hashing so that it aligns better with human perception. The proposed ViTHash method is compared against traditional perceptual hashing methods, such as pHash, dHash, and PDQHash. Quantitative results show that ViTHash outperforms traditional methods in handling spatial distortions, such as rotation and mirroring, although it is less robust to visual quality distortions, such as blurring and compression. Qualitative analysis reveals that ViTHash aligns more closely with human perception and can identify visually similar images, even when they depict similar content but originate from different acquisitions.
These findings demonstrate that ViTHash offers significant potential for applications requiring nuanced image similarity assessments, providing a valuable tool to enhance the detection of illicit content in consumer electronics devices and support law enforcement efforts.

Language: eng
Title: Exploring Human Perception-Aligned Perceptual Hashing
Type: Journal article
DOI: 10.1109/mce.2025.3551813
WOS:001630883600001
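
The abstract notes that traditional perceptual hashes can fail under spatial modifications such as mirroring. The sketch below illustrates that point with a simplified difference hash (dHash) on a toy grayscale grid. It is not the paper's ViTHash method and not a reference dHash implementation: the helper names (dhash, hamming, mirror, brighten), the bit convention (left pixel brighter than right gives a 1), and the synthetic 8x9 gradient image are all assumptions made for illustration. A real pipeline would first downscale and convert an actual image to grayscale.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values in the range 0-255


def dhash(pixels: Gray) -> int:
    """Simplified difference hash: one bit per horizontally adjacent pair.

    Assumes the image is already downscaled to 8 rows by 9 columns,
    which yields 8 * 8 = 64 bits.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def mirror(pixels: Gray) -> Gray:
    """Flip the image horizontally (a spatial modification)."""
    return [row[::-1] for row in pixels]


def brighten(pixels: Gray, delta: int) -> Gray:
    """Add a constant to every pixel, clipped at 255."""
    return [[min(255, v + delta) for v in row] for row in pixels]


# Hypothetical toy image: brightness increases from left to right.
image = [[col * 20 + row for col in range(9)] for row in range(8)]

d_bright = hamming(dhash(image), dhash(brighten(image, 30)))
d_mirror = hamming(dhash(image), dhash(mirror(image)))

print("distance after brightening:", d_bright)  # 0 of 64 bits differ
print("distance after mirroring:", d_mirror)    # 64 of 64 bits differ
```

On this toy input the uniform brightness change leaves the hash untouched, while mirroring flips every bit, so a threshold on Hamming distance would treat the mirrored copy as an unrelated image. Real photographs give less extreme numbers, but the example shows why a hash built from local pixel ordering is sensitive to spatial transformations.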