Publication:

GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer

Date: 2025
dc.contributor.author: Deressa, Deressa Wodajo
dc.contributor.author: Mareen, Hannes
dc.contributor.author: Lambert, Peter
dc.contributor.author: Atnafu, Solomon
dc.contributor.author: Akhtar, Zahid
dc.contributor.author: Van Wallendael, Glenn
dc.contributor.imecauthor: Deressa, Deressa Wodajo
dc.contributor.imecauthor: Mareen, Hannes
dc.contributor.imecauthor: Lambert, Peter
dc.contributor.imecauthor: Van Wallendael, Glenn
dc.contributor.orcidimec: Mareen, Hannes::0000-0002-0660-3190
dc.contributor.orcidimec: Lambert, Peter::0000-0001-5313-4158
dc.contributor.orcidimec: Van Wallendael, Glenn::0000-0001-9530-3466
dc.date.accessioned: 2025-06-30T10:32:09Z
dc.date.available: 2025-06-30T03:57:08Z
dc.date.available: 2025-06-30T10:32:09Z
dc.date.issued: 2025
dc.description.abstract: Deepfakes have raised significant concerns due to their potential to spread false information and compromise the integrity of digital media. Current deepfake detection models often struggle to generalize across a diverse range of deepfake generation techniques and video content. In this work, we propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection. Our model combines ConvNeXt and Swin Transformer models for feature extraction, and it utilizes an Autoencoder and a Variational Autoencoder to learn from latent data distributions. By learning from both visual artifacts and the latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos. The model is trained and evaluated on the DFDC, FF++, TM, DeepfakeTIMIT, and Celeb-DF (v2) datasets. The proposed GenConViT model demonstrates strong performance in deepfake video detection, achieving high accuracy across the tested datasets. While our model shows promising results in deepfake video detection by leveraging visual and latent features, we demonstrate that further work is needed to improve its generalizability when encountering out-of-distribution data. Our model provides an effective solution for identifying a wide range of fake videos while preserving the integrity of digital media.
dc.description.wosFundingText: This research was funded by the Addis Ababa University Research Grant for Adaptive Problem-Solving Research (reference number RD/PY-183/2021, grant number AR/048/2021); the Research Foundation-Flanders (FWO) under project grant G0A2523N; the Flemish government (COM-PRESS project, within the relance plan Vlaamse Veerkracht); IDLab (Ghent University-imec); Flanders Innovation and Entrepreneurship (VLAIO); and the European Union.
dc.identifier.doi: 10.3390/app15126622
dc.identifier.issn: 2076-3417
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/45865
dc.publisher: MDPI
dc.source.beginpage: 1
dc.source.endpage: 21
dc.source.issue: 12
dc.source.journal: APPLIED SCIENCES-BASEL
dc.source.numberofpages: 21
dc.source.volume: 15
dc.title: GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer
dc.type: Journal article
dspace.entity.type: Publication
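
Note: the abstract above describes a generative two-branch design (an Autoencoder branch and a Variational Autoencoder branch, each feeding a hybrid ConvNeXt + Swin Transformer feature extractor). The following is a minimal PyTorch sketch of that idea, assuming timm backbones ("convnext_tiny", "swin_tiny_patch4_window7_224"), toy encoder/decoder sizes, and logit averaging for fusion. It is not the authors' released GenConViT code; all layer sizes, names, and the fusion rule are illustrative assumptions.

```python
# Minimal sketch of the two-branch idea in the abstract (NOT the authors'
# released implementation): an Autoencoder (AE) branch and a Variational
# Autoencoder (VAE) branch, each followed by a hybrid ConvNeXt + Swin
# Transformer feature extractor and a binary (real/fake) head.
import torch
import torch.nn as nn
import timm


class HybridFeatures(nn.Module):
    """Concatenate pooled ConvNeXt and Swin features (backbones via timm)."""
    def __init__(self):
        super().__init__()
        self.convnext = timm.create_model("convnext_tiny", pretrained=False, num_classes=0)
        self.swin = timm.create_model("swin_tiny_patch4_window7_224", pretrained=False, num_classes=0)
        self.out_dim = self.convnext.num_features + self.swin.num_features

    def forward(self, x):
        return torch.cat([self.convnext(x), self.swin(x)], dim=1)


class AEBranch(nn.Module):
    """AE branch: reconstruct the frame, then classify the reconstruction."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.features = HybridFeatures()
        self.head = nn.Linear(self.features.out_dim, 2)

    def forward(self, x):
        recon = self.decoder(self.encoder(x))
        return self.head(self.features(recon)), recon


class VAEBranch(nn.Module):
    """VAE branch: sample a latent code, reconstruct, then classify."""
    def __init__(self, latent_dim=256, img_size=224):
        super().__init__()
        self.img_size = img_size
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 3 * img_size * img_size), nn.Sigmoid())
        self.features = HybridFeatures()
        self.head = nn.Linear(self.features.out_dim, 2)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.decoder(z).view(-1, 3, self.img_size, self.img_size)
        return self.head(self.features(recon)), recon, mu, logvar


class GenConViTSketch(nn.Module):
    """Average the two branches' logits for the final real/fake prediction."""
    def __init__(self):
        super().__init__()
        self.ae, self.vae = AEBranch(), VAEBranch()

    def forward(self, x):
        logits_ae, _ = self.ae(x)
        logits_vae, _, _, _ = self.vae(x)
        return (logits_ae + logits_vae) / 2


if __name__ == "__main__":
    model = GenConViTSketch()
    frames = torch.randn(2, 3, 224, 224)  # two face-cropped frames
    print(model(frames).shape)            # torch.Size([2, 2])
```

In practice such a detector would be applied to face-cropped frames sampled from each video, with per-frame predictions aggregated to a video-level score, and the branch training losses would combine classification with reconstruction (and, for the VAE branch, a KL-divergence term); the exact losses and aggregation used by GenConViT are described in the paper itself.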
Files

Original bundle

Name: DS909.pdf
Size: 5.91 MB
Format: Adobe Portable Document Format
Description: Published