TY - GEN
T1 - GAN or DM? In-depth Analysis and Evaluation of AI-generated Face Data for Generalizable Deepfake Detection
AU - Choi, Hyeongjun
AU - Woo, Simon S.
N1 - Publisher Copyright: Copyright © 2025 held by the owner/author(s).
PY - 2025/5/14
Y1 - 2025/5/14
AB - Deepfake detection remains challenging, particularly when identifying deepfakes generated by unseen forgery methods. Recent studies have shown that detectors trained on forgery data from Generative Adversarial Networks (GANs) do not generalize well to data from Diffusion Models (DMs), and vice versa. As generative methods such as GANs and DMs have advanced significantly in creating highly photorealistic images, it becomes crucial to develop generalized methods for detecting forgeries produced by different generation methods. While research on generalizable detectors is gaining momentum, the impact of training data on detectors' generalization ability has yet to be extensively studied, especially for synthetic human face images. In this work, we train popular deep neural networks using face data generated by various generative models and thoroughly analyze their generalizability. Our results reveal significant differences in model performance depending on the forgery method used to generate the training data. Notably, we identify specific scenarios that significantly enhance model generalization, contradicting the previous research finding that models trained on DM-generated data achieve higher generalization performance than those trained on GAN-generated data. These findings emphasize the crucial role of training data selection in enhancing the generalization capabilities of deepfake detectors. By strategically selecting and combining datasets, we can develop more robust detection systems, laying a foundation for future research on reliable and universal deepfake detection methods.
KW - deepfake detection
KW - generalization
KW - synthetic face data
UR - https://www.scopus.com/pages/publications/105006444643
U2 - 10.1145/3672608.3707733
DO - 10.1145/3672608.3707733
M3 - Conference contribution
AN - SCOPUS:105006444643
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 759
EP - 766
BT - 40th Annual ACM Symposium on Applied Computing, SAC 2025
PB - Association for Computing Machinery
T2 - 40th Annual ACM Symposium on Applied Computing, SAC 2025
Y2 - 31 March 2025 through 4 April 2025
ER -