| Graduate Student: | Wasim Ahmad (汪新) |
|---|---|
| Thesis Title: | Model Attribution for Deepfake Videos/Images |
| Advisors: | Yan-Tsung Peng (彭彥璁), Yuan-Hao Chang (張原豪) |
| Oral Defense Committee: | Hen-Hsen Huang (黃瀚萱), Shanq-Jang Ruan (阮聖彰), Chu-Song Chen (陳祝嵩) |
| Degree: | Doctoral (Ph.D.) |
| Department: | College of Informatics, Taiwan International Graduate Program (TIGP) in Social Networks and Human-Centered Computing |
| Year of Publication: | 2025 |
| Graduation Academic Year: | 113 (ROC calendar) |
| Language: | English |
| Pages: | 67 |
| Keywords: | Deepfake, Deepfake Model Attribution (DFMA), Capsule Networks, Dynamic Routing Algorithm (DRA), Spatial-Temporal Attention (STA), Attention Mechanism, Temporal Analysis, Face-swap Deepfakes, Video Forensics, Multimedia Forensics, Information Security, Generative Adversarial Networks (GANs) |
With the rapid growth of Deepfake videos created using advanced AI face-swapping techniques, building robust forensic techniques that can trace a video back to its generative source has become a pressing need. Unlike most existing work, which focuses on real-versus-fake classification, model attribution, that is, identifying the specific generative model or tool used to produce Deepfake content, offers a finer-grained and more practically useful approach. By exposing model-specific generation traces, model attribution supports source tracing and informs the design of targeted defenses against evolving threats.
This dissertation formulates model attribution in Deepfake forensics as a multiclass classification task. We first propose the Capsule-Spatial-Temporal (CapST) model, which is both lightweight and effective. CapST adopts a modified VGG19 as its feature-extraction backbone to extract deep image features, incorporates Capsule Networks to capture complex, hierarchical feature structures, and applies a spatio-temporal attention mechanism to integrate temporal information across frames, yielding robust model attribution. With this design, CapST markedly improves attribution accuracy, performs strongly on the DFDM and GANGen-Detection datasets, and keeps computational cost under control, balancing accuracy and efficiency.
Building on this foundation, we further propose the FAME (Fake Attribution via Multi-level Embeddings) framework to improve generalization across diverse data and challenging video conditions. Whereas CapST emphasizes hierarchical capsule features, FAME introduces a novel multi-level spatio-temporal attention strategy that detects subtle generative traces across different encoder-decoder pipelines and compression settings. FAME also combines convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) to form hybrid spatial-temporal feature embeddings, improving attribution accuracy while reducing the number of model parameters.
We conduct comprehensive experiments on several benchmark datasets, including DFDM, FaceForensics++, FakeAVCeleb, and GANGen-Detection. The results show that CapST attains high attribution accuracy with a low computational burden, while FAME further excels in generalization, precision, and runtime efficiency. Overall, the methods proposed in this dissertation address the rapidly evolving threats posed by generative media technologies and provide a scalable, accurate, and efficient solution for Deepfake model attribution.
The rapid proliferation of Deepfake videos, enabled by sophisticated AI-driven face-swapping techniques, has intensified the demand for robust forensic tools capable of identifying the generative sources behind these manipulations. While binary real/fake classification has been the primary focus of prior research, model attribution, the task of determining the specific generative model or tool used to create a Deepfake, offers a more nuanced and actionable approach. By revealing model-specific artifacts, attribution facilitates source tracing and supports the development of tailored countermeasures against evolving threats.
This dissertation addresses the model attribution problem in Deepfake forensics by casting it as a multiclass classification challenge. We first introduce the Capsule-Spatial-Temporal (CapST) model, a lightweight and effective framework, which leverages a truncated VGG19 for efficient feature extraction, Capsule Networks for hierarchical feature modeling, and a spatio-temporal attention mechanism to aggregate frame-level features into a robust video-level representation. The model demonstrates strong attribution performance on the DFDM and GANGen-Detection datasets while maintaining a compact computational footprint.
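The paragraph above names CapST's ingredients but not their mechanics. As a rough illustration only (not the dissertation's implementation; every function name, shape, and weight here is invented), the following NumPy sketch shows the two pieces the text describes: a capsule-style squash non-linearity, and a softmax temporal-attention pool that aggregates frame-level features into a single video-level representation.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    """Capsule 'squash' non-linearity: shrinks each vector's length into
    [0, 1) while preserving its direction."""
    sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def temporal_attention_pool(frame_feats, w):
    """Aggregate per-frame feature vectors (T, D) into one video-level
    vector (D,) via softmax attention scores from weight vector w (D,)."""
    scores = frame_feats @ w                 # (T,) relevance per frame
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                     # softmax over time
    return alpha @ frame_feats               # weighted sum -> (D,)

rng = np.random.default_rng(0)
T, D = 8, 16                                 # 8 frames, 16-dim features (toy)
frames = squash(rng.normal(size=(T, D)))     # toy capsule outputs per frame
video_vec = temporal_attention_pool(frames, rng.normal(size=D))
```

The squash keeps every capsule vector's norm below 1, so vector length can act as a soft "presence" probability; the attention pool lets frames with stronger model-specific traces dominate the video-level representation.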
Building on the strengths and limitations of CapST, we propose a second, more generalizable framework, FAME (Fake Attribution via Multi-level Embeddings). While CapST proved effective within controlled datasets, it was less optimized for attribution across diverse and challenging video conditions. FAME addresses this gap by introducing a novel multi-level spatio-temporal attention strategy, designed to detect subtle generative traces across different encoder-decoder pipelines and compression settings. Unlike CapST, which primarily emphasizes hierarchical capsule features, FAME incorporates hybrid spatial-temporal embeddings using CNNs and LSTMs, providing improved attribution accuracy with even fewer parameters.
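FAME's multi-level embedding idea, pooling features drawn from several backbone depths and fusing them, can be caricatured as follows. This is an illustrative NumPy sketch under invented shapes and a deliberately crude relevance score, not the actual FAME architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_level_embedding(levels):
    """Fuse per-frame features taken at several (hypothetical) network
    depths. `levels` is a list of (T, D_l) arrays; each level is
    attention-pooled over its T frames, then the pooled vectors are
    concatenated into one embedding."""
    pooled = []
    for feats in levels:
        scores = feats.mean(axis=1)      # crude per-frame relevance score
        alpha = softmax(scores)          # attention weights over T frames
        pooled.append(alpha @ feats)     # (D_l,) pooled level embedding
    return np.concatenate(pooled)        # (sum of all D_l,)

rng = np.random.default_rng(1)
T = 6                                    # toy clip of 6 frames
levels = [rng.normal(size=(T, d)) for d in (8, 16, 32)]  # shallow -> deep
emb = multi_level_embedding(levels)      # 8 + 16 + 32 = 56 dims
```

Fusing shallow and deep levels is what lets such a model see both low-level compression or codec artifacts and higher-level semantic traces at once.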
Both models have been extensively evaluated on benchmark datasets, including DFDM, FaceForensics++, FakeAVCeleb, and GANGen-Detection. CapST achieves high attribution accuracy at low computational cost, while FAME further advances generalization, accuracy, and runtime efficiency across varied scenarios. Together, these contributions offer a comprehensive solution to Deepfake model attribution, paving the way for scalable and effective forensic applications that can adapt to the fast-evolving landscape of generative media technologies.
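Casting attribution as multiclass classification, as both frameworks do, reduces at its core to emitting one logit per candidate generation model and training with cross-entropy against the true source-model label. A toy sketch (the number of candidate models, K = 5, and all logit values are made up for illustration):

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable cross-entropy for one sample: -log p(label)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())   # log-softmax
    return -log_probs[label]

logits = np.array([0.2, 2.5, -1.0, 0.3, 0.1])  # scores for 5 source models
pred = int(np.argmax(logits))                  # attributed model index
loss = cross_entropy(logits, label=1)          # true source is model 1
```

At test time the attributed model is simply the argmax over logits; the loss drives those logits to separate the per-model generation artifacts.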
Acknowledgments i
中文摘要 ii
Abstract iii
1 Introduction 1
1.1 Motivation . . 1
1.2 Problem Statement . . 2
1.3 Objectives . . 2
1.4 Contributions . . 2
1.4.1 CapST (Capsule-Spatial-Temporal Framework) . . 2
1.4.2 ADNN (Attention-Driven Neural Network) . . 3
1.4.3 Collective Impact . . 3
1.5 Structure of Dissertation . . 3
2 Background and Related Work 5
2.1 Deepfake Technology . . 5
2.2 Deepfake Datasets . . 8
2.3 Evaluation Metrics and Loss Functions . . 8
2.3.1 Metrics . .8
2.3.2 Loss Functions . .10
2.4 Deepfake Generation (Models/Architectures) . . 10
2.4.1 Variational Autoencoders (VAEs) . . 10
2.4.2 Generative Adversarial Networks (GANs) . . 12
2.4.3 Diffusion Models . . 13
2.4.4 Comparison of Deepfake Techniques and Models . . 15
2.5 Deepfake Detection . . 15
2.5.1 Spatial Domain . . 15
2.5.2 Temporal Domain . . 17
2.5.3 Frequency Domain . . 18
2.5.4 Data-Driven Forgery Detection . . 18
2.5.5 Detection Using CNN + LSTM . . 19
2.5.6 Detection Using CNN + Vision Transformer . . 21
2.6 Deepfake Model Attribution: Identifying the Source of Synthetic Media . . 22
2.6.1 GAN-Based Model Attribution . . 23
2.6.2 Limitations of GANs in DeepFake Video Generation . .23
2.6.3 DFAE Model Attribution: Challenges and Research Gaps . . . .24
2.6.4 Advancing Model Attribution for DeepFakes . . 24
2.7 Summary . . 25
3 Methodologies for Deepfake Model Attribution 26
3.1 CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deepfake Videos . . 26
3.1.1 Introduction to CapST Deepfake Model Attribution . .26
3.1.2 Overview of CapST Architecture . . 26
3.1.3 Frame-Level Feature Extraction . . 27
3.1.4 Capsule Network for Spatial Hierarchy Learning . . 28
3.1.5 Child Capsule . . 28
3.1.6 Parent Capsule . . 29
3.1.7 Capsule Network for Spatial Hierarchy Learning . . 30
3.1.8 CapST Classification Layer . . 30
3.1.9 Model Training and Loss Function . . 31
3.1.10 Summary . . 31
3.2 FAME: Fine-Grained Attribution via Multi-level Attention for Deepfake Model Attribution . . 32
3.2.1 Fine-Grained Attribution via Multi-level Attention (FAME) Architecture . . 32
3.2.2 Loss Function and Training Strategy . . 34
3.2.3 Summary . .34
4 Experiments and Results 35
4.1 Approach 1: CapST . . 35
4.1.1 Datasets . . 35
4.1.2 Experimental Setting . . 36
4.1.3 Results and Evaluation . . 36
4.1.4 Comparison with existing methods on DFDM Dataset . .37
4.1.5 Comparison with existing methods on GANGen-Detection Dataset . . 37
4.1.6 Analysis of Grad-CAM Output Across Different DeepFake Techniques and Model Configurations . . 39
4.1.7 Limitations and Observations . . 41
4.1.8 Ablation Study . . 41
4.1.9 Comparing Our Results to DMA-STA Reproduced Results . . . 42
4.1.10 Comparison with DMA-STA in terms of Computational Complexity . . 42
4.1.11 Comparison using different backbones . . 43
4.1.12 Comparison using different VGG Layers and Number of Frames . . 44
4.1.13 Summary . . 45
4.2 Approach 2: FAME . . 45
4.2.1 Overview . . 45
4.2.2 Datasets . . 45
4.2.3 Implementation Environment . . 46
4.2.4 Experimental Setup . . 46
4.2.5 Comparison with Existing Methods on DFDM Dataset . .47
4.2.6 Comparison with DMA-STA Existing and Reproduced Results . . 49
4.2.7 Comparison with Existing Methods on FF++ Dataset . .50
4.2.8 Comparison with Existing Methods on FAVCeleb Dataset . . . . 51
4.2.9 Evaluation Metrics . . 52
4.2.10 Summary of FAME . .53
5 Conclusion 55
5.1 Summary of Our Findings . . 55
5.2 Key Contributions . . 55
5.2.1 Development of Novel Model Attribution Frameworks . . . . . 56
5.2.2 Comprehensive Evaluation Across Multiple Datasets . . . . . 56
5.2.3 Optimized and Resource-Efficient Solutions for Model Attribution . . 56
5.2.4 Forensic Implications and Real-World Applicability . . . . . 56
5.3 Limitations . . 56
5.4 Future Directions . . 56
5.4.1 Expanding to Other Manipulation Techniques . . 57
5.4.2 Improving Robustness Against Adversarial Attacks . .57
5.4.3 Real-Time Attribution in Low-Resource Environments . . . . . 57
5.4.4 Integration with Legal and Policy Frameworks . . 57
5.5 Final Remarks . . 57
Bibliography . . 58
Appendix Publications . . 68
Full-text release date: 2026/06/05