Disentanglement of Latent Factors of Real and Fake Appearance for Deepfake Face Manipulation Detection

Authors

  • Suh-Yoon Hong, Cheongshim International Academy
  • Dayul Park, Cheongshim International Academy
  • GeunJung Yi, Cheongshim International Academy

DOI:

https://doi.org/10.47611/jsrhs.v12i1.4076

Keywords:

Deepfake, representation learning, classification

Abstract

A deepfake video is a video in which generative models have been used to alter a subject's facial features so that the subject appears to be a different person. Such content has positive uses, such as entertainment, but deepfake videos are also easy to exploit for harm, including spreading fake news and creating unwanted content. There have therefore been numerous attempts to detect whether a video has been manipulated with deepfake technology in order to prevent further harm. Previous approaches have attempted to detect discrepancies in the video frames, for example by exploiting the temporal consistency between frames with convolutional neural networks. Although this has produced adequate results, the accuracy remains insufficient for real-world use. In this paper, we propose a novel method that uses a convolutional neural network (CNN)-based autoencoder to detect whether a video is pristine or deepfake. Our method successfully disentangles latent factors of real and fake appearance to increase classification accuracy while maintaining a relatively low time complexity, enhancing real-world applicability. Results from extensive experimentation show a significant improvement over state-of-the-art methods, by upwards of 18.51%.
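To make the general idea concrete, the following is a minimal PyTorch sketch of a CNN autoencoder whose latent code is split into two halves (one intended to capture real-appearance factors, one intended to capture fake-appearance factors), with a binary real/fake classifier attached to the fake half. All layer sizes, losses, and names here are illustrative assumptions and not the authors' actual architecture or training procedure, which is described in the full paper.

# Illustrative sketch only: a CNN autoencoder with a split latent code and a
# real/fake classifier on the "fake-appearance" half. Hyperparameters, layer
# sizes, and loss weighting are assumptions, not the paper's configuration.
import torch
import torch.nn as nn


class DisentanglingAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.latent_dim = latent_dim
        # Convolutional encoder: 3x64x64 face crop -> flat vector of 2*latent_dim.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 2 * latent_dim),  # two latent halves
        )
        # Decoder reconstructs the input from the full latent code.
        self.decoder = nn.Sequential(
            nn.Linear(2 * latent_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )
        # Binary classifier that only sees the "fake-appearance" half of the latent.
        self.classifier = nn.Linear(latent_dim, 1)

    def forward(self, x):
        z = self.encoder(x)
        z_real, z_fake = z[:, : self.latent_dim], z[:, self.latent_dim :]
        recon = self.decoder(torch.cat([z_real, z_fake], dim=1))
        logit = self.classifier(z_fake)
        return recon, logit


# Hypothetical training step: the reconstruction loss keeps the full latent code
# informative about appearance, while the classification loss encourages
# manipulation cues to concentrate in the "fake" half.
model = DisentanglingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
frames = torch.rand(8, 3, 64, 64)             # dummy batch of face crops in [0, 1]
labels = torch.randint(0, 2, (8, 1)).float()  # 0 = pristine, 1 = deepfake
recon, logit = model(frames)
loss = nn.functional.mse_loss(recon, frames) \
     + nn.functional.binary_cross_entropy_with_logits(logit, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In this sketch, routing only the fake-half latent into the classifier is one simple way to push manipulation-related factors into that half while the reconstruction objective preserves the remaining appearance information; it is meant only to illustrate the kind of disentanglement described in the abstract.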



Published

02-28-2023

How to Cite

Hong, S.-Y., Park, D., & Yi, G. (2023). Disentanglement of Latent Factors of Real and Fake Appearance for Deepfake Face Manipulation Detection. Journal of Student Research, 12(1). https://doi.org/10.47611/jsrhs.v12i1.4076

Issue

Vol. 12 No. 1 (2023)

Section

HS Research Articles