Optimizing Violence Detection Accuracy in Video Classification via 3D Convolutional Neural Networks
DOI: https://doi.org/10.47611/jsrhs.v14i1.8766
Keywords: violence detection, 3D convolution, video classification, kernel interoperability
Abstract
As violent crime persists, security cameras that can rapidly and accurately flag moments of violence become increasingly necessary. The purpose of this study is to determine how many frames a violence detection model should analyze at a time, treating the temporal depth of a 3D convolutional network as the parameter to optimize for accuracy. Previous violence classification models have been developed, but their applicability to live footage may be limited. In this project, a convolutional neural network was built to analyze the optical flow frames of each video. The number of frames analyzed at a time was varied across one, two, three, ten, and twenty frames, and each model was trained for 20 epochs. The highest validation accuracy, 94.87%, was achieved by the model that analyzed three frames at a time, suggesting that machine learning models for violence detection may perform best on this dataset when analyzing three frames at a time. The methodology used to identify the optimal number of frames per analysis could be applied to other video classification tasks, especially those involving complex or abstract actions such as violence.
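The abstract does not include the authors' code, so the Python sketch below is illustrative only. It shows one way such an experiment could be set up, assuming TensorFlow/Keras, a 112×112 input resolution, two-channel optical-flow frames, and a hypothetical make_clip_dataset loader, none of which are specified in the abstract. The key point is that the number of frames analyzed at a time (the clip's temporal depth) is a single parameter swept over 1, 2, 3, 10, and 20.

```python
# Illustrative sketch, not the authors' published architecture: a small 3D CNN
# whose temporal depth (number of optical-flow frames analyzed at a time) is a
# tunable parameter, compared across 1, 2, 3, 10, and 20 frames for 20 epochs.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_violence_classifier(num_frames, height=112, width=112, channels=2):
    """Binary violence/non-violence classifier over a clip of optical-flow frames."""
    model = models.Sequential([
        layers.Input(shape=(num_frames, height, width, channels)),
        # Cap the kernel's temporal extent at the clip length so 1-frame clips remain valid.
        layers.Conv3D(32, (min(3, num_frames), 3, 3), activation="relu", padding="same"),
        layers.MaxPooling3D((1, 2, 2)),
        layers.Conv3D(64, (min(3, num_frames), 3, 3), activation="relu", padding="same"),
        layers.MaxPooling3D((1, 2, 2)),
        layers.GlobalAveragePooling3D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # violent vs. non-violent
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# One model per clip length; validation accuracy is then compared across lengths.
# `make_clip_dataset` is a hypothetical loader returning optical-flow clips of the given length.
for num_frames in (1, 2, 3, 10, 20):
    model = build_violence_classifier(num_frames)
    # train_ds, val_ds = make_clip_dataset(num_frames)
    # history = model.fit(train_ds, validation_data=val_ds, epochs=20)
```

Capping the kernel's temporal extent at the clip length is one way to keep the single-frame configuration valid while letting longer clips use a full three-frame temporal kernel; the published model's exact kernel sizes and layer counts may differ.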
Copyright (c) 2025 Aarjav Kavathia; Simeon Sayer

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.


