Optimizing Violence Detection in Video Classification Accuracy via 3D Convolutional Neural Networks

Authors

A. Kavathia, Simeon Sayer

DOI:

https://doi.org/10.47611/jsrhs.v14i1.8766

Keywords:

violence detection, 3D convolution, video classification, kernel interoperability

Abstract

As violent crime persists, security cameras need to identify moments of violence quickly and accurately. This study asks how many frames a violence detection model should analyze at a time to maximize its accuracy, treating that frame count as the depth of a 3D convolutional network. Violence classification models already exist, but their application to live footage may be flawed. In this project, a convolutional neural network was built to analyze optical flow frames computed from each video. The number of frames analyzed at a time was set to one, two, three, ten, or twenty, and each resulting model was trained for 20 epochs. The highest validation accuracy, 94.87%, came from the model that analyzed three frames at a time. This suggests that, for this dataset, machine learning models for violence detection may perform best when analyzing three frames at a time. The same methodology for finding the optimal number of frames could be applied to other video classification tasks, especially those involving complex or abstract actions such as violence.
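To make the experimental variable concrete, the sketch below shows one way a video's optical flow frames could be grouped into clips of a given depth before being passed to a 3D convolutional network. It is only an illustration under stated assumptions: the function names `make_clips` and `clips_per_depth` are hypothetical, the clips are assumed not to overlap, and leftover frames are assumed to be dropped. The abstract does not specify any of these details.

```python
from typing import Dict, List, Sequence, TypeVar

Frame = TypeVar("Frame")

# Clip depths compared in the study (frames analyzed at a time).
DEPTHS = (1, 2, 3, 10, 20)


def make_clips(frames: Sequence[Frame], depth: int) -> List[List[Frame]]:
    """Split a frame sequence into non-overlapping clips of `depth` frames.

    Assumption: trailing frames that do not fill a whole clip are dropped.
    The source does not say how leftover frames were handled.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    usable = len(frames) - len(frames) % depth
    return [list(frames[i:i + depth]) for i in range(0, usable, depth)]


def clips_per_depth(num_frames: int) -> Dict[int, int]:
    """Count how many clips a video of `num_frames` frames yields per depth."""
    frames = list(range(num_frames))  # integer stand-ins for optical flow frames
    return {depth: len(make_clips(frames, depth)) for depth in DEPTHS}


if __name__ == "__main__":
    # A hypothetical 40-frame video, grouped at each depth from the study.
    print(clips_per_depth(40))
```

Each clip would form the temporal (depth) axis of one input to the network, so a larger depth gives the model more motion context per sample but fewer samples per video.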




Author Biography

Simeon Sayer, Harvard University

Head Teaching Fellow in the Department of Computer Science and member of staff at Harvard's Faculty of Arts and Sciences.

Published

02-28-2025

How to Cite

Kavathia, A., & Sayer, S. (2025). Optimizing Violence Detection in Video Classification Accuracy via 3D Convolutional Neural Networks. Journal of Student Research, 14(1). https://doi.org/10.47611/jsrhs.v14i1.8766

Issue

Section

HS Research Projects