Automatically Labeling Offensive Formations in American Football Film Using Deep Learning

Authors

  • Kyle Zhou, Sunset High School
  • Jason Galbraith, Sunset High School

DOI:

https://doi.org/10.47611/jsrhs.v12i1.4278

Keywords:

American Football, Sports Analytics, Computer Vision, Offensive Football Formations, Image Classification, Deep Learning, Convolutional Neural Network, Transformer

Abstract

Web services for storing, annotating, and sharing sports video have become increasingly common among high-school athletic teams in recent years. However, most of these services cannot tag film automatically, leaving coaches to spend many hours on manual annotation. For American football video, coaches must label formations, plays, and field positions in order to extract insights and build strategic game plans. This paper presents an end-to-end machine learning pipeline for automatically labeling offensive formations in American football video. The pipeline includes video pre-processing, image classification, and a novel inference approach. The study compares a custom CNN against pre-trained image classifiers adapted via transfer learning: CNN-based architectures (MobileNet, Inception, EfficientNet, etc.) and a transformer-based Vision Transformer (ViT). All models are trained on roughly 1,400 images extracted from video clips of high-school football games and labeled with the three most popular formations. Several models, including the custom CNN, achieved greater than 90% classification accuracy on the test dataset. Inference is performed by sampling multiple frames from a video clip, passing each frame through a trained image classifier, and taking a majority vote over the per-frame predictions to determine the final label. We found that sampling five frames at 0.5-second intervals, starting 1 second into the clip, yields the highest inference accuracy of 95.4% with the trained custom CNN. This system can help football coaches at all levels analyze game footage and identify formations.
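To make the transfer-learning setup concrete, the sketch below attaches a new three-class head to a frozen pre-trained backbone (MobileNet, one of the architectures the study compares). This is a minimal illustration rather than the authors' published code; the 224x224 input size, dropout rate, optimizer, and loss are assumptions.

```python
import tensorflow as tf

NUM_CLASSES = 3  # the three most popular formation labels

# Pre-trained backbone with its ImageNet classification head removed.
base = tf.keras.applications.MobileNet(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base.trainable = False  # freeze pre-trained features; train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),  # assumed regularization
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```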
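The inference procedure the abstract describes (sample five frames at 0.5-second intervals starting 1 second into the clip, classify each frame, then majority-vote the per-frame predictions) could be sketched as follows. Frame extraction via OpenCV and the input scaling are assumptions, and FORMATION_LABELS is a hypothetical placeholder for the three formation names.

```python
from collections import Counter

import cv2
import numpy as np

FORMATION_LABELS = ["formation_a", "formation_b", "formation_c"]  # placeholders

def sample_frames(video_path, start_s=1.0, interval_s=0.5, num_frames=5):
    """Grab num_frames frames starting at start_s, spaced interval_s apart."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_MSEC, (start_s + i * interval_s) * 1000.0)
        ok, frame = cap.read()
        if not ok:
            break  # clip ended earlier than expected
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

def classify_clip(model, video_path, input_size=(224, 224)):
    """Classify each sampled frame, then majority-vote the per-frame labels."""
    votes = []
    for frame in sample_frames(video_path):
        x = cv2.resize(frame, input_size).astype(np.float32) / 255.0  # assumed scaling
        probs = model.predict(x[np.newaxis, ...], verbose=0)[0]
        votes.append(int(np.argmax(probs)))
    winner, _ = Counter(votes).most_common(1)[0]
    return FORMATION_LABELS[winner]
```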


References or Bibliography

Ajmeri, O., & Shah, A. (2018). Using Computer Vision and Machine Learning to Automatically Classify NFL Game Film and Develop a Player Tracking System. In MIT Sloan Sports Analytics Conference, Boston. Retrieved December 31, 2022, from https://www.sloansportsconference.com/research-papers/using-computer-vision-and-machine-learning-to-automatically-classify-nfl-game-film-and-develop-a-player-tracking-system

Atmosukarto, I., Ghanem, B., Ahuja, S., Muthuswamy, K., & Ahuja, N. (2013, June). Automatic Recognition of Offensive Team Formation in American Football Plays. In 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 991-998). https://doi.org/10.1109/CVPRW.2013.144

Bertasius, G., Wang, H., & Torresani, L. (2021). Is Space-Time Attention All You Need for Video Understanding? https://doi.org/10.48550/ARXIV.2102.05095

Dickmanns, L. (2021). Pose Estimation and Analysis for American Football Videos (dissertation).

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv. https://doi.org/10.48550/arxiv.2010.11929

Feichtenhofer, C. (2020). X3D: Expanding Architectures for Efficient Video Recognition. CoRR, abs/2004.04730. https://doi.org/10.48550/arXiv.2004.04730

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).

Hess, R., Fern, A., & Mortensen, E. (2007). Mixture-of-parts pictorial structures for objects with variable part sets. In Proceedings of the International Conference on Computer Vision (ICCV).

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., ... & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. https://doi.org/10.48550/arXiv.1704.04861

Ji, S., Xu, W., Yang, M., & Yu, K. (2013). 3D Convolutional Neural Networks for Human Action Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221-231. https://doi.org/10.1109/TPAMI.2012.59

Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2022). A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 6999-7019. https://doi.org/10.1109/TNNLS.2021.3084827

Newman, J. D. (2022). Automated Pre-Play Analysis of American Football Formations Using Deep Learning [Master's thesis, Brigham Young University]. Theses and Dissertations, 9623.

Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollár, P. (2020). Designing Network Design Spaces. arXiv. https://doi.org/10.48550/arxiv.2003.13678

Ribani, R., & Marengoni, M. (2019, August). A Survey of Transfer Learning for Convolutional Neural Networks. In 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T) (pp. 47-57). https://doi.org/10.1109/SIBGRAPI-T.2019.00010

Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. Advances in Neural Information Processing Systems, 27, 568-576.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9).

Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. https://doi.org/10.48550/arXiv.1905.11946

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).

Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., & Van Gool, L. (2016). Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. arXiv. https://doi.org/10.48550/ARXIV.1608.00859

Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Zhang, Z., Lin, H., Sun, Y., He, T., Mueller, J., Manmatha, R., Li, M., & Smola, A. (2022). ResNeSt: Split-Attention Networks. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 2735-2745).

Published

02-28-2023

How to Cite

Zhou, K., & Galbraith, J. (2023). Automatically Labeling Offensive Formations in American Football Film Using Deep Learning. Journal of Student Research, 12(1). https://doi.org/10.47611/jsrhs.v12i1.4278

Issue

Vol. 12 No. 1 (2023)

Section

HS Research Articles