Viability of Automating Annotation through MediaPipe’s Effectiveness on Pose Points Model Accuracy

Authors

  • Dhruv Jena, Mountain House High School
  • Daksh Jain
  • Anthony Mauro

DOI:

https://doi.org/10.47611/jsrhs.v14i1.8863

Keywords:

Automated annotation, MediaPipe, manual annotation, pose estimation, keypoint variance, machine learning datasets, annotation efficiency

Abstract

This research investigates the effectiveness of automated annotation using MediaPipe for human motion recognition tasks, comparing its performance against manually annotated data from the MP2 dataset. Through the evaluation of three machine learning models—a Generative Adversarial Network (GAN), a Dense model, and a Transformer model—applied to both datasets, we assess the impact of dataset quality and annotation method on model performance. The findings indicate that while models trained on MediaPipe-annotated data generally outperformed those trained on MP2, overall accuracy remained low across all models, highlighting challenges in generalization. The study identifies the need for high-quality automated annotations that approach the granularity of manual annotations to improve performance. Moreover, it suggests that environmental factors such as lighting, background, and camera angles, which can affect joint detection accuracy, contribute to performance inconsistencies. The research also emphasizes the importance of data preprocessing and augmentation, and the potential of combining multi-modal data to enhance annotation precision. Ultimately, this study demonstrates that automated annotation offers scalability for large-scale projects but requires refinement in handling complex, dynamic environments to fully realize its potential in machine learning applications.
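The comparison between automated and manual annotations described above can be quantified by measuring keypoint variance between the two annotation sources. The sketch below is illustrative only (the data, function name, and joint ordering are hypothetical, not the authors' code): it computes the mean per-joint Euclidean distance between a MediaPipe-style automated annotation and a manual annotation of the same frame.

```python
import math

def mean_per_joint_error(auto_kpts, manual_kpts):
    """Mean Euclidean distance between paired (x, y) keypoints.

    auto_kpts, manual_kpts: lists of (x, y) tuples in the same joint order,
    e.g. normalized image coordinates from two annotation sources.
    """
    assert len(auto_kpts) == len(manual_kpts), "joint counts must match"
    dists = [math.dist(a, m) for a, m in zip(auto_kpts, manual_kpts)]
    return sum(dists) / len(dists)

# Hypothetical annotations for three joints (normalized coordinates).
auto = [(0.50, 0.20), (0.42, 0.35), (0.58, 0.35)]
manual = [(0.51, 0.21), (0.40, 0.36), (0.60, 0.33)]

print(round(mean_per_joint_error(auto, manual), 4))  # mean offset per joint
```

A lower mean per-joint error would indicate that the automated annotations approach the granularity of the manual ones; in practice this metric is often normalized by a body-scale reference (e.g. torso or head size) before thresholding.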

References or Bibliography

Fuchs, S., Schnellbach, J., Schmidt, L., & Wittges, H. (2024). Data Annotation for Support Ticket Data: A literature review. Proceedings of the Annual Hawaii International Conference on System Sciences. https://doi.org/10.24251/hicss.2023.196

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

Jordan, I. D., Sokół, P. A., & Park, I. M. (2021). Gated recurrent units viewed through the lens of continuous time dynamical systems. Frontiers in Computational Neuroscience, 15. https://doi.org/10.3389/fncom.2021.678158

K, A., P, P., & Paulose, J. (2021). Human body pose estimation and applications. 2021 Innovations in Power and Advanced Computing Technologies (i-PACT), 1–6. https://doi.org/10.1109/i-pact52855.2021.9696513

Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.-L., Yong, M. G., Lee, J., Chang, W.-T., Hua, W., Georg, M., & Grundmann, M. (2019). MediaPipe: A Framework for Building Perception Pipelines (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1906.08172

Price, E., & Ahmad, A. (2024). Accelerated video annotation driven by deep detector and tracker. Lecture Notes in Networks and Systems, 141–153. https://doi.org/10.1007/978-3-031-44981-9_12

Quiñonez, Y., Lizarraga, C., & Aguayo, R. (2022). Machine Learning Solutions with MediaPipe. 2022 11th International Conference On Software Process Improvement (CIMPS), 212–215. https://doi.org/10.1109/cimps57786.2022.10035706

Radeta, M., Freitas, R., Rodrigues, C., Zuniga, A., Nguyen, N. T., Flores, H., & Nurmi, P. (2024). Man and the machine: Effects of ai-assisted human labeling on interactive annotation of real-time video streams. ACM Transactions on Interactive Intelligent Systems, 14(2), 1–22. https://doi.org/10.1145/3649457

Sherstinsky, A. (2020). Fundamentals of Recurrent Neural Network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. https://doi.org/10.1016/j.physd.2019.132306

Ward, T. M., Fer, D. M., Ban, Y., Rosman, G., Meireles, O. R., & Hashimoto, D. A. (2021). Challenges in surgical video annotation. Computer Assisted Surgery, 26(1), 58–68. https://doi.org/10.1080/24699322.2021.1937320

Wood, D., Lublinsky, B., Roytman, A., Singh, S., Adam, C., Adebayo, A., An, S., Chang, Y. C., Dang, X.-H., Desai, N., Dolfi, M., Emami-Gohari, H., Eres, R., Goto, T., Joshi, D., Koyfman, Y., Nassar, M., Patel, H., Selvam, P., … Daijavad, S. (2024). Data-Prep-Kit: Getting your data ready for LLM application development. arXiv. https://doi.org/10.48550/ARXIV.2409.18164

Zhang, S., Jafari, O., & Nagarkar, P. (2021). A survey on machine learning techniques for auto labeling of video, audio, and text data (No. arXiv:2109.03784). https://doi.org/10.48550/arXiv.2109.03784

Zou, Y., Zhang, S., Chen, G., Tian, Y., Keutzer, K., & Moura, J. M. F. (2021). Annotation-efficient untrimmed video action recognition. Proceedings of the 29th ACM International Conference on Multimedia, 487–495. https://doi.org/10.1145/3474085.3475197

Published

02-28-2025

How to Cite

Jena, D., Jain, D., & Mauro, A. (2025). Viability of Automating Annotation through MediaPipe’s Effectiveness on Pose Points Model Accuracy. Journal of Student Research, 14(1). https://doi.org/10.47611/jsrhs.v14i1.8863

Section

HS Research Projects