A Method for Training Object Scale Estimation System using Feature Extraction Enhancement with Depth Estimation

Authors

  • Kyungryun Kim, Detroit Country Day Upper School

DOI:

https://doi.org/10.47611/jsrhs.v12i4.5144

Keywords:

Object Scale Estimation, Object Detection, Depth Estimation

Abstract

Machine learning-based object scale estimation has grown in popularity in recent years because of its potential use across many industries. Although several methods have been proposed, their insufficient accuracy limits practical applications, so a system with human-level accuracy is needed before the technology can be deployed in real-world settings. This paper proposes a novel object scale estimation system built from a feature extractor, disentangled feature maps, a depth estimator, an object localizer, and a ground-truth depth map. The system takes an image as input and passes it through the feature extractor to produce disentangled feature maps. These feature maps are then consumed by the depth estimator, which generates a depth map, and by the object localizer, which predicts a bounding box around the object. Because the depth estimator and object localizer are trained jointly, the feature extractor learns to extract disentangled, size-related features from the input image, and these features boost the performance of the proposed system. In addition, we propose an actual scale converter module that calculates the actual size of the object in the input image. In experiments, the proposed method outperforms other state-of-the-art methods, achieving an IoU (Intersection over Union) of 0.8113 on the COCO dataset.
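
To make the reported metric and the final conversion step concrete, the following is a minimal Python sketch, not the paper's implementation. The `iou` function follows the standard definition of Intersection over Union for axis-aligned boxes. The `pixel_to_metric` function is purely an assumption about how an actual scale converter could combine a predicted bounding box with an estimated depth: it uses the pinhole camera model, and the names `depth_m` and `focal_px` are hypothetical parameters that do not come from the abstract.

```python
from typing import Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(pred: Box, truth: Box) -> float:
    """Intersection over Union of two axis-aligned bounding boxes."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(pred[0], truth[0])
    iy_min = max(pred[1], truth[1])
    ix_max = min(pred[2], truth[2])
    iy_max = min(pred[3], truth[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter
    return inter / union if union > 0 else 0.0


def pixel_to_metric(box: Box, depth_m: float, focal_px: float) -> Tuple[float, float]:
    """Convert a box's pixel extent to (width, height) in metres.

    ASSUMPTION: pinhole camera model, where
    real_size = pixel_size * depth / focal_length.
    The paper's actual scale converter may work differently.
    """
    width_px = box[2] - box[0]
    height_px = box[3] - box[1]
    return width_px * depth_m / focal_px, height_px * depth_m / focal_px


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print("IoU:", iou(predicted, ground_truth))

    # Hypothetical values: object 2 m from a camera with a 500 px focal length.
    print("Size (m):", pixel_to_metric((0.0, 0.0, 100.0, 50.0), 2.0, 500.0))
```

Under this reading, an IoU of 1.0 means the predicted box matches the ground-truth box exactly, so a higher IoU implies a more reliable pixel extent to feed into any later size conversion.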

 

Published

11-30-2023

How to Cite

Kim, K. (2023). A Method for Training Object Scale Estimation System using Feature Extraction Enhancement with Depth Estimation. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.5144

Issue

Vol. 12 No. 4 (2023)

Section

HS Research Articles