International Journal of Advanced Innovative Technology in Engineering (IJAITE)



Video Object Detection and Human Action Recognition: Techniques, Challenges, and Future Trends

Kanchan S. Tidke, Meenal G. Kachhavay, Ashvini D. Nakhale

Abstract :

Video object detection and human action recognition have become essential components of modern computer vision systems. These techniques are widely used in applications such as traffic monitoring, industrial safety, and public surveillance. However, motion blur, occlusion, and video defocus degrade the accuracy and reliability of detection systems. This paper reviews video object detection techniques, covering frame-based approaches, one-stage and two-stage detection algorithms, and mixed-stage methods. It also examines commonly used datasets, including ImageNet VID and YouTube-BoundingBoxes, and highlights their strengths and limitations. Recent advances in deep learning-based detection, including YOLO and Vision Transformers, are also discussed. Finally, the study identifies open challenges, such as real-time detection in resource-constrained environments and ethical concerns like data privacy, and outlines emerging trends and future directions for video detection systems. The findings aim to guide researchers and practitioners toward efficient, robust, and ethical video object detection solutions.

Keywords :

Object Detection, Human Recognition, Traffic Monitoring

DOI : 10.5281/zenodo.14831793

References :

[1] Bouafia, Y.; Guezouli, L.; Lakhlef, H. Human Detection in Surveillance Videos Based on Fine-Tuned MobileNetV2 for Effective Human Classification. Iran. J. Sci. Technol. Trans. Electr. Eng. 2022, 46, 971–988. https://doi.org/10.1007/s40998-022-00512-6
[2] AlDahoul, N.; Md Sabri, A.Q.; Mansoor, A.M. Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models. Comput. Intell. Neurosci. 2018, 2018, 1639561. https://doi.org/10.1155/2018/1639561
[3] Duan, G.; Ai, H.; Lao, S. Human Detection in Video over Large Viewpoint Changes. 2010, 6493, 683–696. https://doi.org/10.1007/978-3-642-19309-5_53
[4] Brox, T.; Malik, J. Object Segmentation by Long Term Analysis of Point Trajectories. In Proceedings of the Computer Vision – ECCV 2010, Berlin/Heidelberg, Germany, 5–11 September 2010; pp. 282–295.
[5] Kristan, M.; Pflugfelder, R.; Leonardis, A.; Matas, J.; Čehovin, L.; Nebehay, G.; Vojíř, T.; Fernández, G.; Lukežič, A.; Dimitriev, A.; et al. The Visual Object Tracking VOT2014 Challenge Results. In Proceedings of the Computer Vision – ECCV 2014 Workshops, Cham, Switzerland, 6–7 September 2014; pp. 191–217.
[6] Dendorfer, P.; Osep, A.; Milan, A.; Schindler, K.; Cremers, D.; Reid, I.; Roth, S.; Leal-Taixé, L. MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking. Int. J. Comput. Vis. 2021, 129, 845–881.
[7] Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A Large Video Database for Human Motion Recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563.
[8] Kliper-Gross, O.; Hassner, T.; Wolf, L. The Action Similarity Labeling Challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 615–621.
[9] Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; Fei-Fei, L. Large-Scale Video Classification with Convolutional Neural Networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1725–1732.
[10] Huang, X.; Cheng, X.; Geng, Q.; Cao, B.; Zhou, D.; Wang, P.; Lin, Y.; Yang, R. The ApolloScape Dataset for Autonomous Driving. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1067–10676.
[11] Yavariabdi, A.; Kusetogullari, H.; Cicek, H. UAV Detection in Airborne Optic Videos Using Dilated Convolutions. J. Opt.-India 2021, 50, 569–582.
[12] Yavariabdi, A.; Kusetogullari, H.; Celik, T.; Cicek, H. FastUAV-NET: A Multi-UAV Detection Algorithm for Embedded Platforms. Electronics 2021, 10, 724.
[13] Wong, S.C.; Gatt, A.; Stamatescu, V.; McDonnell, M.D. Understanding Data Augmentation for Classification: When to Warp? In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November–2 December 2016; pp. 1–6.
[14] Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
[15] Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
[16] Wang, Y.; Jodoin, P.; Porikli, F.; Konrad, J.; Benezeth, Y.; Ishwar, P. CDnet 2014: An Expanded Change Detection Benchmark Dataset. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Washington, DC, USA, 23–28 June 2014; pp. 393–400.
[17] He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
[18] Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
[19] Lao, D.; Sundaramoorthi, G. Minimum Delay Object Detection from Video. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 5096–5105.
[20] Pont-Tuset, J.; Perazzi, F.; Caelles, S.; Arbeláez, P.; Sorkine-Hornung, A.; Van Gool, L. The 2017 DAVIS Challenge on Video Object Segmentation. arXiv 2017, arXiv:1704.00675.
[21] Perazzi, F.; Pont-Tuset, J.; McWilliams, B.; Gool, L.V.; Gross, M.; Sorkine-Hornung, A. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 724–732.
[22] Usha Rani, J.; Raviraj, P. Real-Time Human Detection for Intelligent Video Surveillance: An Empirical Research and In-depth Review of its Applications. SN Comput. Sci. 2023, 4, 258. https://doi.org/10.1007/s42979-022-01654-4
[23] Visakha, K.; Prakash, S.S. Detection and Tracking of Human Beings in a Video Using Haar Classifier. In Proceedings of the 2018 International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2018; pp. 1–4. https://doi.org/10.1109/ICIRCA.2018.8597322