An Improved Object Tracking Technique for Remote Weapon Station Using Yolov5_Deepsort_Dlib Architecture

Document Type: Original Article


1 Computer Science Department, Air Force Institute of Technology, Kaduna State, Nigeria

2 Computer Science Department, Nigerian Defence Academy, Nigeria

3 Computer Science Department, Nigerian Defence Academy, Kaduna, Nigeria


This paper introduces an improved object tracking architecture named DeepSORT_YOLOv5_Dlib. Building on the DeepSORT_YOLOv3 framework [1], the study integrates the Dlib correlation tracker into the traditional DeepSORT_YOLOv3 pipeline to reduce identity switches. The architecture is designed to operate in parallel, which increases its operational speed. Experimental results indicate that the proposed approach outperforms the conventional DeepSORT_YOLOv3, with fewer identity switches and higher operational speed across the video test scenarios. The custom model uses a confidence threshold of 0.2 and an image size of 416 x 416, consistent with the training size. To improve detection with YOLOv5, the model incorporates the Slicing Aided Hyper Inference (SAHI) technique. The overall inference speed reaches 314.8 fps, compared with the 218.6 fps reported by Dang et al. Evaluation on the COCO dataset gives a precision of 0.98 and a recall of 0.81. The proposed custom model achieves a MOTA of 0.86, against the benchmark's 0.83, and a lower identity switch count of 1881, against the benchmark's 2288. It also outperforms the benchmark in object detection. Combining SAHI inference with YOLOv5 improves detection accuracy and raises overall tracking accuracy from 56% to 79%. These findings indicate that the proposed custom model improves on the benchmark in both object tracking and detection.
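
The comparisons quoted above can be made concrete with a short Python sketch. It assumes the standard CLEAR MOT definition of MOTA, 1 - (FN + FP + IDSW) / GT, which the abstract does not state explicitly. The function names and the error counts passed to `mota` are hypothetical and serve only to exercise the formula. Only the fps and identity switch figures are taken from the abstract.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """MOTA under the CLEAR MOT definition (assumed here):
    1 - (FN + FP + IDSW) / GT, with all counts summed over every frame."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def relative_change(new: float, old: float) -> float:
    """Signed fractional change of `new` relative to `old`."""
    return (new - old) / old


# Figures quoted in the abstract: proposed model vs. benchmark.
speedup = 314.8 / 218.6                    # inference speed ratio (fps)
idsw_change = relative_change(1881, 2288)  # change in identity switches

# Hypothetical error counts, NOT results from the paper.
example = mota(false_negatives=120, false_positives=50,
               id_switches=30, num_gt=1000)

print(f"speed-up over benchmark: {speedup:.2f}x")
print(f"identity switch change: {idsw_change:.1%}")
print(f"example MOTA (hypothetical counts): {example:.2f}")
```

Under these assumptions the quoted speeds correspond to roughly a 1.44x speed-up, and the identity switch count falls by about 17.8% relative to the benchmark.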


  • Dang, T. L., Nguyen, G. T., & Cao, T. (2020). Object tracking using improved deep SORT YOLOv3 architecture. ICIC Express Letters, 14(10), 961-969.
  • Kothiya, S. V., & Mistree, K. B. (2015). A review on real time object tracking in video sequences. 2015 International Conference on Electrical, Electronics, Signals, Communication and Optimization (EESCO). Visakhapatnam.
  • Vishwakarma, A., & Khare, A. (2018). Vehicle detection and tracking for traffic surveillance applications: A review paper.
  • Sachan, A. (2019). Zero to hero: A quick guide to object tracking: Mdnet, goturn, rolo. CV-Tricks.
  • Taguri, Y., Erlichmen, S., & Lussato, R. (2015). Object Tracking in Deep Learning -
  • Sagar, R. (2019). How the deep learning approach for object detection evolved over the years. Retrieved 1 January 2024, from Analytics India Magazine website:
  • Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2), 261-318.
  • Boulanin, V., & Verbruggen, M. (2017). Mapping the development of autonomy in weapon systems. Solna: SIPRI.
  • Nolan, C. J. (2017). The allure of battle: A history of how wars have been won and lost. Oxford University Press.
  • Mohamed, M., Jens, L., Sören, A., Claus, S., & Sebastian, H. (2012). About: Remote controlled weapon station. DBpedia.
  • Melanie, S. (2016). The Inevitable Militarization of Artificial Intelligence. In Cyber Defense Review.
  • Rebello, L. (2018). Autonomous Targeting System using Open CV. International Journal for Research in Applied Science and Engineering Technology, 6(3), 2545-2549.
  • Liang, Q., Wu, W., Yang, Y., Zhang, R., Peng, Y., & Xu, M. (2020). Multi-player tracking for multi-view sports videos with improved k-shortest path algorithm. Applied Sciences, 10(3), 864.
  • Nyström, A. (2019). Evaluation of Multiple Object Tracking in Surveillance Video.
  • Xu, Y., & Wang, J. (2019). A unified neural network for object detection, multiple object tracking and vehicle re-identification. arXiv preprint arXiv:1907.03465.
  • Li, W., Mu, J., & Liu, G. (2019). Multiple object tracking with motion and appearance cues. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (pp. 0-0).
  • Zhang, X., Hao, X., Liu, S., Wang, J., Xu, J., & Hu, J. (2019). Multi-target tracking of surveillance video with differential YOLO and DeepSort. In X. Jiang & J.-N. Hwang (Eds.), Eleventh International Conference on Digital Image Processing (ICDIP 2019).
  • Mohana, H. V., & Ravish, A. (2019). Object Detection and Classification Algorithms using Deep Learning for Video Surveillance Applications. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 8(8), 386-395.
  • Mandal, V., & Adu-Gyamfi, Y. (2020). Object detection and tracking algorithms for vehicle counting: a comparative analysis. Journal of Big Data Analytics in Transportation, 2(3), 251-261.
  • Santos, A. M., Bastos-Filho, C. J., Maciel, A. M., & Lima, E. (2020). Counting vehicle with high-precision in Brazilian roads using YOLOv3 and Deep SORT. In 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) (pp. 69-76). IEEE.
  • Punn, N. S., Sonbhadra, S. K., Agarwal, S., & Rai, G. (2020). Monitoring COVID-19 social distancing with person detection and tracking via fine-tuned YOLO v3 and Deepsort techniques. Retrieved from
  • Tran, V. H., Dang, L. H. H., Nguyen, C. N., Le, N. H. L., Bui, K. P., Dam, L. T., & Huynh, D. H. (2021). Real-time and robust system for counting movement-specific vehicle at crowded intersections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4228-4235).
  • Duan, C., & Li, X. (2021). Multi-target tracking based on deep sort in traffic scene. Journal of Physics. Conference Series, 1952(2), 022074.
  • Shukla, R., Mahapatra, A. K., & Selvin Paul Peter, J. (2021). Social distancing tracker using YOLO v5. Turkish Journal of Physiotherapy and Rehabilitation, 1785-1793.
  • Meimetis, D., Daramouskas, I., Perikos, I., & Hatzilygeroudis, I. (2023). Real-time multiple object tracking using deep learning methods. Neural Computing & Applications, 35(1), 89-118.
  • Pramanik, A., Pal, S. K., Maiti, J., & Mitra, P. (2022). Granulated RCNN and multi-class deep SORT for multi-object detection and tracking. IEEE Transactions on Emerging Topics in Computational Intelligence, 6(1), 171-181.
  • Wu, P., Xu, H., Ding, Y., Wang, Z., & Zhang, J. (2021). An improved online multiple object tracking algorithm based on KFHT motion compensation model in the aerial videos. In Seventh Symposium on Novel Photoelectronic Detection Technology and Applications (Vol. 11763, pp. 2431-2436). SPIE.
  • Gao, G., & Lee, S. (2021). Design and Implementation of Fire Detection System Using New Model Mixing. International Journal of Advanced Culture Technology, 9(4), 260-267.
  • Lewert, J. (2021). Human Detection for Flood Rescue: Application of YOLOv5 Algorithm and DeepSORT Object Tracking (Doctoral dissertation).
  • Francies, M. L., Ata, M. M., & Mohamed, M. A. (2022). A robust multiclass 3D object recognition based on modern YOLO deep learning algorithms. Concurrency and Computation: Practice & Experience, 34(1).
  • Neethirajan, S. (2022). ChickTrack-A quantitative tracking tool for measuring chicken activity.
  • Shoman, M., Aboah, A., Morehead, A., Duan, Y., Daud, A., & Adu-Gyamfi, Y. (2022, June). A region-based deep learning approach to automated retail checkout. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). New Orleans, LA, USA.
  • Nepal, U., & Eslamiat, H. (2022). Comparing YOLOv3, YOLOv4 and YOLOv5 for autonomous landing spot detection in faulty UAVs. Sensors (Switzerland), 22(2), 464.
  • Patel, K., Bhatt, C., & Mazzeo, P. L. (2022). Deep learning-based automatic detection of ships: An experimental study using satellite images. Journal of Imaging, 8(7), 182.
  • Ye, K., Dong, J., & Zhang, L. (2022). Digital analysis of movements on characters based on OpenPose and Dlib from video. Journal of Physics. Conference Series, 2218(1), 012021.
  • Schmidt, J., Marques, M. R. G., Botti, S., & Marques, M. A. L. (2019). Recent advances and applications of machine learning in solid-state materials science. Npj Computational Materials, 5(1).
  • Abbasi, M., Shahraki, A., & Taherkordi, A. (2021). Deep learning for network traffic monitoring and analysis (NTMA): A survey. Computer Communications, 170, 19-41.
  • Roboflow. (n.d.). Roboflow Public Dataset: Public Dataset of Pistols. Retrieved from
  • Google Open Images. (n.d.). Google Open Images Dataset of Person, Handgun, Rifle and Knife. Retrieved from web/visualizer/index.html.
  • Akyon, F. C., Altinuc, S. O., & Temizel, A. (2022). Slicing aided hyper inference and fine-tuning for small object detection. In 2022 IEEE International Conference on Image Processing (ICIP) (pp. 966-970). IEEE.
  • Galanty, A., Danel, T., Węgrzyn, M., Podolak, I., & Podolak, I. (2021). Deep convolutional neural network for preliminary in-field classification of lichen species. Biosystems Engineering, 204, 15-25.
  • Luiten, J., Ošep, A., Dendorfer, P., Torr, P., Geiger, A., Leal-Taixé, L., & Leibe, B. (2021). HOTA: A higher order metric for evaluating multi-object tracking. International Journal of Computer Vision, 129(2), 548-578.