Journal Press India®

Computology: Journal of Applied Computer Science and Intelligent Technologies
Vol 4 , Issue 2 , July - December 2024 | Pages: 18-40 | Research Paper

A Literature-based Performance Assessment of the YOLO (You Only Look Once) CNN Approach for Real-time Object Detection

Author Details ( * ) denotes Corresponding author

1. * Sandeep Bhattacharjee, Assistant Professor, Amity Business School, Amity University, Kolkata, West Bengal, India (sandeepbitmba@gmail.com)

Real-time object detection is a key enabler for computer vision applications such as video surveillance, autonomous driving, robotics, and augmented reality. You Only Look Once (YOLO) is a state-of-the-art object detection algorithm based on Convolutional Neural Networks (CNNs) that offers an efficient solution by performing classification and localization in a single forward pass through the network. This review provides a comprehensive overview of YOLO's architecture, key innovations, comparative performance, and challenges, and of its impact on the field of real-time object detection. It also discusses improvements introduced in subsequent versions of YOLO and explores potential directions for future research.
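The single-forward-pass idea can be illustrated with a minimal decoding sketch. In the original YOLO formulation, the network's output is an S×S grid where each cell predicts B bounding boxes (offset, size, confidence) plus one shared class-probability vector; class scores are the product of box confidence and class probability. The function below is an illustrative sketch under those assumptions, not the authors' implementation; shapes and the threshold value are hypothetical.

```python
import numpy as np

# YOLOv1-style settings: 7x7 grid, 2 boxes per cell, 20 classes (PASCAL VOC).
S, B, C = 7, 2, 20

def decode_predictions(output, conf_threshold=0.25):
    """Turn an (S, S, B*5 + C) output tensor into a list of detections.

    Each cell predicts B boxes (x, y, w, h, confidence) plus one shared
    class-probability vector; the per-class score is conf * class_prob.
    """
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row, col]
            class_probs = cell[B * 5:]        # shared across the cell's boxes
            for b in range(B):
                x, y, w, h, conf = cell[b * 5 : b * 5 + 5]
                scores = conf * class_probs   # combined per-class scores
                cls = int(np.argmax(scores))
                if scores[cls] >= conf_threshold:
                    # (x, y) are offsets within the cell; convert to
                    # image-relative center coordinates in [0, 1].
                    cx, cy = (col + x) / S, (row + y) / S
                    detections.append((cx, cy, w, h, cls, float(scores[cls])))
    return detections

# A hand-set cell shows one confident detection surviving the threshold.
output = np.zeros((S, S, B * 5 + C))
output[3, 3, :5] = [0.5, 0.5, 0.2, 0.3, 0.9]  # one confident box
output[3, 3, B * 5 + 4] = 1.0                 # class index 4
print(decode_predictions(output))
```

Because classification and localization are read off the same tensor, a single network evaluation yields all detections at once; a real pipeline would follow this step with non-maximum suppression to remove duplicates.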

Keywords

Architecture; Classification; Image; Real time; Object recognition

