{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T22:20:11Z","timestamp":1774477211272,"version":"3.50.1"},"reference-count":47,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2024,2,10]],"date-time":"2024-02-10T00:00:00Z","timestamp":1707523200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100014188","name":"Institute of Information &amp; communications Technology Planning &amp; Evaluation","doi-asserted-by":"publisher","award":["IITP-2023-RS-2023-00254529"],"award-info":[{"award-number":["IITP-2023-RS-2023-00254529"]}],"id":[{"id":"10.13039\/501100014188","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100014188","name":"Institute of Information &amp; communications Technology Planning &amp; Evaluation","doi-asserted-by":"publisher","award":["RS-2023-00262891"],"award-info":[{"award-number":["RS-2023-00262891"]}],"id":[{"id":"10.13039\/501100014188","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Pedestrian detection is an essential task for safety-critical systems, but detecting pedestrians is challenging in low-light and adverse weather conditions. Thermal images can be used to improve robustness by providing complementary information to RGB images. Previous studies have shown that multi-modal feature fusion using convolution operations can be effective, but such methods rely solely on local feature correlations, which can degrade detection performance. To address this issue, we propose a novel attention-based fusion network, referred to as INSANet (INtra-INter Spectral Attention Network), that captures global intra- and inter-spectral information. 
It consists of intra- and inter-spectral attention blocks that allow the model to learn mutual spectral relationships. Additionally, we identified an imbalance in the multispectral dataset caused by several factors and designed an augmentation strategy that mitigates concentrated distributions and enables the model to learn the diverse locations of pedestrians. Extensive experiments demonstrate the effectiveness of the proposed methods, which achieve state-of-the-art performance on the KAIST dataset and LLVIP dataset. Finally, we conduct a regional performance evaluation to demonstrate the effectiveness of our proposed network in various regions.<\/jats:p>","DOI":"10.3390\/s24041168","type":"journal-article","created":{"date-parts":[[2024,2,12]],"date-time":"2024-02-12T03:50:27Z","timestamp":1707709827000},"page":"1168","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["INSANet: INtra-INter Spectral Attention Network for Effective Feature Fusion of Multispectral Pedestrian Detection"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-4664-4755","authenticated-orcid":false,"given":"Sangin","family":"Lee","sequence":"first","affiliation":[{"name":"Department of Software, Sejong University, Seoul 05006, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9884-3177","authenticated-orcid":false,"given":"Taejoo","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7591-665X","authenticated-orcid":false,"given":"Jeongmin","family":"Shin","sequence":"additional","affiliation":[{"name":"Department of Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of 
Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3388-678X","authenticated-orcid":false,"given":"Namil","family":"Kim","sequence":"additional","affiliation":[{"name":"NAVER LABS, Seongnam 13561, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9970-0132","authenticated-orcid":false,"given":"Yukyung","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2024,2,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? the kitti vision benchmark suite. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. (2020, January 19\u201325). nuScenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1109\/TPAMI.2013.124","article-title":"Scene-specific pedestrian detection for static video surveillance","volume":"36","author":"Wang","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","unstructured":"Du, D., Zhu, P., Wen, L., Bian, X., Lin, H., Hu, Q., Peng, T., Zheng, J., Wang, X., and Zhang, Y. (2019, January 27\u201328). VisDrone-DET2019: The vision meets drone object detection in image challenge results. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Hwang, S., Park, J., Kim, N., Choi, Y., and So Kweon, I. (2015, January 7\u201312). Multispectral pedestrian detection: Benchmark dataset and baseline. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298706"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Xu, D., Ouyang, W., Ricci, E., Wang, X., and Sebe, N. (2017, January 21\u201326). Learning cross-modal deep representations for robust pedestrian detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.451"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Devaguptapu, C., Akolekar, N., M Sharma, M., and N Balasubramanian, V. (2019, January 15\u201320). Borrow from anywhere: Pseudo multi-modal object detection in thermal imagery. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00135"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kieu, M., Bagdanov, A.D., Bertini, M., and Del Bimbo, A. (2020, January 23\u201328). Task-conditioned domain adaptation for pedestrian detection in thermal imagery. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK.","DOI":"10.1007\/978-3-030-58542-6_33"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Gonz\u00e1lez, A., Fang, Z., Socarras, Y., Serrat, J., V\u00e1zquez, D., Xu, J., and L\u00f3pez, A.M. (2016). Pedestrian detection at day\/night time with visible and FIR cameras: A comparison. Sensors, 16.","DOI":"10.3390\/s16060820"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Jia, X., Zhu, C., Li, M., Tang, W., and Zhou, W. (2021, January 10\u201317). 
LLVIP: A visible-infrared paired dataset for low-light vision. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Virtual.","DOI":"10.1109\/ICCVW54120.2021.00389"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, L., Zhu, X., Chen, X., Yang, X., Lei, Z., and Liu, Z. (2019, January 15\u201320). Weakly aligned cross-modal learning for multispectral pedestrian detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00523"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.inffus.2018.09.015","article-title":"Cross-modality interactive attention network for multispectral pedestrian detection","volume":"50","author":"Zhang","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhang, H., Fromont, E., Lef\u00e8vre, S., and Avignon, B. (2021, January 19\u201325). Guided attentive feature fusion for multispectral pedestrian detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Virtual.","DOI":"10.1109\/WACV48630.2021.00012"},{"key":"ref_14","unstructured":"Zheng, Y., Izzat, I.H., and Ziaee, S. (2019). GFD-SSD: Gated fusion double SSD for multispectral pedestrian detection. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"7846","DOI":"10.1109\/LRA.2021.3099870","article-title":"MLPD: Multi-Label Pedestrian Detector in Multispectral Domain","volume":"6","author":"Kim","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_16","unstructured":"Li, C., Song, D., Tong, R., and Tang, M. (2018, January 3\u20136). Multispectral pedestrian detection via simultaneous detection and segmentation. Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhang, H., Fromont, E., Lefevre, S., and Avignon, B. 
(2020, January 25\u201328). Multispectral fusion for object detection with cyclic fuse-and-refine blocks. Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Virtual.","DOI":"10.1109\/ICIP40778.2020.9191080"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhou, K., Chen, L., and Cao, X. (2020, January 23\u201328). Improving multispectral pedestrian detection by addressing modality imbalance problems. Proceedings of the European Conference on Computer Vision (ECCV), Springer, Glasgow, UK.","DOI":"10.1007\/978-3-030-58523-5_46"},{"key":"ref_19","unstructured":"Qingyun, F., Dapeng, H., and Zhaokui, W. (2021). Cross-modality fusion transformer for multispectral object detection. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"109913","DOI":"10.1016\/j.patcog.2023.109913","article-title":"ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection","volume":"145","author":"Shen","year":"2024","journal-title":"Pattern Recognit."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1016\/j.infrared.2018.11.007","article-title":"Benchmarking a large-scale FIR dataset for on-road pedestrian detection","volume":"96","author":"Xu","year":"2019","journal-title":"Infrared Phys. Technol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"62775","DOI":"10.1109\/ACCESS.2020.2982539","article-title":"Pedestrian detection in severe weather conditions","volume":"8","author":"Tumas","year":"2020","journal-title":"IEEE Access"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, J., Zhang, S., Wang, S., and Metaxas, D.N. (2016). Multispectral deep neural networks for pedestrian detection. arXiv.","DOI":"10.5244\/C.30.73"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Yang, X., Qian, Y., Zhu, H., Wang, C., and Yang, M. (2022, January 23\u201327). 
BAANet: Learning bi-directional adaptive attention gates for multispectral pedestrian detection. Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA.","DOI":"10.1109\/ICRA46639.2022.9811999"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.patcog.2018.08.005","article-title":"Illumination-aware faster R-CNN for robust multispectral pedestrian detection","volume":"85","author":"Li","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, W., Hu, X., and Yang, J. (2019, January 15\u201320). Selective kernel networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00060"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 8\u201314). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Munich, Germany.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_29","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_30","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. 
arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"9984","DOI":"10.1109\/TITS.2023.3266487","article-title":"Multi-Modal Feature Pyramid Transformer for RGB-Infrared Object Detection","volume":"24","author":"Zhu","year":"2023","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., and Yoo, Y. (2019, January 18\u201322). Cutmix: Regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Virtual.","DOI":"10.1109\/ICCV.2019.00612"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv.","DOI":"10.1007\/978-1-4899-7687-1_79"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"136674","DOI":"10.1109\/ACCESS.2020.3011356","article-title":"Toward robust pedestrian detection with data augmentation","volume":"8","author":"Cygert","year":"2020","journal-title":"IEEE Access"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1121","DOI":"10.1007\/s11263-020-01412-0","article-title":"A shape transformation-based dataset augmentation framework for pedestrian detection","volume":"129","author":"Chen","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_36","first-page":"10639","article-title":"Pedhunter: Occlusion robust pedestrian detector in crowded scenes","volume":"34","author":"Chi","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"8483","DOI":"10.1109\/TIP.2021.3115672","article-title":"Autopedestrian: An automatic data augmentation and loss function search scheme for pedestrian detection","volume":"30","author":"Tang","year":"2021","journal-title":"IEEE Trans. 
Image Process."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Khan, A.H., Nawaz, M.S., and Dengel, A. (2023, January 18\u201322). Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous Driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00530"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Tumas, P., Serackis, A., and Nowosielski, A. (2021). Augmentation of severe weather impact to far-infrared sensor images to improve pedestrian detection system. Electronics, 10.","DOI":"10.3390\/electronics10080934"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. 
Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1109\/TPAMI.2011.155","article-title":"Pedestrian detection: An evaluation of the state of the art","volume":"34","author":"Dollar","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.1109\/TPAMI.2014.2300479","article-title":"Fast feature pyramids for object detection","volume":"36","author":"Appel","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_46","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"110719","DOI":"10.1016\/j.knosys.2023.110719","article-title":"Nighttime pedestrian detection based on Fore-Background contrast learning","volume":"275","author":"Yao","year":"2023","journal-title":"Knowl.-Based Syst."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/4\/1168\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:58:17Z","timestamp":1760104697000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/4\/1168"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,10]]},"references-count":47,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2024,2]]}},"alternative-id":["s24041168"],"URL":"https:\/\/doi.org\/10.3390\/s24041168","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,10]]}}}