{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T15:41:28Z","timestamp":1772984488110,"version":"3.50.1"},"reference-count":51,"publisher":"Institution of Engineering and Technology (IET)","issue":"1","license":[{"start":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T00:00:00Z","timestamp":1772409600000},"content-version":"vor","delay-in-days":60,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"},{"start":{"date-parts":[[2026,1,1]],"date-time":"2026-01-01T00:00:00Z","timestamp":1767225600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Image Processing"],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:title>ABSTRACT<\/jats:title>\n                  <jats:p>With the rapid development of deep learning, research on semantic segmentation of remote sensing images has made significant progress. However, there are common problems in remote sensing images, such as large\u2010scale differences between different types of objects and unbalanced sample numbers, which leads to poor semantic segmentation results, especially for small targets and rare types of objects. To address these challenges, a remote sensing image semantic segmentation method based on multi\u2010scale contextual information analysis named MSTNet is innovatively proposed. Its core design includes the semantic information enhancement module (SIE) of feature adaptive clustering, which strengthens the feature expression of different categories through adaptive clustering to alleviate sample imbalance; the weighted feature fusion module (WFF) adaptively aggregates cross\u2010level features, cooperates with the multi\u2010scale context enhancement module (MSCE), combines convolution and Transformer operations, and deeply mines local and global contexts to cope with scale changes; in addition, the network also contains a pixel space feature optimisation module (SFEM) to enhance spatial details. Experiments on the UAVid, LoveDA, Potsdam and Vaihingen datasets show that MSTNet significantly improves the ability to handle scale changes and imbalance problems and reaches advanced levels in key indicators such as OA, mIoU, and mF1, proving that MSTNet achieves competitive or even better performance.<\/jats:p>","DOI":"10.1049\/ipr2.70326","type":"journal-article","created":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T09:48:47Z","timestamp":1772963327000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["MSTNet: Multi\u2010Scale Contextual Analysis Network for Semantic Segmentation of Remote Sensing Images"],"prefix":"10.1049","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6164-1253","authenticated-orcid":false,"given":"Longbao","family":"Wang","sequence":"first","affiliation":[{"name":"College of Computer Science and Software Engineering Hohai University  Nanjing China"},{"name":"Key Laboratory of Water Big Data Technology of Ministry of Water Resources Hohai University  Nanjing China"}]},{"given":"Mingxuan","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering Hohai University  Nanjing China"}]},{"given":"Xiaoliang","family":"Luo","sequence":"additional","affiliation":[{"name":"Jiangxi Virtual Reality Technology Co., Ltd.  Nanchang China"}]},{"given":"Lvchun","family":"Wang","sequence":"additional","affiliation":[{"name":"Jiangxi Virtual Reality Technology Co., Ltd.  Nanchang China"}]},{"given":"Mu","family":"He","sequence":"additional","affiliation":[{"name":"Jiangxi Virtual Reality Technology Co., Ltd.  Nanchang China"}]},{"given":"Chong","family":"Long","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering Hohai University  Nanjing China"}]},{"given":"Meng","family":"Ding","sequence":"additional","affiliation":[{"name":"Institute of Water Science and Technology Hohai University  Nanjing China"},{"name":"National Engineering Research Center of Water Resources Efficient Utilization and Engineering Safety Hohai University  Nanjing China"},{"name":"Key Laboratory of Flood Disaster Risk Warning, Prevention and Mitigation Ministry of Emergency Management Hohai University  Nanjing China"}]}],"member":"265","published-online":{"date-parts":[[2026,3,2]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1093\/nsr\/nwz058"},{"key":"e_1_2_10_3_1","doi-asserted-by":"crossref","unstructured":"J.Long E.Shelhamer andT.Darrell \u201cFully Convolutional Networks for Semantic Segmentation \u201d inProceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2015) 3431\u20133440 https:\/\/doi.org\/10.1109\/CVPR.2015.7298965 7\u201312 June.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_2_10_4_1","doi-asserted-by":"crossref","unstructured":"O.Ronneberger P.Fischer andT.Brox \u201cU\u2010Net: Convolutional Networks for Biomedical Image Segmentation \u201d inProceedings of the 18th International Conference on Medical Image Computing and Computer\u2010Assisted Intervention (MICCAI)(Springer 2015) 234\u2013241 5\u20139 October.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_2_10_5_1","doi-asserted-by":"crossref","unstructured":"H.Zhao J.Shi X.Qi X.Wang andJ.Jia \u201cPyramid Scene Parsing Network \u201d inProceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE 21\u201326 July2017) 2881\u20132890 https:\/\/doi.org\/10.1109\/CVPR.2017.660.","DOI":"10.1109\/CVPR.2017.660"},{"key":"e_1_2_10_6_1","doi-asserted-by":"crossref","unstructured":"L.Chen G.Papandreou F.Schroff andH.Adam \u201cRethinking Atrous Convolution for Semantic Image Segmentation \u201d preprint arXiv June 17 2017 https:\/\/doi.org\/10.48550\/arXiv.1706.05587.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"e_1_2_10_7_1","unstructured":"A.Vaswani N.Shazeer andN.Parmar \u201cAttention is all you Need \u201d inProceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017) 6000\u20136010 4\u20139 December."},{"key":"e_1_2_10_8_1","doi-asserted-by":"crossref","unstructured":"S.Zheng J.Lu H.Zhao X.Zhu andL.Zhang \u201cRethinking Semantic Segmentation from a Sequence\u2010to\u2010Sequence Perspective With Transformers \u201d inProceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2021) 6881\u20136890.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"e_1_2_10_9_1","unstructured":"A.Dosovitskiy L.Beyer A.Kolesnikov et\u00a0al. \u201cAn Image is Worth 16\u00d716 Words: Transformers for Image Recognition at Scale \u201d preprint arXiv June 21 2021 https:\/\/doi.org\/10.48550\/arXiv.2010.11929."},{"key":"e_1_2_10_10_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs14164065"},{"key":"e_1_2_10_11_1","doi-asserted-by":"crossref","unstructured":"W.Wang E.Xie andX.Li \u201cPyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions \u201d inProceedings of the IEEE\/CVF International Conference on Computer Vision(IEEE 2021) 568\u2013578.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"e_1_2_10_12_1","unstructured":"X.Chu Z.Tian andY.Wang \u201cTwins: Revisiting Spatial Attention Design in Vision Transformers \u201d inProceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS)(Virtual Event 2021) 9355\u20139366."},{"key":"e_1_2_10_13_1","unstructured":"W.Wang X.Chen andY.Cao \u201cCrossFormer: A Versatile Vision Transformer Based on Cross\u2010Scale Attention \u201d inProceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)(IEEE 2023) 3472\u20133482."},{"key":"e_1_2_10_14_1","doi-asserted-by":"crossref","unstructured":"Z.Tu H.Talebi andH.Zhang \u201cMaxViT: Multi\u2010Axis Vision Transformer \u201d inProceedings of the European Conference on Computer Vision (ECCV)(Springer 2022) 459\u2013479.","DOI":"10.1007\/978-3-031-20053-3_27"},{"key":"e_1_2_10_15_1","doi-asserted-by":"crossref","unstructured":"Z.Xia L.Zhang X.Lu H.Tan T.Chen andX.Zhou \u201cVision Transformer With Deformable Attention \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2022) 4794\u20134803.","DOI":"10.1109\/CVPR52688.2022.00475"},{"key":"e_1_2_10_16_1","doi-asserted-by":"crossref","unstructured":"L.Zhu X.Wang Z.Ke W.Zhang andR. W. H.Lau \u201cBiFormer: Vision Transformer With Bi\u2010Level Routing Attention \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2023) 10323\u201310332.","DOI":"10.1109\/CVPR52729.2023.00995"},{"key":"e_1_2_10_17_1","unstructured":"Y.Zeng J.Lin andJ.Zhang \u201cToken Clustering Transformer for Efficient Natural Image Processing \u201d inProceedings of the 17th European Conference on Computer Vision (ECCV)(IEEE 2022) 732\u2013749."},{"key":"e_1_2_10_18_1","unstructured":"X.Chen S.Peng X.Zhang L.Xie Z.Huang andP.Luo \u201cNAS\u2010ViT: Neural Architecture Search for Efficient Vision Transformers \u201d inProceedings of the 9th International Conference on Learning Representations (ICLR)(Virtual Event 2021) 1\u201315."},{"key":"e_1_2_10_19_1","unstructured":"C.GongandL.Wang \u201cAutoViT: Vision Transformer Search With Architecture Optimization \u201d inProceedings of the 36th AAAI Conference on Artificial Intelligence Virtual Event(AAAI Press 2022) 1024\u20131032."},{"key":"e_1_2_10_20_1","unstructured":"Z.Liu M.Xu L.Wang Y.Dai H.Tang andJ.Cao \u201cAdapter: Parameter\u2010Efficient Transfer Learning for NLP \u201d inProceedings of the 38th International Conference on Machine Learning (ICML)(Virtual Event 2021) 6962\u20136974."},{"key":"e_1_2_10_21_1","unstructured":"H.Liu Y.Fu andY.Zhang \u201cViT\u2010Adapter: Efficient Vision Transformer Adaptation with PEFT \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2023) 21459\u201321468."},{"key":"e_1_2_10_22_1","unstructured":"Y.Wang Y.Li andH.Chen \u201cEfficientViT: Memory Efficient Vision Transformer With Cascaded Group Attention \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2023) 14460\u201314470."},{"key":"e_1_2_10_23_1","first-page":"1","article-title":"Rethinking Transformers for Semantic Segmentation of Remote Sensing Images","volume":"61","author":"Liu Y.","year":"2023","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"e_1_2_10_24_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/LGRS.2024.3414293","article-title":"Rs3mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation","volume":"21","author":"Ma X.","year":"2024","journal-title":"IEEE Geoscience and Remote Sensing Letters"},{"key":"e_1_2_10_25_1","first-page":"1","article-title":"MCMNet: Multi\u2010Scal Context Modeling Network for Semantic Segmentation of Remote Sensing Images","volume":"60","author":"Li Y.","year":"2022","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"e_1_2_10_26_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs14091956"},{"key":"e_1_2_10_27_1","unstructured":"S.MehtaandM.Rastegari \u201cMobileViT: Light\u2010Weight General\u2010Purpose and Mobile\u2010Friendly Vision Transformer \u201d preprint arXiv March 4 2021 https:\/\/doi.org\/10.48550\/arXiv.2110.02178."},{"key":"e_1_2_10_28_1","unstructured":"S.MehtaandM.Rastegari \u201cSeparable Self\u2010Attention for Mobile Vision Transformers \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2022) 12308\u201312318."},{"key":"e_1_2_10_29_1","unstructured":"J.Pan Z.Lin X.Zhu J.Zhang S.Wang andY.Jia \u201cLocal\u2010Global\u2010Local Information Exchange for Efficient Vision Transformers \u201d inProceedings of the European Conference on Computer Vision (ECCV)( Springer 2022) 639\u2013657."},{"key":"e_1_2_10_30_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2023.3314641","article-title":"CMTFNet: CNN and Multiscale Transformer Fusion Network for Remote\u2010Sensing Image Semantic Segmentation","volume":"61","author":"Wu H.","year":"2023","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"e_1_2_10_31_1","doi-asserted-by":"publisher","DOI":"10.3390\/electronics13234610"},{"key":"e_1_2_10_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2025.3595010"},{"key":"e_1_2_10_33_1","unstructured":"L.\u2010C.Chen G.Papandreou F.Schroff andH.Adam \u201cRethinking Atrous Convolution for Semantic Image Segmentation \u201d preprint arXiv December 52021 https:\/\/doi.org\/10.48550\/arXiv.1706.05587."},{"key":"e_1_2_10_34_1","doi-asserted-by":"crossref","unstructured":"H.Zhao J.Shi X.Qi X.Wang andJ.Jia \u201cPyramid Scene Parsing Network \u201d inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2017) 2881\u20132890.","DOI":"10.1109\/CVPR.2017.660"},{"key":"e_1_2_10_35_1","doi-asserted-by":"crossref","unstructured":"Z.Liu Y.Lin andY.Cao \u201cSwin Transformer: Hierarchical Vision Transformer Using Shifted Windows \u201d inProceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)(Virtual Event 2021) 10012\u201310022.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_2_10_36_1","doi-asserted-by":"crossref","unstructured":"X.Wang R.Girshick A.Gupta andK.He \u201cNon\u2010local Neural Networks \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE 2018) 7794\u20137803.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_2_10_37_1","doi-asserted-by":"crossref","unstructured":"J.Fu J.Liu andH.Tian \u201cDual Attention Network for Scene Segmentation \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2019) 3146\u20133154.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"e_1_2_10_38_1","doi-asserted-by":"crossref","unstructured":"Y.Yuan X.Chen andJ.Wang \u201cObject\u2010Contextual Representations for Semantic Segmentation \u201d inProceedings of the European Conference on Computer Vision (ECCV)(Virtual Event 2020) 173\u2013190.","DOI":"10.1007\/978-3-030-58539-6_11"},{"key":"e_1_2_10_39_1","doi-asserted-by":"crossref","unstructured":"O.Ronneberger P.Fischer andT.Brox \u201cU\u2010Net: Convolutional Networks for Biomedical Image Segmentation \u201d inProceedings of the Medical Image Computing and Computer\u2010Assisted Intervention (MICCAI)(Springer 2015) 234\u2013241.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_2_10_40_1","first-page":"1","article-title":"HiResNet: High\u2010Resolution Feature Fusion Network for Remote Sensing Image Segmentation","volume":"62","author":"Zhang L.","year":"2024","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"e_1_2_10_41_1","unstructured":"A.Vaswani N.Shazeer andN.Parmar \u201cAttention Is All You Need \u201d inProceedings of the Advances in Neural Information Processing Systems (NeurIPS)(NeurIPS 2017) 5998\u20136008."},{"key":"e_1_2_10_42_1","unstructured":"A.Dosovitskiy L.Beyer andA.Kolesnikov \u201cAn Image Is Worth 16\u00d716 Words: Transformers for Image Recognition at Scale \u201d inProceedings of the 9th International Conference on Learning Representations (ICLR)(Virtual Event 2021)."},{"key":"e_1_2_10_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2022.06.008"},{"key":"e_1_2_10_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.114417"},{"key":"e_1_2_10_45_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs10111768"},{"key":"e_1_2_10_46_1","doi-asserted-by":"crossref","unstructured":"S.Woo J.Park J.Lee andI. S.Kweon \u201cCBAM: Convolutional Block Attention Module \u201d inProceedings of the 15th European Conference on Computer Vision (ECCV)(Springer 2018) 3\u201319.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_2_10_47_1","doi-asserted-by":"crossref","unstructured":"X.Wang R.Girshick A.Gupta andK.He \u201cNon\u2010Local Neural Networks \u201d inProceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2018) 7794\u20137803.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_2_10_48_1","doi-asserted-by":"crossref","unstructured":"Q.Wang B.Wu P.Zhu andL.Zhang \u201cECA\u2010Net: Efficient Channel Attention for Deep Convolutional Neural Networks \u201d inProceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2020) 11534\u201311542 https:\/\/doi.org\/10.1109\/CVPR42600.2020.01155.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"e_1_2_10_49_1","doi-asserted-by":"crossref","unstructured":"M.Yang K.Yu C.Zhang andJ.Chen \u201cDenseASPP for Semantic Segmentation in Street Scenes \u201d inProceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(IEEE 2018) 3684\u20133692.","DOI":"10.1109\/CVPR.2018.00388"},{"key":"e_1_2_10_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3042065"},{"key":"e_1_2_10_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2021.09.005"},{"key":"e_1_2_10_52_1","first-page":"1","article-title":"A Novel Transformer Based Semantic Segmentation Scheme for Fine\u2010Resolution Remote Sensing Images","volume":"19","author":"Wang L.","year":"2022","journal-title":"IEEE Geoscience and Remote Sensing Letters"}],"container-title":["IET Image Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/ipr2.70326","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/full-xml\/10.1049\/ipr2.70326","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/ipr2.70326","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T09:48:54Z","timestamp":1772963334000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/ipr2.70326"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1]]},"references-count":51,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["10.1049\/ipr2.70326"],"URL":"https:\/\/doi.org\/10.1049\/ipr2.70326","archive":["Portico"],"relation":{},"ISSN":["1751-9659","1751-9667"],"issn-type":[{"value":"1751-9659","type":"print"},{"value":"1751-9667","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1]]},"assertion":[{"value":"2025-09-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-03-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e70326"}}