Multiscale Global Context Network for Semantic Segmentation of High-Resolution Remote Sensing Images
IEEE Transactions on Geoscience and Remote Sensing (IF 8.2). Pub Date: 2024-05-14, DOI: 10.1109/tgrs.2024.3393489
Qiaolin Zeng 1, Jingxiang Zhou 1, Jinhua Tao 2, Liangfu Chen 2, Xuerui Niu 1, Yumeng Zhang 3

Semantic segmentation of high-resolution remote sensing images (HRSIs) is a challenging task because objects in HRSIs usually exhibit large variations in scale and appearance. Although deep convolutional neural networks (DCNNs) have been widely applied to the semantic segmentation of HRSIs, they have inherent limitations in capturing global context. Attention mechanisms and transformers can effectively model long-range dependencies, but they often incur high computational costs when applied to HRSIs. In this article, an encoder-decoder multiscale global context network (MSGCNet) is proposed to fully and efficiently model the multiscale context and long-range dependencies of HRSIs. Specifically, the multiscale interaction (MSI) module employs an efficient cross-attention to facilitate interaction among the multiscale features of the encoder, which bridges the semantic gap between high- and low-level features and introduces more scale information into the network. To efficiently model long-range dependencies in both the spatial and channel dimensions, the transformer-based decoder block (TBDB) implements window-based efficient multihead self-attention (W-EMSA) and enables interaction across windows. Furthermore, to further integrate the global context generated by the TBDB, the scale-aware fusion (SAF) module is proposed to deeply supervise the decoder; it iteratively fuses hierarchical features through spatial attention. As demonstrated by both quantitative and qualitative experimental results on two publicly available datasets, the proposed MSGCNet outperforms currently popular methods. The code will be available at http://github.com/JingxiangZhou/MSGCNet.
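To make the window-attention idea concrete, below is a minimal PyTorch sketch of plain window-based multi-head self-attention, the mechanism that W-EMSA builds on: the feature map is partitioned into non-overlapping windows and self-attention is computed within each window, so the cost drops from quadratic in H*W to quadratic in the window size. All class and variable names here are illustrative assumptions, and the sketch deliberately omits the efficiency modifications and the cross-window interaction that the paper's W-EMSA adds; it is not the authors' implementation.

```python
# Hedged sketch: window-based multi-head self-attention (the idea behind
# W-EMSA). Names, dimensions, and the shift-free window partition are
# illustrative, not taken from the MSGCNet code.
import torch
import torch.nn as nn


class WindowSelfAttention(nn.Module):
    """Multi-head self-attention restricted to non-overlapping windows."""

    def __init__(self, dim: int, num_heads: int = 4, window_size: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.window_size = window_size
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W assumed divisible by window_size.
        B, C, H, W = x.shape
        ws = self.window_size
        # Partition into (H/ws)*(W/ws) windows of ws*ws tokens each.
        x = x.view(B, C, H // ws, ws, W // ws, ws)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, C)  # (B*nW, N, C)
        # Project to queries, keys, and values, split across heads.
        qkv = self.qkv(x).reshape(x.shape[0], ws * ws, 3,
                                  self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each (B*nW, heads, N, C/heads)
        # Scaled dot-product attention within each window.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v
        out = out.transpose(1, 2).reshape(x.shape[0], ws * ws, C)
        out = self.proj(out)
        # Reverse the window partition back to (B, C, H, W).
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)


if __name__ == "__main__":
    block = WindowSelfAttention(dim=64, num_heads=4, window_size=8)
    feat = torch.randn(1, 64, 32, 32)
    print(block(feat).shape)  # torch.Size([1, 64, 32, 32])
```

In an architecture like the one described, such a block would sit inside the decoder (the TBDB) alongside a channel-dimension counterpart and some mechanism for exchanging information across windows; those parts are specific to the paper and are not reproduced here.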
