ACMFNet: Attention-Based Cross-Modal Fusion Network for Building Extraction of Remote Sensing Images
IEEE Transactions on Geoscience and Remote Sensing (IF 8.2), Pub Date: 2024-05-14, DOI: 10.1109/tgrs.2024.3400979
Baiyu Chen, Zongxu Pan, Jianwei Yang, Hui Long

In recent years, significant progress has been made in extracting buildings from high spatial resolution (HSR) remote sensing images, driven by the rapid development of deep learning (DL). However, existing methods still have limitations in preserving the detail integrity of building footprints. First, skip connections typically concatenate feature maps from adjacent levels directly, which inevitably leads to misalignment due to their semantic differences. Second, integrating building-related details across cross-modal remote sensing images remains challenging. Third, the oversimplified upsampling structures used in previous methods may lose spatial details. In this article, we propose a novel building extraction method, the attention-based cross-modal fusion network (ACMFNet), which processes cross-modal HSR remote sensing images with an encoder–decoder structure. First, we propose a global and local feature refinement module (GL-FRM) to refine features and establish contextual dependencies at multiple scales and levels, mitigating the spatial discrepancy among multilevel features. Meanwhile, a cross-modal fusion module integrates complementary features extracted from multispectral (MS) data and normalized digital surface model (nDSM) data. In addition, we employ a lightweight residual upsampling module (RUM) for feature resolution recovery. Comprehensive experiments on two benchmark datasets show that the proposed ACMFNet achieves state-of-the-art (SOTA) performance without bells and whistles.
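The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the two ideas it names: attention-based fusion of MS and nDSM feature maps, and a lightweight residual upsampling block. The class names (CrossModalFusion, ResidualUpsample), the SE-style channel gate, and the bilinear-upsample-plus-residual design are illustrative assumptions, not the authors' exact ACMFNet modules.

```python
# Hedged sketch of attention-based cross-modal fusion and residual upsampling.
# All module names and design choices here are assumptions for illustration;
# they are not the published ACMFNet implementation.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse an MS feature map with an nDSM feature map via channel
    attention over their concatenation (SE-style gate, assumed design)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        fused = 2 * channels
        # Global pool -> bottleneck 1x1 convs -> sigmoid channel weights.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        # Project the re-weighted stack back to single-modality width.
        self.proj = nn.Conv2d(fused, channels, kernel_size=1)

    def forward(self, f_ms: torch.Tensor, f_ndsm: torch.Tensor) -> torch.Tensor:
        x = torch.cat([f_ms, f_ndsm], dim=1)  # (B, 2C, H, W)
        x = x * self.gate(x)                  # channel-wise re-weighting
        return self.proj(x)                   # (B, C, H, W)


class ResidualUpsample(nn.Module):
    """Lightweight residual upsampling: bilinear 2x upsample plus a
    3x3 conv refinement on a skip path (assumed RUM-like block)."""

    def __init__(self, channels: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.up(x)
        return x + self.refine(x)  # residual path preserves coarse content


if __name__ == "__main__":
    fuse = CrossModalFusion(channels=64)
    up = ResidualUpsample(channels=64)
    f_ms = torch.randn(2, 64, 32, 32)
    f_ndsm = torch.randn(2, 64, 32, 32)
    out = up(fuse(f_ms, f_ndsm))
    print(out.shape)  # torch.Size([2, 64, 64, 64])
```

Concatenation followed by a learned channel gate is one common way to let a network weight height cues (nDSM) against spectral cues (MS) per channel; the fusion attention actually used in ACMFNet may differ in form and placement.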

Updated: 2024-05-14