Quality-Aware Selective Fusion Network for V-D-T Salient Object Detection
IEEE Transactions on Image Processing (IF 10.6) Pub Date: 2024-04-30, DOI: 10.1109/tip.2024.3393365
Liuxin Bao, Xiaofei Zhou, Xiankai Lu, Yaoqi Sun, Haibing Yin, Zhenghui Hu, Jiyong Zhang, Chenggang Yan
Depth images and thermal images contain spatial geometry and surface temperature information, respectively, which can complement the RGB modality. However, the quality of depth and thermal images is often unreliable in challenging scenarios, which degrades the performance of two-modal salient object detection (SOD). Meanwhile, some researchers have turned to the triple-modal SOD task, namely visible-depth-thermal (VDT) SOD, which attempts to exploit the complementarity of the RGB, depth, and thermal images. However, existing triple-modal SOD methods fail to perceive the quality of depth maps and thermal images, which leads to performance degradation on scenes with low-quality depth and thermal images. Therefore, in this paper, we propose a quality-aware selective fusion network (QSF-Net) for VDT salient object detection, which consists of three subnets: the initial feature extraction subnet, the quality-aware region selection subnet, and the region-guided selective fusion subnet. Firstly, besides extracting features, the initial feature extraction subnet generates a preliminary prediction map from each modality via a shrinkage pyramid architecture equipped with the multi-scale fusion (MSF) module. Then, we design a weakly-supervised quality-aware region selection subnet to generate the quality-aware maps. Concretely, we first identify high-quality and low-quality regions using the preliminary predictions; these regions constitute the pseudo label used to train this subnet. Finally, the region-guided selective fusion subnet purifies the initial features under the guidance of the quality-aware maps, and then fuses the triple-modal features and refines the edge details of the prediction maps via the intra-modality and inter-modality attention (IIA) module and the edge refinement (ER) module, respectively.
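The abstract does not specify how high- and low-quality regions are turned into a pseudo label; a minimal NumPy sketch of one plausible reading (agreement of a modality's preliminary prediction with the cross-modal consensus marks a high-quality region) might look like this. The thresholds and the consensus rule here are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def quality_pseudo_labels(preds, agree_thr=0.1, disagree_thr=0.4):
    """Derive per-modality quality pseudo-labels from preliminary
    saliency predictions (arrays in [0, 1] with the same shape).

    Illustrative assumption, not the paper's exact rule: a pixel is
    high-quality (1) when a modality's prediction is close to the
    cross-modal consensus, low-quality (0) when it strongly disagrees,
    and ignored (-1) otherwise.
    """
    consensus = sum(preds.values()) / len(preds)
    labels = {}
    for name, pred in preds.items():
        err = np.abs(pred - consensus)
        lab = np.full(pred.shape, -1, dtype=np.int8)  # ignored by default
        lab[err <= agree_thr] = 1      # close to consensus -> high quality
        lab[err >= disagree_thr] = 0   # strong disagreement -> low quality
        labels[name] = lab
    return labels
```

Such pixel-wise labels could then supervise the weakly-supervised region selection subnet with a standard classification loss, with the ignored pixels masked out.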
Extensive experiments on the VDT-2048 dataset show that our saliency model consistently outperforms 13 state-of-the-art methods by a large margin. Our code and results are available at https://github.com/Lx-Bao/QSFNet .
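As a rough illustration of the "purify under the guidance of quality-aware maps, then fuse" idea, the following NumPy sketch weights each modality's features by its quality-aware map before summing. The paper's actual IIA and ER modules are learned attention blocks; this simple per-pixel weighted fusion is only an assumed stand-in:

```python
import numpy as np

def region_guided_fusion(feats, quality_maps, eps=1e-6):
    """Fuse per-modality features (each H x W x C) under the guidance
    of quality-aware maps (each H x W, values in [0, 1]).

    Hypothetical sketch: suppress low-quality regions by weighting each
    modality with its quality map, normalized across modalities so the
    weights at every pixel sum to (at most) 1.
    """
    w = np.stack(quality_maps, axis=0)             # (M, H, W)
    w = w / (w.sum(axis=0, keepdims=True) + eps)   # per-pixel normalization
    fused = np.zeros_like(feats[0])
    for wi, fi in zip(w, feats):
        fused += wi[..., None] * fi                # broadcast over channels
    return fused
```

Under this scheme, a modality whose quality map is near zero in some region contributes almost nothing to the fused feature there, which matches the selective-fusion intuition described in the abstract.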

Updated: 2024-04-30