QueryTrack: Joint-Modality Query Fusion Network for RGBT Tracking
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2024-04-30, DOI: 10.1109/tip.2024.3393298
Huijie Fan, Zhencheng Yu, Qiang Wang, Baojie Fan, Yandong Tang

Existing RGB-Thermal (RGB-T) trackers usually treat intra-modal feature extraction and inter-modal feature fusion as two separate processes, so the mutual promotion of extraction and fusion is neglected. As a result, the complementary advantages of RGB-T fusion are not fully exploited, and the independent feature extraction does not adapt to fluctuations in modality quality during tracking. To address these limitations, we design a joint-modality query fusion network in which intra-modal feature extraction and inter-modal fusion are coupled and promote each other via joint-modality queries. The queries are initialized from the multimodal features of the current frame, which makes the subsequent fusion adaptive to modality quality fluctuations during tracking. Joint-modality query fusion (JQF) then uses the queries to interact with the RGB-T features, unifying intra-modal enhancement and inter-modal interaction so that they promote each other. In this way, JQF can distinguish and enhance complementary modality features while filtering out redundant information. For real-time tracking, we propose regional cross-attention for cross-modal interactions to reduce computational cost. Our end-to-end tracker achieves state-of-the-art performance on multiple RGBT tracking benchmarks, including LasHeR, VTUAV, RGBT234 and GTOT, while running at real-time speed.
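The abstract gives no implementation details, so the following is only a minimal pure-Python sketch of the general idea of query-based fusion, not the authors' network. Everything in it is an assumption made for illustration: the function names (`init_queries`, `query_fusion`, `regional_cross_attention`), the pooling-based query initialization, the absence of learned projections, and the 1-D windowed reading of "regional" cross-attention.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, tokens):
    """Scaled dot-product attention of one query over tokens (keys = values = tokens)."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(len(query))]

def init_queries(rgb, tir, num_queries):
    """Assumed initialization: average aligned RGB/thermal tokens of the current
    frame, then mean-pool contiguous chunks (yields at most num_queries queries)."""
    joint = [[(r + t) / 2 for r, t in zip(r_tok, t_tok)] for r_tok, t_tok in zip(rgb, tir)]
    chunk = math.ceil(len(joint) / num_queries)
    queries = []
    for start in range(0, len(joint), chunk):
        group = joint[start:start + chunk]
        queries.append([sum(col) / len(group) for col in zip(*group)])
    return queries

def query_fusion(rgb, tir, num_queries=2):
    """Hypothetical fusion step: queries read from both modalities, tokens read back."""
    queries = init_queries(rgb, tir, num_queries)
    # One softmax spans RGB and thermal tokens, so each query decides how much
    # to draw from each modality for this particular frame.
    queries = [attend(q, rgb + tir) for q in queries]
    # Each modality's tokens are updated from the shared queries (residual add).
    rgb_out = [[x + y for x, y in zip(tok, attend(tok, queries))] for tok in rgb]
    tir_out = [[x + y for x, y in zip(tok, attend(tok, queries))] for tok in tir]
    return rgb_out, tir_out

def regional_cross_attention(src, other, radius=1):
    """Assumed 1-D 'regional' variant: token i attends only to a local window
    of the other modality, so cost is N * (2 * radius + 1) instead of N * N."""
    out = []
    for i, tok in enumerate(src):
        window = other[max(0, i - radius): i + radius + 1]
        out.append(attend(tok, window))
    return out

if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
    tir = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]
    rgb_out, tir_out = query_fusion(rgb, tir, num_queries=2)
    local = regional_cross_attention(rgb, tir, radius=1)
    print(len(rgb_out), len(tir_out), len(local))
```

In this sketch the queries are rebuilt from the current frame on every call, which is one simple way to make fusion respond to a modality becoming unreliable; a real tracker would use learned projections and 2-D spatial windows.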

Updated: 2024-04-30