Consistency-constrained RGB-T crowd counting via mutual information maximization, Complex & Intelligent Systems


Consistency-constrained RGB-T crowd counting via mutual information maximization
Complex & Intelligent Systems (IF 5.8), Pub Date: 2024-04-15, DOI: 10.1007/s40747-024-01427-x
Qiang Guo, Pengcheng Yuan, Xiangming Huang, Yangdong Ye

Thermal imaging in RGB-T images has proven useful for cross-modal crowd counting because it offers information complementary to RGB representations. Although existing RGB-T crowd counting methods achieve satisfactory results, many still face two significant limitations: (1) they overlook the heterogeneous gap between modalities, which complicates the effective integration of multimodal features; and (2) they do not mine cross-modal consistency, which hinders full exploitation of the complementary strengths of each modality. To this end, we present C4-MIM, a novel Consistency-constrained RGB-T Crowd Counting approach via Mutual Information Maximization. It leverages multimodal information by learning the consistency between the RGB and thermal modalities, thereby improving cross-modal counting performance. Specifically, we first extract the feature representations of both modalities with a shared encoder to reduce the heterogeneous gap, since both then obey identical coding rules with shared parameters. Then, we mine the information that is consistent across modalities to learn more useful information and improve the feature representations. To do so, we formulate the complementarity of multimodal representations as a mutual information maximization regularizer that maximizes the consistent information of the different modalities, so that consistency is maximally attained before the multimodal information is combined. Finally, we simply aggregate the feature representations of the different modalities and send them to a regressor that outputs the density maps. The proposed approach can be implemented with arbitrary backbone networks and is robust when a single modality is unavailable or seriously compromised. Extensive experiments on the RGBT-CC and DroneRGBT benchmarks evaluate the effectiveness and robustness of the proposed approach and demonstrate its superior performance compared to state-of-the-art (SOTA) approaches.
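The abstract does not say which mutual information estimator C4-MIM uses, so the following is only an illustrative sketch of how a mutual information maximization regularizer between paired RGB and thermal features could be computed. It assumes the InfoNCE lower bound, a common estimator that may differ from the authors' choice. The names `info_nce_bound` and `total_loss`, the `temperature` and `weight` values, and the toy feature vectors are all hypothetical and do not come from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def info_nce_bound(rgb_feats, thermal_feats, temperature=0.1):
    """InfoNCE lower bound (in nats) on the mutual information between
    paired RGB and thermal feature vectors.

    Row i of rgb_feats and row i of thermal_feats are assumed to come from
    the same scene (positive pair); all other rows act as negatives.
    The bound can never exceed log(batch size).
    """
    n = len(rgb_feats)
    rgb = [l2_normalize(f) for f in rgb_feats]
    thr = [l2_normalize(f) for f in thermal_feats]
    total = 0.0
    for i in range(n):
        # Cosine similarity of RGB sample i with every thermal sample.
        logits = [dot(rgb[i], thr[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Log-probability of picking the true partner.
        total += logits[i] - log_sum
    return math.log(n) + total / n


def total_loss(counting_loss, rgb_feats, thermal_feats, weight=0.1):
    """Hypothetical combined objective: counting loss minus a weighted
    mutual information bound, so minimizing it maximizes consistency."""
    return counting_loss - weight * info_nce_bound(rgb_feats, thermal_feats)


# Toy features: aligned pairs agree across modalities, shuffled pairs do not.
rgb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
thermal_aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
thermal_shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

aligned_bound = info_nce_bound(rgb, thermal_aligned)
shuffled_bound = info_nce_bound(rgb, thermal_shuffled)
```

In this sketch, features that agree across modalities give a bound close to its maximum of log(batch size), while mismatched pairs give a much lower value, so subtracting the weighted bound from the counting loss rewards cross-modal consistency. In a real model the features would come from the shared encoder and the bound would be computed with a differentiable tensor library.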


Updated: 2024-04-15
