Deep Boosting Learning: A Brand-New Cooperative Approach for Image-Text Matching
IEEE Transactions on Image Processing ( IF 10.6 ) Pub Date : 2024-05-07 , DOI: 10.1109/tip.2024.3396063
Haiwen Diao 1 , Ying Zhang 2 , Shang Gao 1 , Xiang Ruan 3 , Huchuan Lu 1

Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Unlike previous approaches, which focus on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, this paper leverages knowledge transfer between peer branches in a boosting manner to seek a more powerful matching model. Specifically, we propose a new Deep Boosting Learning (DBL) algorithm, in which an anchor branch is first trained to provide insights into the data properties, while a target branch gains more advanced knowledge to develop optimal features and distance metrics. Concretely, the anchor branch initially learns the absolute or relative distance between positive and negative pairs, providing a foundational understanding of the particular network and data distribution. Building upon this knowledge, the target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples. Extensive experiments validate that DBL achieves impressive and consistent improvements on top of various recent state-of-the-art image-text matching models, and outperforms related popular cooperative strategies, e.g., Conventional Distillation, Mutual Learning, and Contrastive Learning. Beyond the above, we confirm that DBL can be seamlessly integrated into their training scenarios and achieve superior performance under the same computational costs, demonstrating the flexibility and broad applicability of the proposed method.
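
The abstract does not give the loss formulation, so the following is only a minimal sketch of one way the anchor/target idea could be read, not the paper's actual method. It assumes similarity scores (higher means better matched) and a hinge-based triplet loss; the function names, the `base_margin` and `scale` parameters, and the rule that widens the target margin by the anchor's positive-negative gap are all hypothetical choices made for illustration.

```python
def triplet_hinge(pos_sim, neg_sim, margin):
    """Hinge-based triplet loss on similarity scores for one triplet."""
    return max(0.0, margin - pos_sim + neg_sim)


def adaptive_margin(anchor_pos_sim, anchor_neg_sim, base_margin=0.2, scale=1.0):
    """Hypothetical rule: widen the margin by the gap the anchor branch
    already achieves between the matched and the unmatched pair."""
    gap = anchor_pos_sim - anchor_neg_sim
    return base_margin + scale * max(0.0, gap)


def boosted_losses(anchor_sims, target_sims, base_margin=0.2, scale=1.0):
    """Return (anchor_loss, target_loss) for a single triplet.

    anchor_sims / target_sims are (positive_similarity, negative_similarity)
    pairs produced by the anchor and target branches respectively.
    """
    a_pos, a_neg = anchor_sims
    t_pos, t_neg = target_sims
    # Anchor branch: fixed margin, learns the basic separation.
    anchor_loss = triplet_hinge(a_pos, a_neg, base_margin)
    # Target branch: stricter margin derived from the anchor's separation.
    target_margin = adaptive_margin(a_pos, a_neg, base_margin, scale)
    target_loss = triplet_hinge(t_pos, t_neg, target_margin)
    return anchor_loss, target_loss


if __name__ == "__main__":
    print(boosted_losses(anchor_sims=(0.8, 0.5), target_sims=(0.9, 0.5)))
```

In this sketch a triplet that the anchor branch already separates well yields a larger margin for the target branch, so the target is pushed to separate it further. In a real training setup the anchor similarities would presumably be treated as constants (no gradient) when computing the target margin; that, too, is an assumption rather than something stated in the abstract.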

Updated: 2024-05-07