Fine-Grained Recognition With Learnable Semantic Data Augmentation
IEEE Transactions on Image Processing (IF 10.6). Pub Date: 2024-04-25. DOI: 10.1109/TIP.2024.3364500
Yifan Pu, Yizeng Han, Yulin Wang, Junlan Feng, Chao Deng, Gao Huang
Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification problems, they are rarely applied in fine-grained scenarios, because their random region-editing behavior is prone to destroying the discriminative visual cues residing in subtle regions. In this paper, we propose diversifying the training data at the feature level to alleviate the discriminative-region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC-Aircraft, NABirds) demonstrate that our method significantly improves the generalization performance of several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets, and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. Source code is available at https://github.com/LeapLabTHU/LearnableISDA.
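The core idea described above can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation: the covariance "network" here is a single linear layer with a softplus producing a diagonal, per-sample covariance, whereas the actual method learns this sub-network jointly with the classifier via meta-learning. The function names (`predict_diagonal_covariance`, `semantic_augment`) and the `strength` parameter are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_diagonal_covariance(feature, W, b):
    # Hypothetical stand-in for the covariance prediction network:
    # a linear map followed by softplus to keep each variance positive.
    logits = feature @ W + b
    return np.log1p(np.exp(logits))  # softplus -> per-dimension variances

def semantic_augment(feature, variances, strength=0.5, n_aug=4):
    # Translate the deep feature along random directions drawn from
    # N(0, strength * Sigma). With a diagonal Sigma, each dimension
    # is perturbed independently; the learned variances concentrate
    # the perturbation on semantically meaningful directions.
    noise = rng.normal(size=(n_aug, feature.shape[0])) * np.sqrt(strength * variances)
    return feature + noise  # (n_aug, d) augmented feature vectors

d = 8
feature = rng.normal(size=d)                      # pooled deep feature of one image
W, b = rng.normal(size=(d, d)) * 0.1, np.zeros(d)  # toy covariance-net parameters
variances = predict_diagonal_covariance(feature, W, b)
augmented = semantic_augment(feature, variances)
print(augmented.shape)  # (4, 8)
```

Because the augmentation acts in feature space rather than pixel space, no image region is ever cropped or erased, which is why discriminative cues in subtle regions are preserved.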
