Combining transformer global and local feature extraction for object detection
Complex & Intelligent Systems (IF 5.8) Pub Date: 2024-04-15, DOI: 10.1007/s40747-024-01409-z
Tianping Li, Zhenyi Zhang, Mengdi Zhu, Zhaotong Cui, Dongmei Wei

Convolutional neural network (CNN)-based object detectors perform excellently but lack global feature extraction and cannot establish global dependencies between object pixels. Although a transformer can compensate for this, it does not incorporate the advantages of convolution: it captures insufficient local detail, runs slowly, and requires a large number of parameters. In addition, the Feature Pyramid Network (FPN) lacks information interaction across layers, which reduces the feature context information it can acquire. To solve these problems, this paper proposes a CNN-based anchor-free object detector that combines transformer global and local feature extraction (GLFT) to enhance the extraction of semantic information from images. First, the segmented channel extraction feature attention (SCEFA) module is designed to improve the extraction of local multiscale channel features and to enhance the discrimination of pixels in the object region. Second, the aggregated feature hybrid transformer (AFHTrans) module, combined with convolution, is designed to enhance the extraction of global and local feature information and to establish dependencies between the pixels of distant objects; it compensates for the shortcomings of the FPN through multilayer aggregation and transmission of information. Compared with a pure transformer, these designs have clear advantages. Finally, the feature extraction head (FE-Head) is designed to extract full-context information tailored to the features of different tasks. Our method achieves an accuracy of 47.0% on COCO2017 and 82.76% on PASCAL VOC2007 + 2012, and the experimental results validate its effectiveness.
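The abstract does not give the internals of SCEFA or AFHTrans, but the core idea — fusing a convolutional branch that captures local detail with a self-attention branch that models global pixel dependencies — can be sketched in a few lines. The following is a minimal, hypothetical numpy illustration of such a hybrid block (a moving-average kernel stands in for a learned convolution, and a single attention head stands in for the transformer); it is not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_branch(x, k=3):
    # 1-D sliding-window average over the spatial axis: a stand-in for
    # convolution, which extracts local detail features.
    n, _ = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[i:i + k].mean(axis=0) for i in range(n)])

def global_branch(x):
    # Single-head self-attention: every position attends to every other,
    # establishing long-range dependencies between distant "pixels".
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d), axis=-1)
    return attn @ x

def hybrid_block(x):
    # Fuse the two branches by summation, in the spirit of a
    # convolution + transformer hybrid module.
    return local_branch(x) + global_branch(x)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))  # 16 "pixels", 8 channels
out = hybrid_block(tokens)
print(out.shape)  # → (16, 8)
```

The output keeps the input shape, so such a block can be stacked or dropped into an FPN-style pyramid; in the actual model the local branch would be a learned convolution and the attention branch a multi-head transformer layer.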




Updated: 2024-04-15