TA-YOLO: a lightweight small object detection model based on multi-dimensional trans-attention module for remote sensing images
Complex & Intelligent Systems (IF 5.8), Pub Date: 2024-05-08, DOI: 10.1007/s40747-024-01448-6
Minze Li, Yuling Chen, Tao Zhang, Wu Huang

Object detection plays a vital role in remote sensing applications. Although object detection has achieved impressive results on natural images, these methods are difficult to apply directly to remote sensing images. Remote sensing images often contain complex backgrounds and small objects, which results in a highly unbalanced distribution of foreground and background information. To address these problems, this paper proposes a multi-head channel and spatial trans-attention (MCSTA) module, which performs long-range pixel interaction along the channel and spatial dimensions respectively to capture attention features. It is a plug-and-play module that can be easily embedded in any other convolutional neural network for natural image object detection, making that network quickly applicable to remote sensing images. First, to reduce computational complexity and improve feature richness, we use a special linear convolution to obtain the three projection features instead of the simple matrix multiplication used in the Transformer. Second, we obtain trans-attention maps in different dimensions in a manner similar to the self-attention mechanism, capturing the interrelationships of features across channels and spatial positions. In this process, we use a multi-head mechanism to perform the operations in parallel and improve speed. Furthermore, to avoid large-scale matrix operations, we design an attention blocking mode that reduces memory usage and increases speed. Finally, we embed the trans-attention module into YOLOv8, add a new detection head, and optimize the feature fusion method, yielding a lightweight small object detection model for remote sensing images named TA-YOLO. It has fewer parameters than the baseline model YOLOv8, and its mAP on the PASCAL VOC and VisDrone datasets increases by 1.3% and 6.2% respectively. The experimental results demonstrate the effectiveness of the trans-attention module and the strong performance of TA-YOLO.
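
The abstract names four ingredients (convolutional projections, attention along the channel and spatial dimensions, multiple heads, and blocking) without giving formulas. The following is a minimal sketch of how such pieces can fit together using generic scaled dot-product attention; it is not the authors' MCSTA implementation. Assumptions: the feature map is flattened to a C x N matrix (N = H*W), a 1x1 convolution (a plain per-pixel matrix product) stands in for the paper's "special linear convolution", blocks are runs of consecutive flattened positions, and all function names and toy weights are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def scaled(a, factor):
    return [[v * factor for v in row] for row in a]

def channel_attention(q, k, v):
    """q, k, v are C x N; the attention map is C x C (channel to channel)."""
    n = len(q[0])
    attn = softmax_rows(scaled(matmul(q, transpose(k)), 1.0 / math.sqrt(n)))
    return matmul(attn, v)  # C x N

def spatial_attention(q, k, v):
    """q, k, v are C x N; the attention map is N x N (position to position)."""
    c = len(q)
    attn = softmax_rows(scaled(matmul(transpose(q), k), 1.0 / math.sqrt(c)))
    return transpose(matmul(attn, transpose(v)))  # C x N

def blocked_spatial_attention(q, k, v, block):
    """Assumed blocking scheme: attend only within runs of `block` positions,
    so the largest attention map is block x block instead of N x N."""
    n = len(q[0])
    out = [[] for _ in q]
    for start in range(0, n, block):
        def cut(m):
            return [row[start:start + block] for row in m]
        part = spatial_attention(cut(q), cut(k), cut(v))
        for dst, src in zip(out, part):
            dst.extend(src)
    return out

def multi_head(attend, q, k, v, heads):
    """Split channels into `heads` groups, attend per group, concatenate."""
    size = len(q) // heads
    out = []
    for h in range(heads):
        sl = slice(h * size, (h + 1) * size)
        out.extend(attend(q[sl], k[sl], v[sl]))
    return out

# Toy feature map with C = 4 channels and N = 8 flattened positions.
x = [[0.1 * (i + j) for j in range(8)] for i in range(4)]
# Toy 1x1-convolution weights (C x C); a stand-in for the paper's projection.
w_q = [[0.5 if i == j else 0.1 for j in range(4)] for i in range(4)]
w_k = [[0.3 if i == j else -0.1 for j in range(4)] for i in range(4)]
w_v = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
q, k, v = matmul(w_q, x), matmul(w_k, x), matmul(w_v, x)

chan = multi_head(channel_attention, q, k, v, heads=2)
spat = multi_head(spatial_attention, q, k, v, heads=2)
blocked = blocked_spatial_attention(q, k, v, block=4)
```

In this sketch the channel branch builds a C x C map and the spatial branch an N x N map, which is why the spatial branch is the one that benefits from blocking on large feature maps. How the paper combines the two branches and chooses its blocks is not stated in the abstract.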




Updated: 2024-05-09