Textual adversarial attacks by exchanging text-self words
International Journal of Intelligent Systems (IF 7) Pub Date: 2022-09-21, DOI: 10.1002/int.23083
Huijun Liu, Jie Yu, Jun Ma, Shasha Li, Bin Ji, Zibo Yi, Miaomiao Li, Long Peng, Xiaodong Liu

Adversarial attacks expose the vulnerability of deep neural networks. Compared with image adversarial attacks, textual adversarial attacks are more challenging due to the discrete nature of text. Recent synonym-based methods achieve the current state-of-the-art results. However, these methods introduce new words that are absent from the original text, making it easy for humans to perceive the difference between the adversarial example and the original text. Motivated by the fact that humans are often unaware of mildly chaotic word order, we propose exchange-attack (EA), a concise and effective word-level textual adversarial attack model. Specifically, the EA model generates adversarial examples by exchanging words of the original text itself, guided by the contribution each word makes to the classification result. Intuitively, the smaller the distance between two exchanged words, the harder it is for humans to perceive the chaotic word order. We therefore take word distance into consideration when generating chaotic word orders. Extensive experiments on several text classification data sets show that the EA model consistently outperforms the selected baselines in terms of averaged after-attack accuracy, modification rate, query number, and semantic similarity. Human evaluation results reveal that humans can hardly perceive the adversarial examples generated by the EA model. In addition, quantitative and qualitative analyses further validate the effectiveness of the EA model, showing that the generated adversarial examples are grammatically correct and semantically preserved.
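The high-level procedure the abstract describes — score each word's contribution to the classification, then swap words of the text itself, preferring nearby partners so the reordering is less perceptible — can be sketched as follows. This is a minimal, illustrative sketch only: the leave-one-out importance scoring, the greedy swap policy, the `max_dist` budget, and the toy order-sensitive classifier (`toy_classify`, `toy_predict`) are all assumptions for illustration, not the authors' exact EA algorithm.

```python
def word_importance(words, classify):
    """Score each word by how much the classifier's output probability
    shifts when that word is removed (a common leave-one-out heuristic;
    assumed here, not necessarily the paper's exact scoring)."""
    base = classify(words)
    return [abs(base - classify(words[:i] + words[i + 1:]))
            for i in range(len(words))]

def exchange_attack(words, classify, predict, max_dist=3):
    """Greedily swap pairs of words of the original text itself,
    trying the most influential words first and the nearest swap
    partners first (smaller distance is harder for humans to notice,
    per the abstract), until the predicted label flips."""
    original_label = predict(words)
    scores = word_importance(words, classify)
    order = sorted(range(len(words)), key=lambda i: -scores[i])
    adv = list(words)
    for i in order:
        for d in range(1, max_dist + 1):          # nearest partners first
            for j in (i - d, i + d):
                if 0 <= j < len(adv):
                    adv[i], adv[j] = adv[j], adv[i]
                    if predict(adv) != original_label:
                        return adv                # successful adversarial example
                    adv[i], adv[j] = adv[j], adv[i]  # undo the swap
    return None                                   # attack failed within budget

# --- Toy victim model (purely illustrative) ---------------------------------
def toy_classify(words):
    """Toy order-sensitive sentiment model: returns the probability of
    the positive class; negative when 'not' occurs without a preceding
    'good'."""
    if "not" in words:
        ni = words.index("not")
        if "good" not in words[:ni]:
            return 0.1
    return 0.9

def toy_predict(words):
    return int(toy_classify(words) > 0.5)

result = exchange_attack("not a good movie".split(), toy_classify, toy_predict)
# → ['good', 'a', 'not', 'movie']  (moving 'not' after 'good' flips the toy label)
```

In this toy run, "not" receives the highest importance score, a distance-1 swap fails to change the prediction, and the distance-2 swap with "good" flips the label, so the attack stops there — mirroring the abstract's preference for the smallest perceptible reordering that succeeds.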

Updated: 2022-09-21