BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models,IEEE Transactions on Information Forensics and Security

当前位置： X-MOL 学术 › IEEE Trans. Inform. Forensics Secur. › 论文详情

Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)

BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models
IEEE Transactions on Information Forensics and Security ( IF 6.8 ) Pub Date : 2024-04-08 , DOI: 10.1109/tifs.2024.3386058
Jordan Vice ₁ , Naveed Akhtar ₂ , Richard Hartley ₃ , Ajmal Mian ₁

Affiliation

The rise in popularity of text-to-image generative artificial intelligence (AI) has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM), which upon triggering, infuses the generated images with manipulative details that are naturally blended in the content. Our attack is the first to target three popular text-to-image generative models across three stages of the generative process by modifying the behaviour of the embedded tokenizer, the language model or the image generative model. Based on the penetration level, BAGM takes the form of a suite of attacks that are referred to as surface, shallow and deep attacks in this article. Given the existing gap within this domain, we also contribute a comprehensive set of quantitative metrics designed specifically for assessing the effectiveness of backdoor attacks on text-to-image models. The efficacy of BAGM is established by attacking state-of-the-art generative models, using a marketing scenario as the target domain. To that end, we contribute a dataset of branded product images. Our embedded backdoors increase the bias towards the target outputs by more than five times the usual, without compromising the model robustness or the generated content utility. By exposing generative AI’s vulnerabilities, we encourage researchers to tackle these challenges and practitioners to exercise caution when using pre-trained models. Relevant code and input prompts can be found at https://github.com/JJ-Vice/BAGM , and the dataset is available at: https://ieee-dataport.org/documents/marketable-foods-mf-dataset

中文翻译：

BAGM：操纵文本到图像生成模型的后门攻击

文本到图像生成人工智能（AI）的普及引起了公众的广泛兴趣。我们证明，这项技术可能会受到攻击而生成巧妙地操纵用户的内容。我们提出了对文本到图像生成模型（BAGM）的后门攻击，一旦触发，就会将自然混合在内容中的操纵细节注入生成的图像中。我们的攻击首先通过修改嵌入式分词器、语言模型或图像生成模型的行为，针对生成过程的三个阶段中的三种流行的文本到图像生成模型。根据渗透级别，BAGM 采用一系列攻击的形式，在本文中称为表面攻击、浅层攻击和深层攻击。鉴于该领域内现有的差距，我们还提供了一套全面的定量指标，专门用于评估文本到图像模型后门攻击的有效性。 BAGM 的功效是通过使用营销场景作为目标域来攻击最先进的生成模型而建立的。为此，我们提供了品牌产品图像数据集。我们的嵌入式后门将目标输出的偏差增加了五倍以上，而不会影响模型的稳健性或生成的内容效用。通过揭露生成式人工智能的漏洞，我们鼓励研究人员应对这些挑战，并鼓励从业者在使用预先训练的模型时谨慎行事。相关代码和输入提示可以在https://github.com/JJ-Vice/BAGM ，数据集可在以下位置获取：https://ieee-dataport.org/documents/marketable-foods-mf-dataset

更新日期：2024-04-08

点击分享查看原文

点击收藏

公开下载

阅读更多本刊最新论文本刊介绍/投稿指南

全部期刊列表>>