NeuralSanitizer: Detecting Backdoors in Neural Networks, IEEE Transactions on Information Forensics and Security



NeuralSanitizer: Detecting Backdoors in Neural Networks
IEEE Transactions on Information Forensics and Security (IF 6.8), Pub Date: 2024-04-17, DOI: 10.1109/tifs.2024.3390599
Hong Zhu₁, Yue Zhao₁, Shengzhi Zhang₂, Kai Chen₁


Deep neural networks (DNNs) are widely used in areas such as computer vision, speech recognition, and natural language processing. However, recent work shows that they are vulnerable to backdoor (Trojan) attacks, which severely restricts their use in many scenarios. In this paper, we propose NeuralSanitizer, a novel approach to detecting and removing backdoors in DNNs that captures a variety of triggers with better accuracy and higher efficiency. In particular, we identify two fundamental properties of triggers, namely their effectiveness in the backdoored model and their ineffectiveness in other clean models, and design a novel objective function that reconstructs triggers based on these properties. We then present a new approach that leverages transferability to identify adversarial patches that may be generated during trigger reconstruction, and thus detects backdoors more accurately. We evaluate NeuralSanitizer on real-world backdoored DNNs and achieve a 2.1% false negative rate (FNR) and a 0.9% false positive rate (FPR) on average, significantly outperforming state-of-the-art approaches by 1 to 14 times. In addition, NeuralSanitizer can reconstruct triggers up to 25% of the size of the original inputs on average, compared with only 6% to 10% for existing approaches. Finally, NeuralSanitizer is also 1 to 25 times faster than existing approaches.
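The abstract does not give the objective function itself, so the following is only a minimal sketch of how the two trigger properties could be combined into one score. Everything in it is an assumption made for illustration: the function names, the mask-and-pattern form of the trigger, the weight `lam`, and the toy two-class models are hypothetical and are not taken from the paper. The score is low when a candidate trigger pushes the suspect model toward the target label and high when it has the same effect on clean reference models.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_trigger(x, mask, pattern):
    """Blend a candidate trigger into x: x' = (1 - mask) * x + mask * pattern."""
    return [(1 - m) * xi + m * pi for xi, m, pi in zip(x, mask, pattern)]

def trigger_objective(suspect, clean_models, inputs, mask, pattern, target, lam=1.0):
    """Hypothetical score for a candidate trigger (lower is better).

    Term 1 rewards effectiveness on the suspect model.
    Term 2 penalizes effectiveness on clean models (weighted by lam).
    """
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x in inputs:
        stamped = apply_trigger(x, mask, pattern)
        # Property 1: the trigger should drive the suspect model to `target`.
        loss = -math.log(softmax(suspect(stamped))[target] + eps)
        # Property 2: the same trigger should NOT drive clean models to `target`.
        for clean in clean_models:
            p_clean = softmax(clean(stamped))[target]
            loss += (lam / len(clean_models)) * -math.log(1.0 - p_clean + eps)
        total += loss
    return total / len(inputs)

# Toy two-class "models" returning logits (assumptions for illustration only).
def toy_suspect(x):
    return [x[0], 10.0 * x[1]]   # feature 1 acts as a planted backdoor

def toy_clean(x):
    return [x[0], 0.0]           # ignores feature 1 entirely

toy_inputs = [[1.0, 0.0], [2.0, 0.0]]

# Candidate A sets only the backdoor feature; candidate B overwrites feature 0
# so strongly that even the clean model changes its prediction.
score_a = trigger_objective(toy_suspect, [toy_clean], toy_inputs,
                            mask=[0.0, 1.0], pattern=[0.0, 1.0], target=1)
score_b = trigger_objective(toy_suspect, [toy_clean], toy_inputs,
                            mask=[1.0, 0.0], pattern=[-5.0, 0.0], target=1)
print(score_a < score_b)
```

In this toy setting candidate A scores lower than candidate B, because B also flips the clean model and is therefore penalized by the second term. A real reconstruction would minimize such a score over the mask and pattern with gradient-based optimization, which this sketch does not attempt.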


Updated: 2024-04-17
