MalGNE: Enhancing the Performance and Efficiency of CFG-Based Malware Detector by Graph Node Embedding in Low Dimension Space, IEEE Transactions on Information Forensics and Security

IEEE Transactions on Information Forensics and Security (IF 6.8), Pub Date: 2024-04-16, DOI: 10.1109/tifs.2024.3389614
Hao Peng ₁, Jieshuai Yang ₁, Dandan Zhao ₁, Xiaogang Xu ₁, Yuwen Pu ₂, Jianmin Han ₁, Xing Yang ₃, Ming Zhong ₁, Shouling Ji ₂

The rich semantic information in Control Flow Graphs (CFGs) of executable programs has made Graph Neural Networks (GNNs) a key focus for malware detection. However, existing CFG-based detection techniques face limitations in node feature extraction, such as information loss, neglect of execution sequence information, and redundancy in representation vectors. These limitations compromise the balance between efficiency and precision when training detectors. To address this, we introduce a Malware CFG Node Embedding (MalGNE) method. The approach uses a novel instruction encoding rule to address the Out-Of-Vocabulary (OOV) problem and generate high-quality initial vectors. It then employs an aggregation layer and a sequence layer to extract node aggregation features and execution sequence features and, in conjunction with GNNs, builds a pre-trained node embedding model. The model maps the semantic information of each node's assembly instruction sequence into a compact, low-dimensional continuous space, ensuring high-quality feature extraction and enhancing the performance and efficiency of the detector. We trained the MalGNE model on the BIG 2015 dataset and validated the MalGNE-enhanced detector on the SOREL-20M and BODMAS datasets. The MalGNE-enhanced detector performs well in low-dimensional spaces: when the dimensionality of the node feature vector is reduced to 16, it maintains a detection accuracy of 95.49%, sacrificing only about 1.7% of accuracy while saving approximately 73% of training time compared to 128 dimensions.
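
The abstract does not spell out the instruction encoding rule, so the sketch below is not MalGNE's method. It only illustrates one common way to reduce the OOV problem for assembly tokens: collapsing unbounded operands (immediates, addresses, memory expressions) into a few placeholder tokens so the vocabulary stays small. The function name `normalize_instruction`, the placeholder names `IMM`, `ADDR`, and `MEM`, and the disassembler-style labels `sub_`/`loc_` are all assumptions made for illustration.

```python
import re

# Hypothetical placeholders (assumptions for illustration, not from the paper).
# Immediates: 0x-prefixed hex, digit-led hex with an 'h' suffix, or decimal.
# The leading-digit requirement keeps registers such as "ah" from matching.
_IMMEDIATE = re.compile(r"^(0x[0-9a-f]+|\d[0-9a-f]*h|\d+)$", re.IGNORECASE)
# Disassembler-generated code labels such as sub_401000 or loc_40102A.
_ADDRESS = re.compile(r"^(sub|loc)_[0-9a-f]+$", re.IGNORECASE)


def normalize_instruction(instr: str) -> str:
    """Map one Intel-syntax instruction to a single bounded-vocabulary token."""
    parts = instr.strip().split(None, 1)
    if not parts:
        return ""
    mnemonic = parts[0].lower()
    rest = parts[1] if len(parts) > 1 else ""
    operands = [op.strip() for op in rest.split(",") if op.strip()]

    tokens = [mnemonic]
    for op in operands:
        if "[" in op:                 # any memory expression
            tokens.append("MEM")
        elif _IMMEDIATE.match(op):    # numeric constant
            tokens.append("IMM")
        elif _ADDRESS.match(op):      # call/jump target label
            tokens.append("ADDR")
        else:                         # registers and other short names
            tokens.append(op.lower())
    return "_".join(tokens)


if __name__ == "__main__":
    for line in ["mov eax, 0x401000", "mov eax, 12h", "call sub_401000", "ret"]:
        print(line, "->", normalize_instruction(line))
```

Under this assumed rule, `mov eax, 0x401000` and `mov eax, 12h` map to the same token, so an instruction with a previously unseen constant no longer falls outside the vocabulary.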


Updated: 2024-04-16
