Classification of substances by health hazard using deep neural networks and molecular electron densities
Journal of Cheminformatics (IF 8.6), Pub Date: 2024-04-16, DOI: 10.1186/s13321-024-00835-y
Satnam Singh, Gina Zeh, Jessica Freiherr, Thilo Bauer, Isik Türkmen, Andreas T. Grasskamp

In this paper we present a method that leverages 3D electron density information to train a deep neural network pipeline that segments regions of high, medium and low electronegativity and classifies substances as hazardous or non-hazardous to health. We show that this pipeline can be applied to use cases such as cosmetics and food products. For this purpose, we first generate 3D electron density cubes using semiempirical molecular calculations for a custom European Chemicals Agency (ECHA) subset consisting of substances labelled as hazardous or non-hazardous for cosmetic usage. Using these electron density cubes together with their 3-class electronegativity maps, we train a modified 3D-UNet to segment reactive sites in molecules and to classify substances, reaching an accuracy of 78.1%. We apply the same process to a custom food dataset (CompFood) of hazardous and non-hazardous substances compiled from the European Food Safety Authority (EFSA) OpenFoodTox, Food and Drug Administration (FDA) Generally Recognized as Safe (GRAS) and FooDB datasets, achieving a classification accuracy of 64.1%. Our results show that 3D electron densities, and particularly masked electron densities (calculated as the product of the original electron densities and the regions of high and low electronegativity), can be used to classify molecules for different use cases, and can thus not only guide safe-by-design product development but also aid regulatory decisions. We aim to contribute to the diverse 3D molecular representations used for training machine learning algorithms by showing that a deep learning network can be trained on a 3D electron density representation of molecules. This approach has not previously been used to train machine learning models; it makes use of the true spatial domain of the molecule to predict properties such as suitability for use in cosmetics and food products and, in future, other molecular properties.
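The masked electron density described above can be read as a voxel-wise product, ρ_masked(r) = ρ(r) · 1[c(r) ∈ {low, high}], where ρ is the density cube and c is the 3-class electronegativity label of each voxel. The sketch below illustrates only that product; it is not taken from the eDen-Substances repository. The integer label encoding (`LOW`, `MEDIUM`, `HIGH`), the nested-list cube layout and the function name `masked_density` are hypothetical choices made for illustration. The abstract does not say whether high and low regions are combined into one channel or kept separate, so the `keep` parameter (also an assumption) allows either.

```python
from typing import List, Sequence

# A cube is indexed as cube[x][y][z], i.e. a regular 3D voxel grid.
Cube = List[List[List[float]]]
LabelCube = List[List[List[int]]]

# Hypothetical integer encoding of the 3-class electronegativity map;
# the encoding used in the paper may differ.
LOW, MEDIUM, HIGH = 0, 1, 2


def masked_density(density: Cube, labels: LabelCube,
                   keep: Sequence[int] = (LOW, HIGH)) -> Cube:
    """Return density * indicator(label in keep), voxel by voxel.

    With the default keep=(LOW, HIGH), voxels labelled MEDIUM are zeroed
    and every other voxel retains its original density value.
    Raises ValueError if the two cubes do not have the same shape.
    """
    if len(density) != len(labels):
        raise ValueError("density and labels must have the same shape")
    masked: Cube = []
    for d_plane, l_plane in zip(density, labels):
        if len(d_plane) != len(l_plane):
            raise ValueError("density and labels must have the same shape")
        plane = []
        for d_row, l_row in zip(d_plane, l_plane):
            if len(d_row) != len(l_row):
                raise ValueError("density and labels must have the same shape")
            # Multiplying by a 0/1 indicator is the same as keeping or zeroing.
            plane.append([rho if lab in keep else 0.0
                          for rho, lab in zip(d_row, l_row)])
        masked.append(plane)
    return masked


if __name__ == "__main__":
    # Toy 2x2x2 cube with made-up values, for illustration only.
    density = [[[0.5, 0.2], [0.1, 0.9]], [[0.3, 0.0], [0.7, 0.4]]]
    labels = [[[HIGH, MEDIUM], [LOW, MEDIUM]], [[MEDIUM, LOW], [HIGH, HIGH]]]
    print(masked_density(density, labels))
    # [[[0.5, 0.0], [0.1, 0.0]], [[0.0, 0.0], [0.7, 0.4]]]
```

Calling `masked_density(density, labels, keep=(HIGH,))` and `keep=(LOW,)` separately would instead yield two single-region channels that could be stacked as network input; in practice a library such as NumPy would express the same product as an element-wise multiplication with a boolean mask.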
The data and code used for training are available at https://github.com/s-singh-ivv/eDen-Substances.

Updated: 2024-04-16