Improving chemical reaction yield prediction using pre-trained graph neural networks, Journal of Cheminformatics


Improving chemical reaction yield prediction using pre-trained graph neural networks
Journal of Cheminformatics (IF 8.6), Pub Date: 2024-03-01, DOI: 10.1186/s13321-024-00818-z
Jongmin Han, Youngchun Kwon, Youn-Suk Choi, Seokho Kang

Graph neural networks (GNNs) have proven effective at predicting chemical reaction yields. However, their performance tends to deteriorate when the training dataset is insufficient in quantity or diversity. A promising way to alleviate this issue is to pre-train a GNN on a large-scale molecular database. In this study, we investigate the effectiveness of GNN pre-training for chemical reaction yield prediction and present a novel pre-training method that improves performance. Given a molecular database consisting of a large number of molecules, we calculate molecular descriptors for each molecule and reduce the dimensionality of these descriptors by applying principal component analysis. We define a pretext task by assigning a vector of principal component scores as the pseudo-label of each molecule in the database. A GNN is then pre-trained on the pretext task of predicting the pseudo-label of an input molecule. For chemical reaction yield prediction, a prediction model is initialized with the pre-trained GNN and then fine-tuned on a training dataset of chemical reactions and their yields. We demonstrate the effectiveness of the proposed method through experimental evaluation on benchmark datasets.
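The pseudo-label construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which descriptors are used, how they are scaled, or how many principal components are kept. The function name `pca_pseudo_labels`, the column standardization, the choice of 4 components, and the random matrix standing in for real molecular descriptors are all assumptions made for illustration.

```python
import numpy as np


def pca_pseudo_labels(descriptors: np.ndarray, n_components: int) -> np.ndarray:
    """Return principal component scores, one row per molecule.

    Hypothetical helper: each row of the result would serve as the
    pseudo-label that a GNN is pre-trained to predict.
    """
    # Standardize each descriptor column (an assumption; the abstract
    # does not say how descriptors are scaled before PCA).
    mean = descriptors.mean(axis=0)
    std = descriptors.std(axis=0)
    std[std == 0] = 1.0  # constant descriptors: avoid division by zero
    z = (descriptors - mean) / std

    # PCA via SVD of the centered data: rows of vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)

    # Project onto the leading axes to get the principal component scores.
    return z @ vt[:n_components].T


# Synthetic stand-in for a (molecules x descriptors) matrix; real descriptors
# would be computed from molecular structures with a cheminformatics toolkit.
rng = np.random.default_rng(0)
toy_descriptors = rng.normal(size=(200, 12))

pseudo_labels = pca_pseudo_labels(toy_descriptors, n_components=4)
print(pseudo_labels.shape)
```

In a pre-training loop, each row of `pseudo_labels` would be the regression target for the corresponding molecular graph. The loss function and GNN architecture are left out here because the abstract does not specify them.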


Updated: 2024-03-01
