Development of Novel Methods for QSAR Modeling by Machine Learning Repeatedly: A Case Study on Drug Distribution to Each Tissue
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2024-04-19, DOI: 10.1021/acs.jcim.4c00046
Koichi Handa, Saki Yoshimura, Michiharu Kageyama, Takeshi Iijima

Artificial intelligence is expected to help identify excellent candidates in drug discovery. However, data are scarce, because acquiring complete raw data for many compounds is time-consuming and expensive. We therefore tried to develop a novel quantitative structure-activity relationship (QSAR) method that predicts a parameter more precisely from an incomplete data set by optimizing data handling and using predicted values as explanatory variables. As a case study, we focused on the tissue-to-plasma partition coefficient (Kp), an important parameter for understanding drug distribution in tissues and for building physiologically based pharmacokinetic models, and a representative example of a small, sparse data set. In this study, we predicted the Kp values of 119 compounds in nine tissues (adipose, brain, gut, heart, kidney, liver, lung, muscle, and skin), although some of these values were not available. To fill in the missing Kp values for each tissue, we first predicted them from the nonmissing data using a random forest (RF) model with in vitro parameters (log P, fu, Drug Class, and fi), as in a classical QSAR prediction. Next, to predict the tissue-specific Kp values in a test data set, we constructed a second RF model whose explanatory variables were not only the in vitro parameters but also the Kp values of the other tissues (i.e., tissues other than the target tissue) predicted by the first RF model. Furthermore, we tested all possible combinations of explanatory variables and selected as the final model the one with the highest predictability on the test data set. Evaluation of Kp prediction accuracy based on the root-mean-square error (RMSE) and R² value showed that the proposed models outperformed other machine learning methods such as the conventional RF and message-passing neural networks. Significant improvements were observed in the Kp values of adipose tissue, brain, kidney, liver, and skin. These improvements indicate that Kp information from other tissues can be used to predict the Kp of a specific tissue. Additionally, by evaluating all combinations of explanatory variables, we found novel relationships between tissues. In conclusion, we developed a novel RF model to predict Kp values. We hope that this method will soon be applied to the many problems in experimental biology in which data sets contain missing values.
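
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins, only three tissues are used, the helper `fit_rf`, the train/test split, and the RF hyperparameters are all assumptions, and the subset search varies only the other-tissue Kp features rather than every explanatory variable. It also assumes that stage-1 predictions are used for every compound, which the abstract does not specify. It uses scikit-learn's `RandomForestRegressor`.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's data): 119 compounds, 4 in vitro
# descriptors, 3 tissues with correlated, partly missing Kp-like values.
n, tissues = 119, ["adipose", "brain", "liver"]
X = rng.normal(size=(n, 4))                    # hypothetical log P, fu, class, fi
shared = X[:, 0] + 0.5 * rng.normal(size=n)    # signal shared across tissues
Kp = np.column_stack([shared + 0.3 * rng.normal(size=n) for _ in tissues])
Kp[rng.random(Kp.shape) < 0.3] = np.nan        # roughly 30% missing values

train = np.arange(n) < 90                      # assumed split: first 90 train
test = ~train


def fit_rf(features, target):
    """Fit a random forest regressor (hyperparameters are assumptions)."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    return model.fit(features, target)


# Stage 1: per tissue, fit on training compounds with measured Kp and
# predict Kp for every compound from the in vitro descriptors alone.
stage1 = np.empty_like(Kp)
for j in range(len(tissues)):
    known = train & ~np.isnan(Kp[:, j])
    stage1[:, j] = fit_rf(X[known], Kp[known, j]).predict(X)

# Stage 2: for one target tissue, try every subset of the OTHER tissues'
# stage-1 predictions as extra explanatory variables.
target = tissues.index("brain")
others = [j for j in range(len(tissues)) if j != target]
y = Kp[:, target]
known_train = train & ~np.isnan(y)
known_test = test & ~np.isnan(y)

results = {}
for k in range(len(others) + 1):
    for subset in combinations(others, k):
        cols = np.array(subset, dtype=int)     # empty subset = conventional RF
        feats = np.hstack([X, stage1[:, cols]])
        model = fit_rf(feats[known_train], y[known_train])
        pred = model.predict(feats[known_test])
        rmse = mean_squared_error(y[known_test], pred) ** 0.5
        results[subset] = (rmse, r2_score(y[known_test], pred))

# Select the subset with the lowest test RMSE, mirroring the abstract.
best = min(results, key=lambda s: results[s][0])
for subset, (rmse, r2) in results.items():
    names = [tissues[j] for j in subset] or ["in vitro only"]
    print(names, "RMSE=%.3f" % rmse, "R2=%.3f" % r2)
print("best extra features:", [tissues[j] for j in best])
```

The empty subset corresponds to the conventional RF that uses in vitro parameters only, so the printed table gives a direct baseline comparison. Stage-1 predictions for compounds that were also used to fit the stage-1 model are in-sample; out-of-fold predictions may be a more careful choice in practice.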

Updated: 2024-04-19