FEP Augmentation as a Means to Solve Data Paucity Problems for Machine Learning in Chemical Biology, Journal of Chemical Information and Modeling



Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2024-04-23, DOI: 10.1021/acs.jcim.4c00071
Pieter B. Burger, Xiaohu Hu, Ilya Balabin, Morné Muller, Megan Stanley, Fourie Joubert, Thomas M. Kaiser


In medicinal chemistry, the primary objective is to rapidly optimize multiple chemical properties of a set of compounds to yield a candidate ready for clinical trials. In recent years, two computational techniques, machine learning (ML) and physics-based methods, have evolved substantially and are now frequently incorporated into the medicinal chemist’s toolbox to enhance the efficiency of both hit optimization and candidate design. Both methods have their own limitations, and they are often used independently of each other. ML’s ability to screen extensive compound libraries quickly is tempered by its reliance on quality data, which can be scarce, especially during early-stage optimization. Conversely, physics-based approaches such as free energy perturbation (FEP) are frequently constrained by comparatively low throughput and high cost; however, they are capable of making highly accurate binding affinity predictions. In this study, we harnessed the strength of FEP to overcome data paucity in ML by generating virtual activity data sets that then inform the training of algorithms. Here, we show that ML algorithms trained on an FEP-augmented data set could achieve predictive accuracy comparable to that of algorithms trained on experimental data from biological assays. Throughout the paper, we emphasize key mechanistic considerations that must be taken into account when augmenting data sets, and we lay the groundwork for successful implementation. Ultimately, the study advocates for the synergy of physics-based methods and ML to expedite the lead optimization process. We believe that the physics-based augmentation of ML will significantly benefit drug discovery as these techniques continue to evolve.
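The augmentation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not describe. It assumes that FEP yields a binding free energy in kcal/mol for each compound, that this value is converted to a pKd through the standard relation ΔG = RT ln(Kd), and that measured values take precedence over virtual ones. The function names `dg_to_pkd` and `augment`, the compound identifiers, and the choice of pKd as the activity label are all hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def dg_to_pkd(dg_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to pKd using dG = RT ln(Kd)."""
    return -dg_kcal_per_mol / (R_KCAL * temperature_k * math.log(10))


def augment(experimental, fep_predictions):
    """Merge assay labels with FEP-derived virtual labels (illustrative only).

    experimental:    dict mapping compound id -> measured pKd
    fep_predictions: dict mapping compound id -> predicted dG in kcal/mol
    Returns a list of (compound_id, pKd, source) tuples. A compound that has
    an assay value keeps it; its FEP prediction is not added.
    """
    rows = [(cid, pkd, "assay") for cid, pkd in experimental.items()]
    for cid, dg in fep_predictions.items():
        if cid not in experimental:
            rows.append((cid, dg_to_pkd(dg), "fep"))
    return rows


if __name__ == "__main__":
    # Hypothetical values, not data from the paper.
    measured = {"cpd_a": 7.2}
    predicted = {"cpd_a": -9.0, "cpd_b": -10.0}
    for row in augment(measured, predicted):
        print(row)
```

Keeping a `source` tag on every row is a design choice that lets a downstream model weight or filter virtual labels separately from assay labels.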


Updated: 2024-04-23
