Applying atomistic neural networks to bias conformer ensembles towards bioactive-like conformations
Journal of Cheminformatics (IF 8.6), Pub Date: 2023-12-21, DOI: 10.1186/s13321-023-00794-w
Benoit Baillif, Jason Cole, Ilenia Giangreco, Patrick McCabe, Andreas Bender

Identifying bioactive conformations of small molecules is an essential process for virtual screening applications relying on three-dimensional structure such as molecular docking. For most small molecules, conformer generators retrieve at least one bioactive-like conformation, with an atomic root-mean-square deviation (ARMSD) lower than 1 Å, among the set of low-energy conformers generated. However, there is currently no general method to prioritise these likely target-bound conformations in the ensemble. In this work, we trained atomistic neural networks (AtNNs) on 3D information of generated conformers of a curated subset of PDBbind ligands to predict the ARMSD to their closest bioactive conformation, and evaluated the early enrichment of bioactive-like conformations when ranking conformers by AtNN prediction. AtNN ranking was compared with bioactivity-unaware baselines such as ascending Sage force field energy ranking, and a slower bioactivity-based baseline ranking by ascending Torsion Fingerprint Deviation to the Maximum Common Substructure to the most similar molecule in the training set (TFD2SimRefMCS). On test sets from random ligand splits of PDBbind, ranking conformers using ComENet, the AtNN encoding the most 3D information, leads to early enrichment of bioactive-like conformations with a median BEDROC of 0.29 ± 0.02, outperforming the best bioactivity-unaware Sage energy ranking baseline (median BEDROC of 0.18 ± 0.02), and performing on a par with the bioactivity-based TFD2SimRefMCS baseline (median BEDROC of 0.31 ± 0.02). The improved performance of the AtNN and TFD2SimRefMCS baseline is mostly observed on test set ligands that bind proteins similar to proteins observed in the training set. On a more challenging subset of flexible molecules, the bioactivity-unaware baselines showed median BEDROCs up to 0.02, while AtNNs and TFD2SimRefMCS showed median BEDROCs between 0.09 and 0.13. 
When performing rigid ligand re-docking of PDBbind ligands with GOLD using the 1% top-ranked conformers, ComENet-ranked conformers showed a higher successful docking rate than bioactivity-unaware baselines, with a rate of 0.48 ± 0.02 compared to 0.39 ± 0.02 for the CSD probability baseline. Similarly, in a pharmacophore searching experiment, selecting the 20% top-ranked conformers as ranked by ComENet gave a higher hit rate than the baselines. Hence, the approach presented here successfully uses AtNNs to focus conformer ensembles towards bioactive-like conformations, representing an opportunity to reduce computational expense in virtual screening applications on known targets that require input conformations.
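
The evaluation described in the abstract (rank conformers by predicted ARMSD, then measure early enrichment of bioactive-like conformers with BEDROC) can be illustrated with a small sketch. This is not the authors' code. It assumes the standard Truchon and Bayly definition of BEDROC, and the function names and the value alpha = 20 are illustrative assumptions, since the abstract does not state the alpha used. Only the 1 Å threshold for "bioactive-like" comes from the text.

```python
import math


def bedroc(ranked_labels, alpha=20.0):
    """BEDROC for a ranked list of booleans (True = bioactive-like).

    ranked_labels[0] is the top-ranked conformer. alpha=20.0 is an
    assumed value, not one taken from the abstract.
    """
    total = len(ranked_labels)
    actives = sum(ranked_labels)
    if actives == 0 or actives == total:
        raise ValueError("BEDROC needs both active and inactive items")
    ratio = actives / total

    # Exponentially weighted sum over the 1-based ranks of the actives.
    weighted = sum(
        math.exp(-alpha * rank / total)
        for rank, active in enumerate(ranked_labels, start=1)
        if active
    )
    # Expected value of the same sum when actives are placed at random.
    random_sum = ratio * (1 - math.exp(-alpha)) / (math.exp(alpha / total) - 1)
    rie = weighted / random_sum

    # Rescale the robust initial enhancement (RIE) to the [0, 1] range.
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


def bedroc_for_ranking(predicted_armsd, true_armsd, threshold=1.0, alpha=20.0):
    """Rank conformers by ascending predicted ARMSD and score the ranking.

    A conformer counts as bioactive-like when its true ARMSD is below
    `threshold` (1 Å, as in the abstract).
    """
    order = sorted(range(len(predicted_armsd)), key=lambda i: predicted_armsd[i])
    labels = [true_armsd[i] < threshold for i in order]
    return bedroc(labels, alpha)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the paper.
    true_armsd = [0.5, 2.0, 3.0, 0.8, 2.5, 4.0]
    predicted = [0.7, 1.9, 2.2, 1.5, 2.8, 3.1]
    print(bedroc_for_ranking(predicted, true_armsd))
```

A ranking that places every bioactive-like conformer first scores 1, and one that places them all last scores 0. The per-ligand values would then be summarised by their median, as reported in the abstract.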

Updated: 2023-12-22