Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets, Journal of Cheminformatics


Journal of Cheminformatics (IF 8.6), Pub Date: 2023-12-18, DOI: 10.1186/s13321-023-00790-0
Maria H. Rasmussen, Chenru Duan, Heather J. Kulik, Jan H. Jensen

With machine learning (ML) models playing an increasingly important role in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on how to evaluate them has yet to be established, and different studies on uncertainties generally use different metrics. We compare three of the most popular validation metrics (Spearman’s rank correlation coefficient, the negative log likelihood (NLL), and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the NLL and Spearman’s rank correlation coefficient carry little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties and we show how to interpret them, but we generally find that the best overall validation is based on the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman’s rank correlation coefficient) to test set design by using the same toy model on different test sets and obtaining vastly different metrics (0.05 vs. 0.65).
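The idea of simulated reference values can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the predicted uncertainties are standard deviations of zero-mean Gaussian errors, and all function names (spearman, gaussian_nll, simulated_reference, error_based_calibration) are hypothetical. Errors are drawn from the uncertainty distribution itself, so the resulting Spearman coefficient and NLL indicate what a perfectly calibrated model would score on this particular set of uncertainties. The last function bins the data by predicted uncertainty and compares the root mean variance with the root mean squared error per bin, which is the comparison behind an error-based calibration plot.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation as the Pearson correlation of ranks (assumes no ties)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

def gaussian_nll(errors, sigma):
    """Mean negative log likelihood of the errors under N(0, sigma^2)."""
    return float(np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                         + errors**2 / (2 * sigma**2)))

def simulated_reference(sigma, n_sim=200, seed=0):
    """Average Spearman and NLL over errors simulated from the uncertainties.

    Assumption: each error is drawn from N(0, sigma_i^2).
    """
    rng = np.random.default_rng(seed)
    rhos, nlls = [], []
    for _ in range(n_sim):
        simulated = rng.normal(0.0, sigma)  # one draw per data point
        rhos.append(spearman(np.abs(simulated), sigma))
        nlls.append(gaussian_nll(simulated, sigma))
    return float(np.mean(rhos)), float(np.mean(nlls))

def error_based_calibration(errors, sigma, n_bins=10):
    """Per-bin root mean variance (RMV) and root mean squared error (RMSE)."""
    bins = np.array_split(np.argsort(sigma), n_bins)  # bins ordered by sigma
    rmv = np.array([np.sqrt(np.mean(sigma[b] ** 2)) for b in bins])
    rmse = np.array([np.sqrt(np.mean(errors[b] ** 2)) for b in bins])
    return rmv, rmse

# Toy data (an assumption for illustration): uncertainties that are
# calibrated by construction, because the errors are drawn from them.
rng = np.random.default_rng(1)
sigma = rng.uniform(0.1, 1.0, size=2000)
errors = rng.normal(0.0, sigma)

rho_obs = spearman(np.abs(errors), sigma)
nll_obs = gaussian_nll(errors, sigma)
rho_ref, nll_ref = simulated_reference(sigma)
rmv, rmse = error_based_calibration(errors, sigma)
print(f"Spearman: observed {rho_obs:.2f}, simulated reference {rho_ref:.2f}")
print(f"NLL: observed {nll_obs:.2f}, simulated reference {nll_ref:.2f}")
```

In this sketch the observed metrics should land close to the simulated references, and the RMV and RMSE values should roughly agree in every bin. Note that the reference Spearman coefficient is well below 1 even though the uncertainties are calibrated, which is why the raw coefficient is hard to interpret without a reference.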


Updated: 2023-12-18
