Distribution shift detection for the postmarket surveillance of medical AI algorithms: a retrospective simulation study, npj Digital Medicine


npj Digital Medicine (IF 15.2), Pub Date: 2024-05-09, DOI: 10.1038/s41746-024-01085-w
Lisa M. Koch, Christian F. Baumgartner, Philipp Berens

Distribution shifts remain a problem for the safe application of regulated medical AI systems, and may impact their real-world performance if undetected. Postmarket shifts can occur, for example, if algorithms developed on data from various acquisition settings and a heterogeneous population are predominantly applied in hospitals with lower-quality data acquisition or other centre-specific acquisition factors, or where some ethnicities are over-represented. Therefore, distribution shift detection could be important for monitoring AI-based medical products during postmarket surveillance. We implemented and evaluated three deep-learning-based shift detection techniques (classifier-based tests, deep kernel tests, and multiple univariate Kolmogorov-Smirnov tests) on simulated shifts in a dataset of 130,486 retinal images. We trained a deep learning classifier for diabetic retinopathy grading. We then simulated population shifts by changing the prevalence of patients' sex, ethnicity, and co-morbidities, and example acquisition shifts by changing image quality. We observed disparities in classification performance between subgroups defined by image quality, patient sex, ethnicity, and co-morbidity presence. The sensitivity at detecting referable diabetic retinopathy ranged from 0.50 to 0.79 for different ethnicities. This motivates the need for detecting shifts after deployment. Classifier-based tests performed best overall, with perfect detection rates for quality and co-morbidity subgroup shifts at a sample size of 1000. They were the only method to detect shifts in patient sex, but required large sample sizes (>30,000) to do so. All methods identified easier-to-detect out-of-distribution shifts with small (≤300) sample sizes. We conclude that effective tools exist for detecting clinically relevant distribution shifts. In particular, classifier-based tests are easy to implement and could serve as components of the postmarket surveillance strategy of medical device manufacturers.
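To make one of the three techniques named in the abstract concrete, here is a minimal sketch of a multiple univariate Kolmogorov-Smirnov shift test. It is not the authors' implementation, which the abstract does not describe. It assumes each image has already been reduced to a short feature vector (the three-dimensional toy features below are hypothetical), applies one two-sample KS test per feature dimension, and combines the tests with a Bonferroni correction, which is an assumed aggregation rule. The p-value uses the asymptotic Kolmogorov series, so it is an approximation; in practice a library routine such as `scipy.stats.ks_2samp` would likely be preferable.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is essentially 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))


def detect_shift(source, target, alpha=0.05):
    """Run one KS test per feature dimension with a Bonferroni correction.

    Returns (shift_detected, per_dimension_p_values).
    """
    n_dims = len(source[0])
    p_values = []
    for k in range(n_dims):
        s = [row[k] for row in source]
        t = [row[k] for row in target]
        p_values.append(ks_pvalue(ks_statistic(s, t), len(s), len(t)))
    # Flag a shift if any dimension is significant at the corrected level.
    return min(p_values) < alpha / n_dims, p_values


# Toy data (hypothetical): 300 source and 300 target feature vectors,
# where only the first target dimension is shifted by one standard deviation.
rng = random.Random(0)
source = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(300)]
target = [
    [rng.gauss(1.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(2)]
    for _ in range(300)
]
shifted, p_values = detect_shift(source, target)
print("shift detected:", shifted)
```

The sample size of 300 echoes the small-sample regime mentioned in the abstract, but the simulated shift here is deliberately large and says nothing about detection rates on real retinal images.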


Updated: 2024-05-10
