Robustness evaluation of deep neural networks for endoscopic image analysis: Insights and strategies,Medical Image Analysis

Medical Image Analysis (IF 10.9), Pub Date: 2024-03-29, DOI: 10.1016/j.media.2024.103157
Tim J.M. Jaspers, Tim G.W. Boers, Carolus H.J. Kusters, Martijn R. Jong, Jelmer B. Jukema, Albert J. de Groof, Jacques J. Bergman, Peter H.N. de With, Fons van der Sommen

Computer-aided detection and diagnosis systems (CADe/CADx) in endoscopy are commonly trained on high-quality imagery, which is not representative of the heterogeneous input typically encountered in clinical practice. In endoscopy, image quality relies heavily on both the skill and experience of the endoscopist and the specifications of the system used for screening. Factors such as poor illumination, motion blur, and specific post-processing settings can significantly alter the quality and general appearance of these images. This so-called domain gap between the data used for developing the system and the data it encounters after deployment, and its impact on the performance of the deep neural networks (DNNs) that support endoscopic CAD systems, remain largely unexplored. As many such systems, e.g., for polyp detection, are already being rolled out in clinical practice, this poses severe risks to patients, particularly in community hospitals, where both imaging equipment and endoscopist experience vary considerably. Therefore, this study aims to evaluate the impact of this domain gap on the clinical performance of CADe/CADx for various endoscopic applications. To this end, we use two publicly available datasets (KVASIR-SEG and GIANA) and two in-house datasets. We investigate the performance of commonly used DNN architectures under synthetic, clinically calibrated image degradations and on a prospectively collected dataset including 342 endoscopic images of lower subjective quality. Additionally, we assess how DNN architecture and complexity, data augmentation, and pretraining techniques influence robustness. The results reveal a considerable performance decline of 11.6% (1.5) compared to the reference, within the clinically calibrated boundaries of image degradations. Nevertheless, employing more advanced DNN architectures and self-supervised in-domain pretraining effectively mitigates this drop, reducing it to 7.7% (2.03).
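The abstract singles out poor illumination and motion blur as quality factors and evaluates models under synthetic degradations. As a minimal sketch, assuming frames are stored as float NumPy arrays of shape (H, W, C) with values in [0, 1], these two factors can be simulated as follows. The box-shaped blur kernel, the `factor`/`gamma` brightness model, and the hypothetical `degrade` severity mapping are illustrative assumptions, not the clinically calibrated degradations of the study, whose parameters are not given here.

```python
import numpy as np

# Illustrative degradations only; NOT the study's clinically calibrated settings.


def motion_blur(img: np.ndarray, length: int) -> np.ndarray:
    """Approximate horizontal motion blur with a 1 x `length` box kernel.

    `img` is a float array of shape (H, W, C) with values in [0, 1].
    Borders are edge-padded so the output has the same shape as the input.
    """
    if length <= 1:
        return img.astype(float, copy=True)
    left = length // 2
    right = length - 1 - left
    padded = np.pad(img, ((0, 0), (left, right), (0, 0)), mode="edge")
    width = img.shape[1]
    out = np.zeros(img.shape, dtype=float)
    # Average `length` horizontally shifted copies (a box filter along x).
    for k in range(length):
        out += padded[:, k:k + width, :]
    return out / length


def darken(img: np.ndarray, factor: float, gamma: float = 1.0) -> np.ndarray:
    """Approximate poor illumination: gamma-adjust, scale by `factor`, clip to [0, 1]."""
    return np.clip(factor * np.power(img, gamma), 0.0, 1.0)


def degrade(img: np.ndarray, severity: float) -> np.ndarray:
    """Hypothetical mapping of a severity in [0, 1] to blur length and brightness."""
    length = 1 + int(round(8 * severity))  # 1 (no blur) .. 9 pixels
    factor = 1.0 - 0.5 * severity          # 100% .. 50% brightness
    return darken(motion_blur(img, length), factor)
```

For example, `degrade(frame, 0.5)` applies a 5-pixel horizontal blur and dims the frame to 75% brightness. A real evaluation would instead bound each parameter by what is observed in clinical footage, which is what "clinically calibrated" refers to in the abstract.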
Moreover, these enhancements yield the highest performance on the manually collected test set including images with lower subjective quality. By comprehensively assessing the robustness of popular DNN architectures and training strategies across multiple datasets, this study provides valuable insights into their performance and limitations for endoscopic applications. The findings highlight the importance of including robustness evaluation when developing DNNs for endoscopy, and the study proposes strategies to mitigate performance loss.
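Robustness evaluation of this kind amounts to scoring a model on a clean test set and on degraded copies of it, then expressing each degraded score as a decline relative to the clean reference. The sketch below assumes the reported decline is a relative percentage and that the value in parentheses is a standard deviation across degradations; the abstract does not state either explicitly. `score_fn` is a hypothetical stand-in for running a trained DNN and computing its task metric.

```python
import statistics
from typing import Callable, Dict, Sequence, Tuple


def relative_drop(reference: float, degraded: float) -> float:
    """Decline of a score, in percent, relative to the clean-reference score."""
    return 100.0 * (reference - degraded) / reference


def robustness_report(
    score_fn: Callable[[list], float],
    images: Sequence,
    degradations: Dict[str, Callable],
) -> Tuple[float, Dict[str, float], float, float]:
    """Score clean and degraded copies of a test set and summarize the drops.

    score_fn:     hypothetical callable that runs a trained model on a list of
                  images and returns a task metric where higher is better.
    degradations: name -> function applied to every image in the test set.
    Returns (reference score, drop % per degradation, mean drop, std of drops).
    """
    reference = score_fn(list(images))
    drops: Dict[str, float] = {}
    for name, degrade in degradations.items():
        degraded_score = score_fn([degrade(img) for img in images])
        drops[name] = relative_drop(reference, degraded_score)
    values = list(drops.values())
    mean_drop = statistics.mean(values)
    # Sample standard deviation needs at least two degradations.
    std_drop = statistics.stdev(values) if len(values) > 1 else 0.0
    return reference, drops, mean_drop, std_drop


def toy_score(imgs: list) -> float:
    """Toy metric: mean of scalar 'images' (bookkeeping illustration only)."""
    return sum(imgs) / len(imgs)


if __name__ == "__main__":
    # Toy stand-ins, not data from the study.
    toy_degradations = {"dim": lambda x: 0.9 * x, "dimmer": lambda x: 0.8 * x}
    ref, per_deg, mean_drop, std_drop = robustness_report(
        toy_score, [0.8, 0.9, 1.0], toy_degradations
    )
    print(f"reference={ref:.3f}")
    for name, drop in per_deg.items():
        print(f"{name}: {drop:.1f}% drop")
    print(f"mean drop {mean_drop:.1f}% (std {std_drop:.2f})")
```

Comparing architectures or pretraining strategies, as the study does, then means calling `robustness_report` once per trained model with the same test set and degradations; reporting each drop relative to that model's own clean score keeps models with different baseline performance comparable.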

Updated: 2024-03-29
