Clinical Evaluation of Deep Learning for Tumor Delineation on 18F-FDG PET/CT of Head and Neck Cancer
The Journal of Nuclear Medicine (IF 9.3), Pub Date: 2024-04-01, DOI: 10.2967/jnumed.123.266574
David G. Kovacs , Claes N. Ladefoged , Kim F. Andersen , Jane M. Brittain , Charlotte B. Christensen , Danijela Dejanovic , Naja L. Hansen , Annika Loft , Jørgen H. Petersen , Michala Reichkendler , Flemming L. Andersen , Barbara M. Fischer

Artificial intelligence (AI) may decrease 18F-FDG PET/CT–based gross tumor volume (GTV) delineation variability and automate tumor-volume–derived image biomarker extraction. We therefore aimed to identify and evaluate promising state-of-the-art deep learning methods for head and neck cancer (HNC) PET GTV delineation.

Methods: We trained and evaluated deep learning methods using retrospectively included scans of HNC patients referred for radiotherapy between January 2014 and December 2019 (ISRCTN16907234). We used 3 test datasets: an internal set to compare methods, another internal set to compare AI-to-expert variability with expert interobserver variability (IOV), and an external set to compare internal and external AI-to-expert variability. Expert PET GTVs were used as the reference standard. Our benchmark IOV was measured using the PET GTVs of 6 experts. The primary outcome was the Dice similarity coefficient (DSC). ANOVA was used to compare methods, a paired t test was used to compare AI-to-expert variability with expert IOV, an unpaired t test was used to compare internal and external AI-to-expert variability, and post hoc Bland–Altman analysis was used to evaluate biomarker agreement.

Results: In total, 1,220 18F-FDG PET/CT scans of 1,190 patients (mean age ± SD, 63 ± 10 y; 858 men) were included, and 5 deep learning methods were trained using 5-fold cross-validation (n = 805). The nnU-Net method achieved the highest similarity (DSC, 0.80 [95% CI, 0.77–0.86]; n = 196). We found no evidence of a difference between expert IOV and AI-to-expert variability (DSC, 0.78 for AI vs. 0.82 for experts; mean difference of 0.04 [95% CI, –0.01 to 0.09]; P = 0.12; n = 64). We found no evidence of a difference between the internal and external AI-to-expert variability (DSC, 0.80 internally vs. 0.81 externally; mean difference of 0.004 [95% CI, –0.05 to 0.04]; P = 0.87; n = 125). AI PET GTV–derived biomarkers were in good agreement with those of the experts.
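The primary outcome above, the Dice similarity coefficient, measures the overlap between two binary segmentation masks as twice the intersection divided by the sum of the two volumes. As an illustrative sketch (the function name and NumPy-based implementation are my own, not code from the study), it can be computed for a predicted and a reference GTV mask as follows:

```python
import numpy as np

def dice_similarity(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks.

    DSC = 2 * |pred AND ref| / (|pred| + |ref|), ranging from
    0 (no overlap) to 1 (identical masks).
    """
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    total = pred.sum() + ref.sum()
    if total == 0:
        # Both masks empty: conventionally treated as perfect agreement.
        return 1.0
    return 2.0 * intersection / total
```

For voxelized PET GTVs the masks would be 3-dimensional arrays on the scanner grid; the formula is unchanged by dimensionality.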
Conclusion: Deep learning can be used to automate 18F-FDG PET/CT tumor-volume–derived imaging biomarkers, and the deep-learning–based volumes have the potential to assist clinical tumor volume delineation in radiation oncology.
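The post hoc Bland–Altman analysis mentioned in the methods assesses agreement between two measurement methods (here, AI-derived vs. expert-derived biomarkers) via the mean difference (bias) and 95% limits of agreement, bias ± 1.96 SD of the differences. A minimal sketch of that computation, assuming paired per-patient biomarker values (this helper is illustrative, not from the study):

```python
import numpy as np

def bland_altman_limits(a: np.ndarray, b: np.ndarray):
    """Bland–Altman bias and 95% limits of agreement for paired measurements.

    Returns (bias, lower_limit, upper_limit), where the limits are
    bias +/- 1.96 * SD of the pairwise differences (sample SD, ddof=1).
    """
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

In the study's setting, `a` and `b` would be the AI- and expert-derived values of a given PET GTV biomarker; narrow limits around a near-zero bias indicate the "good agreement" reported in the results.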




Updated: 2024-04-01