Accuracy of an Artificial Intelligence Chatbot’s Interpretation of Clinical Ophthalmic Images
JAMA Ophthalmology (IF 8.1), Pub Date: 2024-02-29, DOI: 10.1001/jamaophthalmol.2024.0017
Andrew Mihalache 1, Ryan S. Huang 1, Marko M. Popovic 2, Nikhil S. Patil 3, Bhadra U. Pandya 1, Reut Shor 2, Austin Pereira 2, Jason M. Kwok 1,2, Peng Yan 1,2, David T. Wong 2,4, Peter J. Kertes 2,5, Rajeev H. Muni 2,4

Importance: Ophthalmology relies on the effective interpretation of multimodal imaging to ensure diagnostic accuracy. The new ability of ChatGPT-4 (OpenAI) to interpret ophthalmic images has not yet been explored.

Objective: To evaluate the performance of the novel release of an artificial intelligence chatbot that is capable of processing imaging data.

Design, Setting, and Participants: This cross-sectional study used a publicly available dataset of ophthalmic cases from OCTCases, a medical education platform based at the Department of Ophthalmology and Vision Sciences at the University of Toronto, with accompanying clinical multimodal imaging and multiple-choice questions. Of the 137 available cases, 136 (99%) contained multiple-choice questions.

Exposures: The chatbot answered questions requiring multimodal input from October 16 to October 23, 2023.

Main Outcomes and Measures: The primary outcome was the accuracy of the chatbot in answering multiple-choice questions pertaining to image recognition in ophthalmic cases, measured as the proportion of correct responses. χ² tests were conducted to compare the proportion of correct responses across ophthalmic subspecialties.

Results: A total of 429 multiple-choice questions from 136 ophthalmic cases and 448 images were included in the analysis. The chatbot answered 299 of the 429 multiple-choice questions (70%) correctly across all cases. Its performance was better on retina questions than on neuro-ophthalmology questions (77% vs 58%; difference = 18%; 95% CI, 7.5%-29.4%; χ²₁ = 11.4; P < .001) and better on nonimage-based questions than on image-based questions (82% vs 65%; difference = 17%; 95% CI, 7.8%-25.1%; χ²₁ = 12.2; P < .001). The chatbot performed best on questions in the retina category (77% correct) and poorest in the neuro-ophthalmology category (58% correct), with intermediate performance on questions from the ocular oncology (72% correct), pediatric ophthalmology (68% correct), uveitis (67% correct), and glaucoma (61% correct) categories.

Conclusions and Relevance: In this study, the recent version of the chatbot accurately responded to approximately two-thirds of multiple-choice questions pertaining to ophthalmic cases based on imaging interpretation, and it performed better on questions that did not rely on the interpretation of imaging modalities. As the use of multimodal chatbots becomes increasingly widespread, it is imperative to stress their appropriate integration within medical contexts.
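The subspecialty comparisons above are standard χ² tests on 2×2 tables of correct versus incorrect responses. The Python snippet below is a minimal sketch of that calculation using SciPy; the per-category question counts are hypothetical placeholders chosen only to approximate the reported percentages (roughly 77% for retina vs 58% for neuro-ophthalmology), so its output will not exactly reproduce the published χ²₁ = 11.4.

    # Sketch of the abstract's two-proportion comparison as a chi-squared test.
    # Counts are HYPOTHETICAL, chosen only to approximate the reported
    # percentages; the true per-category denominators are in the full paper.
    from scipy.stats import chi2_contingency

    retina_correct, retina_total = 131, 170  # hypothetical, ~77% correct
    neuro_correct, neuro_total = 46, 79      # hypothetical, ~58% correct

    # 2x2 contingency table: rows = subspecialty, columns = correct/incorrect
    table = [
        [retina_correct, retina_total - retina_correct],
        [neuro_correct, neuro_total - neuro_correct],
    ]

    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    diff = retina_correct / retina_total - neuro_correct / neuro_total
    print(f"difference = {diff:.1%}, chi2 (df={dof}) = {chi2:.1f}, P = {p:.4f}")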

Updated: 2024-02-29