An evaluation of the capabilities of language models and nurses in providing neonatal clinical decision support
International Journal of Nursing Studies (IF 8.1), Pub Date: 2024-04-09, DOI: 10.1016/j.ijnurstu.2024.104771
Chedva Levin, Tehilla Kagan, Shani Rosen, Mor Saban

This study aimed to assess the clinical reasoning capabilities of two large language models, ChatGPT-4 and Claude-2.0, compared with those of neonatal nurses in neonatal care scenarios. It used a cross-sectional design with a comparative evaluation, based on a survey instrument that included six neonatal intensive care unit clinical scenarios. Participants were 32 neonatal intensive care nurses with 5–10 years of experience, working in the neonatal intensive care units of three medical centers. Participants responded to the six written clinical scenarios. Simultaneously, we asked ChatGPT-4 and Claude-2.0 to provide initial assessments and treatment recommendations for the same scenarios. The responses from ChatGPT-4 and Claude-2.0 were then scored by certified neonatal nurse practitioners for accuracy, completeness, and response time. Both models demonstrated capabilities in clinical reasoning for neonatal care, with Claude-2.0 significantly outperforming ChatGPT-4 in clinical accuracy and speed. However, limitations were identified across the cases in diagnostic precision, treatment specificity, and response lag. Although the models show promise, these limitations reinforce the need for substantial refinement before ChatGPT-4 and Claude-2.0 can be considered for integration into clinical practice. Additional validation of these tools is important for safely leveraging artificial intelligence to enhance clinical decision-making. The study provides an understanding of the reasoning accuracy of new artificial intelligence models in neonatal clinical care. The current accuracy gaps of ChatGPT-4 and Claude-2.0 need to be addressed before clinical use.

Updated: 2024-04-09