Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework, npj Digital Medicine

npj Digital Medicine (IF 15.2), Pub Date: 2024-04-23, DOI: 10.1038/s41746-024-01091-y
Simone Kresevic, Mauro Giuffrè, Milos Ajcevic, Agostino Accardo, Lory S. Crocè, Dennis L. Shung

Large language models (LLMs) could transform healthcare, particularly by delivering the right information to the right provider at the right time in the hospital workflow. This study investigates the integration of LLMs into healthcare, focusing on improving clinical decision support systems (CDSSs) through accurate interpretation of medical guidelines for the management of chronic Hepatitis C Virus infection. Using OpenAI’s GPT-4 Turbo model, we developed a customized LLM framework that incorporates retrieval augmented generation (RAG) and prompt engineering. The framework converts the guidelines into a structured format that LLMs can process efficiently, with the aim of producing the most accurate output. An ablation study evaluated the impact of different formatting and learning strategies on the accuracy of the LLM’s answers. The performance of the baseline GPT-4 Turbo model was compared against five experimental setups of increasing complexity, which added in-context guidelines, guideline reformatting, and few-shot learning. The primary outcome was a qualitative assessment of accuracy based on expert review; secondary outcomes included quantitative text-similarity scores comparing LLM-generated responses with expert-provided answers. Accuracy improved significantly, from 43% to 99% (p < 0.001), when guidelines were provided as context in a coherent corpus of text and non-text sources were converted into text. Few-shot learning did not appear to improve overall accuracy. The study highlights that structured guideline reformatting and advanced prompt engineering (data quality vs. data quantity) can enhance the efficacy of integrating LLMs into CDSSs for guideline delivery.
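The abstract does not say how the framework retrieves guideline passages or which text-similarity scores it uses. The following is only a minimal sketch of the general RAG idea, under stated assumptions: a bag-of-words cosine similarity stands in for both the retriever and the similarity score, the guideline chunks are placeholders rather than real guideline text, and the function names (`retrieve`, `build_prompt`) are hypothetical. The resulting prompt would then be sent to an LLM; that call is omitted here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (assumed retriever)."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(question, c), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble a prompt that places retrieved guideline text before the question."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return (
        "Answer using only the guideline excerpts below.\n\n"
        f"Guideline excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks for illustration only; these are NOT real guideline text.
GUIDELINE_CHUNKS = [
    "Section A: placeholder text about initial testing and screening.",
    "Section B: placeholder text about treatment regimens and duration.",
    "Section C: placeholder text about monitoring after treatment.",
]

if __name__ == "__main__":
    question = "What does the guideline say about treatment duration?"
    print(build_prompt(question, GUIDELINE_CHUNKS, k=1))
```

The same `cosine_similarity` function could also be applied to an LLM-generated answer and an expert-provided answer to get a rough quantitative similarity score, in the spirit of the secondary outcome described above. A real system would more likely use embedding-based retrieval and established similarity metrics.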


Updated: 2024-04-24
