Histopathology language-image representation learning for fine-grained digital pathology cross-modal retrieval
Medical Image Analysis (IF 10.9) Pub Date: 2024-04-09, DOI: 10.1016/j.media.2024.103163
Dingyi Hu, Zhiguo Jiang, Jun Shi, Fengying Xie, Kun Wu, Kunming Tang, Ming Cao, Jianguo Huai, Yushan Zheng

Large-scale analysis of digital whole slide image (WSI) datasets has gained significant attention in computer-aided cancer diagnosis. Content-based histopathological image retrieval (CBHIR) searches a large database for samples that match the input object in both detail and semantics, offering relevant diagnostic information to pathologists. However, current methods are limited by the gigapixel scale and variable size of WSIs and by their dependence on manual annotations. In this work, we propose a novel histopathology language-image representation learning framework for fine-grained digital pathology cross-modal retrieval, which utilizes paired diagnosis reports to learn fine-grained semantics from WSIs. An anchor-based WSI encoder is built to extract hierarchical region features, and a prompt-based text encoder is introduced to learn fine-grained semantics from the diagnosis reports. The framework is trained with a multivariate cross-modal loss function to learn semantic information from the diagnosis reports at both the instance level and the region level. After training, it can perform four types of retrieval tasks over the multi-modal database to support diagnostic requirements. We evaluated the proposed method on an in-house dataset and a public dataset. Extensive experiments demonstrate its effectiveness and its advantages over existing histopathology retrieval methods. The code is available at .
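The abstract states that the framework is trained with a multivariate cross-modal loss that aligns WSI and report representations at both the instance (slide) level and the region level. The paper's exact formulation is not reproduced here; the sketch below only illustrates, under stated assumptions, one common way such a two-granularity image-text alignment can be written as a symmetric InfoNCE objective. The tensor shapes, the `region_weight` parameter, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal, hypothetical sketch of a two-granularity cross-modal contrastive
# objective (NOT the authors' released code). Assumes precomputed slide-level
# and region-level visual embeddings paired with report/prompt text embeddings.
import torch
import torch.nn.functional as F


def info_nce(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def cross_modal_loss(slide_emb, report_emb, region_emb, prompt_emb, region_weight: float = 0.5):
    """Combine an instance-level (slide vs. report) term with a region-level term.

    slide_emb:  (B, D)    one embedding per WSI
    report_emb: (B, D)    one embedding per diagnosis report
    region_emb: (B*R, D)  flattened region embeddings
    prompt_emb: (B*R, D)  matching fine-grained text-prompt embeddings
    """
    instance_loss = info_nce(slide_emb, report_emb)
    region_loss = info_nce(region_emb, prompt_emb)
    return instance_loss + region_weight * region_loss


if __name__ == "__main__":
    B, R, D = 4, 3, 256  # batch of WSIs, regions per WSI, embedding dimension
    loss = cross_modal_loss(
        torch.randn(B, D), torch.randn(B, D),
        torch.randn(B * R, D), torch.randn(B * R, D),
    )
    print(loss.item())
```

In the paper's setting, the slide- and region-level embeddings would come from the anchor-based WSI encoder and the prompt-based text encoder described above; random tensors are used here only to keep the sketch self-contained and runnable.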

Updated: 2024-04-09