Unsupervised EHR-based phenotyping via matrix and tensor decompositions
WIREs Data Mining and Knowledge Discovery (IF 7.8), Pub Date: 2023-03-05, DOI: 10.1002/widm.1494
Florian Becker 1, 2 , Age K. Smilde 1, 3 , Evrim Acar 1

Computational phenotyping allows for the unsupervised discovery of patient subgroups, together with their co-occurring medical conditions, from electronic health records (EHR). Typically, EHR data contains demographic information, diagnoses and laboratory results. Discovering (novel) phenotypes has the potential to be of prognostic and therapeutic value. Providing medical practitioners with transparent and interpretable results is an important requirement and essential to advancing precision medicine. Low-rank data approximation methods such as matrix decompositions (e.g., nonnegative matrix factorization) and tensor decompositions (e.g., CANDECOMP/PARAFAC) have been shown to provide such transparent and interpretable insights. Recent developments have adapted these methods by incorporating constraints and regularizations that further facilitate interpretability. In addition, they offer solutions to common challenges in EHR data such as high dimensionality, data sparsity and incompleteness. Extracting temporal phenotypes from longitudinal EHR, in particular, has received much attention in recent years. In this paper, we provide a comprehensive review of low-rank approximation-based approaches for computational phenotyping. The existing literature is categorized along two axes: temporal versus static phenotyping, and matrix versus tensor decompositions. Furthermore, we outline different approaches for the validation of phenotypes, that is, the assessment of their clinical significance.
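As a rough illustration of the matrix-decomposition idea mentioned in the abstract, the sketch below factorizes a tiny patient-by-feature matrix with nonnegative matrix factorization. It is not taken from the reviewed paper: the toy data, the feature names, the function `nmf`, and the choice of multiplicative updates for the squared-error objective are all assumptions made for illustration. Each row of `H` can be read as a candidate phenotype (a weighted set of co-occurring features), and each row of `W` as one patient's membership in those phenotypes.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    # Plain list-of-lists matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate X (patients x features) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Frobenius error.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

# Toy data (assumed): 6 patients, 4 binary features.
features = ["diabetes_dx", "high_hba1c", "hypertension_dx", "high_bp"]
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

W, H = nmf(X, rank=2)

# Assign each patient to the component with the largest membership weight.
dominant = [max(range(2), key=lambda k: W[i][k]) for i in range(len(X))]

for k, row in enumerate(H):
    top = [f for f, w in zip(features, row) if w > 0.5 * max(row)]
    print(f"phenotype {k}: {top}")
print("patient assignments:", dominant)
```

The nonnegativity of `W` and `H` is what makes the factors readable as additive parts. For longitudinal data, the same idea extends to a patient-by-feature-by-time tensor and a CANDECOMP/PARAFAC model, which this sketch does not cover.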

Updated: 2023-03-05