Tools for Healthcare Data Lake Infrastructure Benchmarking
Information Systems Frontiers (IF 5.9) Pub Date: 2024-01-17, DOI: 10.1007/s10796-023-10468-5
Tommaso Dolci, Lorenzo Amata, Carlo Manco, Fabio Azzalini, Marco Gribaudo, Letizia Tanca

Vast amounts of medical data are generated every day and constitute a crucial asset for improving therapy outcomes and medical treatments and for reducing healthcare costs. Data lakes are a valuable solution for managing and analyzing such varied and abundant data, yet to date no data lake architecture has been designed specifically for the healthcare domain. Moreover, benchmarking the infrastructure underlying a data lake is fundamental for optimizing resource allocation and performance, and thus for increasing the potential of this kind of data platform. This work describes a data lake architecture to ingest, store, process, and analyze heterogeneous medical data. We also present a benchmark for infrastructures supporting healthcare data lakes, focusing on a variety of analysis tasks, from relational analysis to machine learning. The benchmark is tested on a virtualized implementation of our data lake architecture and on two external cloud-based infrastructures. Our results highlight differences between infrastructures and between tasks of different nature, depending on the machine learning techniques, data sizes, and formats involved.




Updated: 2024-01-17