A new workflow for the effective curation of membrane permeability data from open ADME information
Journal of Cheminformatics (IF 8.6), Pub Date: 2024-03-14, DOI: 10.1186/s13321-024-00826-z
Tsuyoshi Esaki, Tomoki Yonezawa, Kazuyoshi Ikeda

Membrane permeability is an in vitro parameter that represents the apparent permeability (Papp) of a compound, and is a key absorption, distribution, metabolism, and excretion parameter in drug development. Although Caco-2 cells are the most commonly used cell line for measuring Papp, other cell lines, such as the Madin-Darby Canine Kidney (MDCK), LLC-Pig Kidney 1 (LLC-PK1), and Ralph Russ Canine Kidney (RRCK) cell lines, can also be used to estimate Papp. Therefore, constructing in silico models for Papp estimation using the MDCK, LLC-PK1, and RRCK cell lines requires collecting extensive amounts of in vitro Papp data. Open databases offer extensive measurements of various compounds covering a vast chemical space; however, concerns have been reported about using data published in open databases without appropriate accuracy and quality checks. Ensuring the quality of datasets for training in silico models is critical because artificial intelligence (AI, including deep learning) is used to develop models that predict various pharmacokinetic properties, and data quality affects the performance of these models. Hence, careful curation of the collected data is imperative. Herein, we used KNIME to develop a new workflow that supports automatic curation of Papp data collected from ChEMBL and measured in the MDCK, LLC-PK1, and RRCK cell lines. The workflow consists of four main phases. Data were extracted from ChEMBL and filtered to identify the target protocols. A total of 1661 high-quality entries were retained after checking 436 articles. The workflow is freely available, can be updated, and has high reusability. Our study provides a novel approach for data quality analysis and accelerates the development of helpful in silico models for effective drug discovery.

Scientific Contribution: The cost of building highly accurate predictive models can be significantly reduced by automating the collection of reliable measurement data. Our tool reduces the time and effort required for data collection and will enable researchers to focus on constructing high-performance in silico models for other types of analysis. To the best of our knowledge, no such tool is available in the literature.
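
The extraction-and-filtering step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' KNIME workflow: the record field names (`standard_type`, `standard_value`, `assay_description`), the regular expressions, and the toy records below are all assumptions made for illustration, and the values are invented rather than taken from ChEMBL.

```python
import re

# Target cell lines named in the abstract. The matching patterns are an
# assumption; a real curation step would need to handle more spelling variants.
CELL_LINE_PATTERNS = {
    "MDCK": re.compile(r"\bMDCK", re.IGNORECASE),
    "LLC-PK1": re.compile(r"\bLLC[- ]?PK1\b", re.IGNORECASE),
    "RRCK": re.compile(r"\bRRCK\b", re.IGNORECASE),
}


def detect_cell_line(description):
    """Return the first target cell line mentioned in the text, or None."""
    for name, pattern in CELL_LINE_PATTERNS.items():
        if pattern.search(description):
            return name
    return None


def filter_papp_records(records):
    """Keep Papp measurements that have a value and name a target cell line."""
    kept = []
    for rec in records:
        if (rec.get("standard_type") or "").lower() != "papp":
            continue  # not a permeability measurement
        if rec.get("standard_value") is None:
            continue  # no numeric value reported
        cell_line = detect_cell_line(rec.get("assay_description") or "")
        if cell_line is None:
            continue  # measured in a non-target cell line (e.g. Caco-2)
        kept.append({**rec, "cell_line": cell_line})
    return kept


# Toy records with invented values, shaped like hypothetical activity rows.
toy_records = [
    {"standard_type": "Papp", "standard_value": 12.5,
     "assay_description": "Apparent permeability across MDCK-MDR1 cell monolayer"},
    {"standard_type": "Papp", "standard_value": 3.0,
     "assay_description": "Permeability in Caco-2 cells"},
    {"standard_type": "IC50", "standard_value": 100.0,
     "assay_description": "Inhibition measured in LLC-PK1 cells"},
    {"standard_type": "Papp", "standard_value": None,
     "assay_description": "Permeability in LLC-PK1 cells"},
    {"standard_type": "Papp", "standard_value": 8.1,
     "assay_description": "Passive permeability in RRCK cells"},
]

curated = filter_papp_records(toy_records)
for rec in curated:
    print(rec["cell_line"], rec["standard_value"])
```

Keyword matching of this kind only narrows the candidate set. The abstract reports that 1661 entries were retained after checking 436 articles, which suggests that protocol details still need to be verified against the source publications.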

Updated: 2024-03-14