Transferability of Machine Learning Models for Geogenic Contaminated Groundwaters, Environmental Science & Technology


Environmental Science & Technology (IF 11.4), Pub Date: 2024-05-08, DOI: 10.1021/acs.est.4c01327
Hailong Cao¹, Xianjun Xie²,³, Ziyi Xiao²,³, Wenjing Liu⁴


Machine learning models show promise in identifying geogenic contaminated groundwaters. Modeling in regions with no or limited samples is challenging due to the need for large training sets. One potential solution is transferring existing models to such regions. This study explores the transferability of high fluoride groundwater models between basins in the Shanxi Rift System, considering six factors, including modeling methods, predictor types, data size, sample/predictor ratio (SPR), predictor range, and data informing. Results show that transferability is achieved only when model predictors are based on hydrochemical parameters rather than surface parameters. Data informing, i.e., adding samples from challenging regions to the training set, further enhances the transferability. Stepwise regression shows that hydrochemical predictors and data informing significantly improve transferability, while data size, SPR, and predictor range have no significant effects. Additionally, despite their stronger nonlinear capabilities, random forests and artificial neural networks do not necessarily surpass logistic regression in transferability. Lastly, we utilize the t-SNE algorithm to generate low-dimensional representations of data from different basins and compare these representations to elucidate the critical role of predictor types in transferability.
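As a rough illustration of the transfer and "data informing" setup described in the abstract, the sketch below trains a logistic regression on one synthetic "source" basin and scores it on a shifted "target" basin, once directly and once after adding a few target samples to the training set. Everything in it is an assumption made for illustration: the two predictors, the synthetic labeling rule, the helper names (`make_basin`, `fit_logistic`, `accuracy`), and the 30-sample informing set. It is not the authors' code or data, and whatever it prints says nothing about the study's results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_basin(n, shift, rng):
    """Hypothetical basin: two standardized predictors and a 0/1 label.

    The labeling rule is the same in every basin; only the predictor
    distribution moves with `shift` (an assumed form of basin difference).
    """
    rows = []
    for _ in range(n):
        x1 = rng.gauss(shift, 1.0)
        x2 = rng.gauss(-shift, 1.0)
        p_high = sigmoid(1.5 * x1 - 1.0 * x2)
        label = 1 if rng.random() < p_high else 0
        rows.append(((x1, x2), label))
    return rows


def fit_logistic(rows, epochs=200, lr=0.1):
    """Plain batch gradient descent on the logistic loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def accuracy(model, rows):
    """Fraction of rows classified correctly at a 0.5 threshold."""
    w, b = model
    hits = 0
    for x, y in rows:
        predicted = sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5
        hits += int(predicted == (y == 1))
    return hits / len(rows)


rng = random.Random(0)
source = make_basin(300, 0.0, rng)   # basin with plenty of samples
target = make_basin(300, 1.0, rng)   # basin treated as data-poor

# Hold out most target samples for testing; a few "inform" the training set.
target_informing, target_test = target[:30], target[30:]

direct = fit_logistic(source)                       # transfer as-is
informed = fit_logistic(source + target_informing)  # data informing

acc_direct = accuracy(direct, target_test)
acc_informed = accuracy(informed, target_test)
print(f"direct transfer accuracy: {acc_direct:.3f}")
print(f"informed transfer accuracy: {acc_informed:.3f}")
```

Scoring both models on the same held-out target samples keeps the comparison fair: the informing samples are never reused for evaluation. Swapping `make_basin` for real per-basin tables, or replacing the classifier with a random forest or neural network, would follow the same train-on-source, test-on-target pattern.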


Updated: 2024-05-08
