Unlocking maintenance insights in industrial text through semantic search, Computers in Industry


Unlocking maintenance insights in industrial text through semantic search
Computers in Industry (IF 10.0), Pub Date: 2024-03-21, DOI: 10.1016/j.compind.2024.104083
Syed Meesam Raza Naqvi, Mohammad Ghufran, Christophe Varnier, Jean-Marc Nicod, Kamran Javed, Noureddine Zerhouni

Maintenance records in Computerized Maintenance Management Systems (CMMS) contain valuable human knowledge about maintenance activities. These records consist primarily of noisy, unstructured text written by maintenance experts. The technical nature of the text, combined with a concise writing style and frequent use of abbreviations, makes it difficult to process with classical Natural Language Processing (NLP) pipelines. Because of these complexities, the text must be normalized before it is fed to classical machine learning models. Developing such custom normalization pipelines requires manual labor and domain expertise, and it is a time-consuming process that demands constant updates. As a result, this valuable source of information is under-utilized for generating insights that support maintenance decisions. This study proposes a Technical Language Processing (TLP) pipeline for semantic search in industrial text using BERT (Bidirectional Encoder Representations from Transformers), a transformer-based Large Language Model (LLM). The proposed pipeline can automatically process complex unstructured industrial text and does not require custom preprocessing. To adapt the BERT model to the target domain, three unsupervised domain fine-tuning techniques are compared to identify the best strategy for leveraging the tacit knowledge available in industrial text. The proposed approach is validated on two sets of industrial maintenance records, one from the mining domain and one from the aviation domain. Semantic search results are analyzed from both quantitative and qualitative perspectives. The analysis shows that TSDAE (Transformer-based Sequential Denoising Auto-Encoder), a state-of-the-art unsupervised domain fine-tuning technique, can efficiently identify intricate patterns in industrial text regardless of the associated complexities. A BERT model fine-tuned with TSDAE on industrial text achieved a precision of 0.94 and 0.97 for mining excavator and aviation maintenance records, respectively.
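As a rough illustration of the semantic-search step described in the abstract, the following sketch ranks maintenance records by cosine similarity to a query and scores the ranking with a precision measure. It is not the authors' implementation. The three-dimensional vectors are hand-made placeholders standing in for embeddings that, in the described pipeline, would come from a BERT model fine-tuned on the target domain. The record texts, the function names (`cosine`, `semantic_search`, `precision_at_k`), and the precision definition (fraction of the top-k results that are relevant) are all assumptions made for illustration; the abstract does not state how precision was computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, corpus_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar corpus vectors."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the first k ranked ids that are in relevant_ids (assumed metric)."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for i in top if i in relevant_ids) / len(top)


# Hypothetical maintenance records with abbreviations and terse phrasing.
records = [
    "hyd hose leaking replace",
    "replace leaking hydraulic hose",
    "eng oil chg",
    "c/o tyre pos 3",
]

# Placeholder embeddings (NOT model outputs): records 0 and 1 are given
# nearby vectors to mimic a model that maps paraphrases close together.
vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.2, 0.9],
]

# Placeholder embedding for a query such as "hydraulic hose leak".
query_vec = [0.85, 0.15, 0.05]

hits = semantic_search(query_vec, vectors, top_k=2)
ranked = [i for i, _ in hits]
precision = precision_at_k(ranked, relevant_ids={0, 1}, k=2)

for idx, score in hits:
    print(f"{score:.3f}  {records[idx]}")
print("precision@2:", precision)
```

With real embeddings, the same ranking logic applies unchanged; only the source of `vectors` and `query_vec` differs.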


Updated: 2024-03-21
