Image segmentation using Vision Transformer for tunnel defect assessment
Computer-Aided Civil and Infrastructure Engineering (IF 9.6), Pub Date: 2024-02-24, DOI: 10.1111/mice.13181
Shaojie Qin, Taiyue Qi, Tang Deng, Xiaodong Huang

Existing tunnel detection methods include crack and water-leakage segmentation networks. However, when an automated detection algorithm cannot handle every defect case, manual inspection is still required to eliminate potential risks. Existing intelligent detection methods lack a universal approach that can accurately segment all types of defects, particularly when multiple defects are superimposed. To address this issue, a defect segmentation model based on the Vision Transformer (ViT) is proposed, whose structure is completely different from that of a convolutional neural network. The model introduces an adapter and a decoding head that improve the training of the transformer encoder, allowing it to be fitted to small-scale datasets. In post-processing, a method is proposed to quantify the threat level of the defects, with the aim of outputting qualitative results that simulate human observation. The model showed impressive results on a real-world dataset containing 11,781 defect images collected from a real subway tunnel. The visualization results show that the method is effective and applies uniform criteria to single, multiple, and comprehensive defects. Moreover, the tests show that the proposed model has a significant advantage in the case of multiple-defect superposition, achieving 93.77%, 88.36%, and 92.93% for mean accuracy (Acc), mean intersection over union (IoU), and mean F1-score, respectively. With similar training parameters, the Acc of the proposed method is improved by more than 10% over the DeepLabv3+, Mask R-CNN, and UPerNet-R50 models and by more than 5% over the Swin Transformer and ViT-Adapter. This study implements a general method that can process all defect cases and output threat evaluation results, thereby making tunnel detection more intelligent.
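
For readers unfamiliar with the reported metrics, the sketch below illustrates how mean accuracy, mean IoU, and mean F1-score are conventionally computed per class from a confusion matrix in semantic segmentation. This is a minimal illustrative Python example, not the authors' evaluation code; the number of classes and the label maps are placeholder assumptions.

# Illustrative sketch of the per-class segmentation metrics reported in the
# abstract (mean Acc, mean IoU, mean F1). NOT the authors' code; class count
# and example label maps are placeholders.
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Accumulate a (num_classes x num_classes) confusion matrix from label maps."""
    mask = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(conf):
    """Per-class accuracy (recall), IoU, and F1, averaged over classes."""
    tp = np.diag(conf).astype(float)
    fn = conf.sum(axis=1) - tp          # ground-truth pixels missed for each class
    fp = conf.sum(axis=0) - tp          # pixels wrongly assigned to each class
    eps = 1e-12
    acc = tp / (tp + fn + eps)                  # per-class accuracy (recall)
    iou = tp / (tp + fp + fn + eps)             # per-class intersection over union
    precision = tp / (tp + fp + eps)
    f1 = 2 * precision * acc / (precision + acc + eps)
    return acc.mean(), iou.mean(), f1.mean()

# Toy usage with random label maps (e.g., background / crack / leakage / other defect).
num_classes = 4
rng = np.random.default_rng(0)
gt = rng.integers(0, num_classes, size=(256, 256))
pred = rng.integers(0, num_classes, size=(256, 256))
mean_acc, miou, mf1 = segmentation_metrics(confusion_matrix(pred, gt, num_classes))
print(f"mean Acc={mean_acc:.4f}, mIoU={miou:.4f}, mean F1={mf1:.4f}")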
