Layout-aware Single-image Document Flattening, ACM Transactions on Graphics



Layout-aware Single-image Document Flattening
ACM Transactions on Graphics (IF 6.2), Pub Date: 2023-11-02, DOI: 10.1145/3627818
Pu Li, Weize Quan, Jianwei Guo, Dong-Ming Yan


Single image rectification of document deformation is a challenging task. Although some recent deep learning-based methods have attempted to solve this problem, they cannot achieve satisfactory results when dealing with document images with complex deformations. In this article, we propose a new efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outline shapes, making unwarping local layout primitives essentially homogeneous with unwarping the entire document. The former task is clearly more straightforward to solve than the latter due to the more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model working in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then a new regression module is applied to predict the global and local UV maps. Finally, we design an effective merging algorithm to correct the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework achieves favorable performance against state-of-the-art methods. In addition, the current publicly available document flattening datasets have limited 3D paper shapes without layout annotation and also lack a general geometric correction metric. Therefore, we build a new large-scale synthetic dataset by utilizing a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten.
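The last two stages of the pipeline described in the abstract (predicting coordinate maps, then correcting the global prediction with local ones) can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the predicted map is used as a backward sampling map, where each output pixel stores the normalized (x, y) position to read from in the warped photo; the paper's own UV convention may differ. The function names `unwarp_with_backward_map` and `merge_maps` are hypothetical, and the weighted blend stands in for the paper's merging algorithm, which the abstract does not specify. The blend also assumes the local map has already been expressed in the same coordinate frame as the global one.

```python
import numpy as np


def unwarp_with_backward_map(image, bmap):
    """Flatten `image` by bilinear sampling at the positions stored in `bmap`.

    image: (H, W, C) float array, the warped document photo.
    bmap:  (h, w, 2) float array; bmap[i, j] = (x, y) in [0, 1] is the
           normalized source position for output pixel (i, j).
           This convention is an assumption of the sketch.
    """
    H, W = image.shape[:2]
    # Convert normalized coordinates to pixel coordinates.
    x = np.clip(bmap[..., 0], 0.0, 1.0) * (W - 1)
    y = np.clip(bmap[..., 1], 0.0, 1.0) * (H - 1)
    # Integer corners of the cell that contains each sample point.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    # Fractional offsets, broadcast over the channel axis.
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def merge_maps(global_map, local_map, weight):
    """Blend a local map into the global map (illustrative stand-in only).

    weight: (h, w) array in [0, 1]; 1 inside a layout primitive where the
            local prediction is trusted, 0 elsewhere, optionally feathered.
    """
    w = weight[..., None]
    return (1.0 - w) * global_map + w * local_map


def identity_map(h, w):
    """Backward map that leaves an (h, w) image unchanged."""
    xs, ys = np.meshgrid(np.linspace(0.0, 1.0, w), np.linspace(0.0, 1.0, h))
    return np.stack([xs, ys], axis=-1)
```

With an identity map the output matches the input, which is a quick sanity check before plugging in predicted maps. Feathering `weight` near primitive boundaries is one simple way to avoid visible seams between the local and global predictions.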


Updated: 2023-11-02
