
Layout-aware Single-image Document Flattening

Authors:
Pu Li (ORCID 0009-0007-1060-689X)
Weize Quan (ORCID 0000-0003-0892-581X)
Jianwei Guo (ORCID 0000-0002-3376-1725)
Dong-Ming Yan (ORCID 0000-0003-2209-2404)

Affiliation (all authors): MAIS, Institute of Automation, CAS and School of Artificial Intelligence, UCAS, China

ACM Transactions on Graphics, Volume 43, Issue 1, Article No. 9, pp 1–17. https://doi.org/10.1145/3627818

Published: 02 November 2023


Abstract

Single-image rectification of document deformation is a challenging task. Although some recent deep learning-based methods have attempted to solve this problem, they cannot achieve satisfactory results on document images with complex deformations. In this article, we propose a new efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outlines, so unwarping a local layout primitive is essentially the same problem as unwarping the entire document. The former task is clearly more straightforward than the latter because a local primitive has more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model that works in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then a new regression module predicts the global and local UV maps. Finally, we design an effective merging algorithm to correct the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework compares favorably with state-of-the-art methods. In addition, the current publicly available document flattening datasets offer limited 3D paper shapes, carry no layout annotation, and lack a general geometric correction metric. Therefore, we build a new large-scale synthetic dataset by using a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten.
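To illustrate how a predicted coordinate map can drive flattening, the following is a minimal sketch of the resampling step only. It is not the authors' implementation. It assumes a backward map: for every pixel of the flattened output, a normalized (u, v) pair in [0, 1] that points into the warped input. The abstract does not state which map convention the method uses, and the names `bilinear_sample` and `unwarp` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]                       # grayscale image, row-major
BackwardMap = List[List[Tuple[float, float]]]   # per output pixel: (u, v) in [0, 1]


def bilinear_sample(image: Image, x: float, y: float) -> float:
    """Sample `image` at continuous pixel coordinates (x, y) with bilinear weights."""
    h, w = len(image), len(image[0])
    # Clamp so that lookups slightly outside the image stay valid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def unwarp(image: Image, backward_map: BackwardMap) -> Image:
    """Build the flattened image by looking up each output pixel in the warped input.

    Assumption: backward_map[i][j] = (u, v) with u along the width and v along
    the height of the warped input, both normalized to [0, 1].
    """
    h, w = len(image), len(image[0])
    return [
        [bilinear_sample(image, u * (w - 1), v * (h - 1)) for (u, v) in row]
        for row in backward_map
    ]


if __name__ == "__main__":
    warped = [[0.0, 1.0], [2.0, 3.0]]
    identity = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
    print(unwarp(warped, identity))
```

With an identity map the output equals the input, which is a convenient sanity check before plugging in a learned map. Under the same assumption, a local map for one layout primitive could be applied the same way to the crop of that primitive.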

Supplemental Material

Available for Download: 3627818-supp.pdf (PDF, 44.4 KB), supplementary material

Index Terms

Computing methodologies → Computer graphics → Image manipulation

Published in
ACM Transactions on Graphics Volume 43, Issue 1
February 2024
211 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3613512
Editor: Carol O'Sullivan, Trinity College Dublin, Ireland
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 November 2023
- Online AM: 13 October 2023
- Accepted: 25 September 2023
- Revised: 10 July 2023
- Received: 8 September 2022

Author Tags
Document image rectification
document layout analysis
deep neural networks
geometric models
Qualifiers
- research-article