Abstract
Rectifying document deformation from a single image is a challenging task. Although recent deep learning-based methods have attempted to solve this problem, they do not achieve satisfactory results on document images with complex deformations. In this article, we propose a new, efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outlines, so unwarping a local layout primitive is essentially the same problem as unwarping the entire document. The local task is also easier to solve than the global one, because a single primitive has more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model that works in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then, a new regression module predicts the global and local UV maps. Finally, we design an effective merging algorithm that corrects the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework performs favorably against state-of-the-art methods. In addition, the currently available public document flattening datasets cover a limited range of 3D paper shapes and provide no layout annotation, and there is no general geometric correction metric. Therefore, we build a new large-scale synthetic dataset using a fully automatic rendering method that generates deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten.
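To make the last two ideas concrete, the following is a minimal illustrative sketch of how local UV predictions might refine a global UV map, and how paired UV maps could yield a geometric error. It is not the merging algorithm or the metric of the article, neither of which is specified in the abstract. The function names `merge_uv` and `mean_uv_error`, the per-region rectangles `boxes`, the blend factor `weight`, and the convention that every UV map stores flat-page coordinates in [0, 1] are all assumptions made for illustration.

```python
import numpy as np


def merge_uv(global_uv, local_uvs, masks, boxes, weight=1.0):
    """Blend per-region local UV maps into a global UV map (hypothetical scheme).

    global_uv : (H, W, 2) array of flat-page coordinates in [0, 1].
    local_uvs : list of (H, W, 2) arrays, each expressed in its own
                region's [0, 1]^2 frame (assumed convention).
    masks     : list of (H, W) boolean arrays marking each region's pixels.
    boxes     : list of (u0, v0, u1, v1) rectangles giving each region's
                position in flat-page coordinates (assumed to be known).
    weight    : blend factor; 1.0 replaces the global values inside a mask.
    """
    merged = global_uv.astype(np.float64).copy()
    for local, mask, (u0, v0, u1, v1) in zip(local_uvs, masks, boxes):
        origin = np.array([u0, v0])
        size = np.array([u1 - u0, v1 - v0])
        # Map the region's local frame into the flat-page frame.
        in_page = origin + local[mask] * size
        merged[mask] = (1.0 - weight) * merged[mask] + weight * in_page
    return merged


def mean_uv_error(pred_uv, gt_uv, mask=None):
    """Mean Euclidean distance between two UV maps, optionally inside a mask."""
    err = np.linalg.norm(pred_uv - gt_uv, axis=-1)
    return float(err[mask].mean() if mask is not None else err.mean())


if __name__ == "__main__":
    # Toy 4x4 example: one region covering the top-left 2x2 pixels.
    global_uv = np.zeros((4, 4, 2))
    mask = np.zeros((4, 4), dtype=bool)
    mask[:2, :2] = True
    local_uv = np.full((4, 4, 2), 0.5)
    merged = merge_uv(global_uv, [local_uv], [mask], [(0.2, 0.2, 0.6, 0.6)])
    print(merged[0, 0], mean_uv_error(merged, global_uv, mask))
```

With `weight=1.0` the local prediction simply overwrites the global one inside each region. A real merging step would likely also need to keep the map continuous across region boundaries, which this sketch does not attempt.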
Index Terms
- Layout-aware Single-image Document Flattening