Abstract
Supervised homography estimation methods are limited by the lack of adequate labeled training data. To address this issue, we propose DMHomo, a diffusion model-based framework for supervised homography learning. The framework generates image pairs with accurate labels, realistic image content, and realistic interval motion, so that they qualify as adequate training pairs. We use unlabeled image pairs with pseudo labels, such as homographies and dominant plane masks computed by existing methods, to train a diffusion model that generates a supervised training dataset. To further enhance performance, we introduce a new probabilistic mask loss, which identifies outlier regions through supervised training, and an iterative mechanism that optimizes the generative and homography models successively. Our experimental results demonstrate that DMHomo effectively overcomes the scarcity of qualified datasets in supervised homography learning and improves generalization to real-world scenes. The code and dataset are available on GitHub (https://github.com/lhaippp/DMHomo).
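To make the notion of a homography label concrete, the following minimal sketch shows how a 3x3 homography maps pixels and how it can be expressed as the displacement of the four image corners. The four-corner form is a common convention in deep homography estimation and is an assumption here, since the abstract does not state which parameterization DMHomo uses. The function names and the example matrix are hypothetical and are not taken from the DMHomo code.

```python
def apply_homography(H, point):
    """Map one (x, y) pixel through a 3x3 homography H given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by w to convert from homogeneous back to pixel coordinates.
    return (u / w, v / w)


def corner_offsets(H, width, height):
    """Express H as the displacement of the four image corners (8 values)."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    offsets = []
    for x, y in corners:
        u, v = apply_homography(H, (x, y))
        offsets.append((u - x, v - y))
    return offsets


# Hypothetical pseudo label: a pure translation by (5, -3) pixels.
H_label = [
    [1.0, 0.0, 5.0],
    [0.0, 1.0, -3.0],
    [0.0, 0.0, 1.0],
]
offsets = corner_offsets(H_label, width=320, height=240)
```

For a pure translation every corner moves by the same amount, so all four offsets are equal; a general homography would move each corner differently.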