Optimality of Robust Online Learning

Guo, Zheng-Chu; Christmann, Andreas; Shi, Lei

doi:10.1007/s10208-023-09616-9

Abstract

In this paper, we study an online learning algorithm with a robust loss function \(\mathcal{L}_{\sigma}\) for regression over a reproducing kernel Hilbert space (RKHS). The loss function \(\mathcal{L}_{\sigma}\), which involves a scaling parameter \(\sigma>0\), covers a wide range of commonly used robust losses. The proposed algorithm is thus a robust alternative to online least squares regression for estimating the conditional mean function. For properly chosen \(\sigma\) and step size, we show that the last iterate of this online algorithm can achieve optimal capacity-independent convergence in the mean square distance. Moreover, if additional information on the underlying function space is known, we also establish optimal capacity-dependent rates for strong convergence in the RKHS. To the best of our knowledge, both results are new to the online learning literature.
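
As a rough illustration of the kind of procedure the abstract describes, the sketch below runs online gradient descent in an RKHS with one particular robust loss, the Welsch (correntropy-induced) loss. Everything specific here is an assumption made for illustration and is not taken from the paper: the update form, the Gaussian kernel and its bandwidth, the polynomially decaying step size, and the names `gaussian_kernel`, `welsch_weight`, and `robust_online_regression`.

```python
import math
import random


def gaussian_kernel(x, z, bandwidth=0.5):
    """Gaussian kernel on the real line (an illustrative choice of RKHS)."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))


def welsch_weight(residual, sigma):
    """Weight exp(-r^2 / sigma^2), so that weight * r is the derivative of the
    Welsch loss L_sigma(r) = (sigma^2 / 2) * (1 - exp(-r^2 / sigma^2))."""
    return math.exp(-(residual ** 2) / (sigma ** 2))


def robust_online_regression(samples, sigma, eta0=0.5, theta=0.5):
    """One pass over the data with the assumed update
        f_{t+1} = f_t - eta_t * w(r_t) * r_t * K(x_t, .),
    where r_t = f_t(x_t) - y_t and eta_t = eta0 * t^(-theta).
    Returns the last iterate as a callable."""
    centers, coeffs = [], []

    def predict(x):
        # f(x) is a kernel expansion over the points seen so far.
        return sum(c * gaussian_kernel(z, x) for z, c in zip(centers, coeffs))

    for t, (x, y) in enumerate(samples, start=1):
        residual = predict(x) - y
        step = eta0 * t ** (-theta)
        # Large residuals receive a weight close to zero, which limits
        # the influence of outliers on the update.
        centers.append(x)
        coeffs.append(-step * welsch_weight(residual, sigma) * residual)
    return predict


if __name__ == "__main__":
    rng = random.Random(0)
    data = []
    for _ in range(500):
        x = rng.uniform(-1.0, 1.0)
        noise = rng.gauss(0.0, 0.1)
        if rng.random() < 0.05:
            noise += 5.0  # occasional gross outlier
        data.append((x, math.sin(math.pi * x) + noise))
    f = robust_online_regression(data, sigma=1.0)
    print(f(0.5), math.sin(math.pi * 0.5))
```

In this sketch, letting `sigma` grow drives the weight toward 1, so the update approaches plain online least squares; a small `sigma` suppresses large residuals more aggressively. How \(\sigma\) and the step size should actually be chosen to obtain the optimal rates is the subject of the paper and is not reproduced here.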

Acknowledgements

The authors are grateful to the anonymous referees for their careful reading of this paper and their suggestions. The work of Zheng-Chu Guo is supported by the Zhejiang Provincial Natural Science Foundation of China (Project No. LR20A010001), the National Natural Science Foundation of China (Project Nos. U21A20426 and 12271473), and the Fundamental Research Funds for the Central Universities (Project No. 2021XZZX001). The work of Andreas Christmann is partially supported by the German Science Foundation (DFG) under Grant CH 291/3-1. The work of Lei Shi is supported by the National Natural Science Foundation of China (Project Nos. 12171039 and 12061160462) and the Shanghai Science and Technology Program (Project Nos. 21JC1400600 and 20JC1412700).

Author information

Authors and Affiliations

School of Mathematical Sciences, Zhejiang University, Hangzhou, 310058, People’s Republic of China
Zheng-Chu Guo
Department of Mathematics, University of Bayreuth, 95447, Bayreuth, Germany
Andreas Christmann
School of Mathematical Sciences and Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai, 200433, People’s Republic of China
Lei Shi

Corresponding author

Correspondence to Lei Shi.

Additional information

Communicated by Thomas Strohmer.

About this article

Cite this article

Guo, ZC., Christmann, A. & Shi, L. Optimality of Robust Online Learning. Found Comput Math (2023). https://doi.org/10.1007/s10208-023-09616-9

Received: 30 December 2021
Revised: 20 January 2023
Accepted: 13 April 2023
Published: 26 July 2023
DOI: https://doi.org/10.1007/s10208-023-09616-9
