
Sharp Bounds on the Approximation Rates, Metric Entropy, and n-Widths of Shallow Neural Networks

Foundations of Computational Mathematics

Abstract

In this article, we study approximation properties of the variation spaces corresponding to shallow neural networks with a variety of activation functions. We introduce two main tools for estimating the metric entropy, approximation rates, and n-widths of these spaces. First, we introduce the notion of a smoothly parameterized dictionary and give upper bounds on the nonlinear approximation rates, metric entropy, and n-widths of its absolute convex hull. These upper bounds depend upon the order of smoothness of the parameterization. We apply this result to dictionaries of ridge functions corresponding to shallow neural networks, and the resulting bounds improve upon existing results in many cases. Next, we provide a method for lower bounding the metric entropy and n-widths of variation spaces which contain certain classes of ridge functions. This result gives sharp lower bounds on the \(L^2\)-approximation rates, metric entropy, and n-widths of variation spaces corresponding to neural networks with a range of important activation functions, including ReLU\(^k\) activation functions and sigmoidal activation functions with bounded variation.
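As context for the abstract, the following is a minimal sketch of the standard definitions behind these statements, in notation of our choosing that may differ from the body of the paper. For the ReLU\(^k\) activation \(\sigma_k(t)=\max(0,t)^k\), the dictionary of ridge functions on a bounded domain \(\Omega\subset\mathbb{R}^d\) is

\[\mathbb{P}^d_k=\bigl\{\sigma_k(\omega\cdot x+b):\ \omega\in S^{d-1},\ b\in[c_1,c_2]\bigr\},\]

and the corresponding variation space consists of the functions with finite variation norm

\[\|f\|_{\mathcal{K}_1(\mathbb{D})}=\inf\bigl\{c>0:\ f\in c\,\overline{\mathrm{conv}}\,(\mathbb{D}\cup-\mathbb{D})\bigr\},\qquad \mathbb{D}=\mathbb{P}^d_k,\]

where the closure of the convex hull is taken in \(L^2(\Omega)\). The quantities estimated in the paper are the rates of nonlinear approximation by \(n\)-term linear combinations of dictionary elements, the metric entropy numbers

\[\epsilon_n(A)=\inf\bigl\{\epsilon>0:\ A\ \text{can be covered by}\ 2^n\ \text{balls of radius}\ \epsilon\bigr\},\]

and the Kolmogorov \(n\)-widths

\[d_n(A)_X=\inf_{\dim(Y_n)=n}\ \sup_{f\in A}\ \inf_{g\in Y_n}\|f-g\|_X,\]

all applied to the unit ball \(A\) of the variation norm.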



Acknowledgements

We would like to thank Professors Russel Caflisch, Ronald DeVore, Weinan E, Albert Cohen, Stephan Wojtowytsch, Jason Klusowski, and Lei Wu for helpful discussions. This work was supported by the Verne M. Willaman Chair Fund at the Pennsylvania State University and the National Science Foundation (Grant Nos. DMS-1819157 and DMS-2111387).

Author information

Corresponding author

Correspondence to Jonathan W. Siegel.

Additional information

Communicated by Albert Cohen.



About this article


Cite this article

Siegel, J.W., Xu, J. Sharp Bounds on the Approximation Rates, Metric Entropy, and n-Widths of Shallow Neural Networks. Found Comput Math 24, 481–537 (2024). https://doi.org/10.1007/s10208-022-09595-3
