
Sharp Bounds on the Approximation Rates, Metric Entropy, and n-Widths of Shallow Neural Networks

Foundations of Computational Mathematics

Abstract

In this article, we study approximation properties of the variation spaces corresponding to shallow neural networks with a variety of activation functions. We introduce two main tools for estimating the metric entropy, approximation rates, and n-widths of these spaces. First, we introduce the notion of a smoothly parameterized dictionary and give upper bounds on the nonlinear approximation rates, metric entropy, and n-widths of its absolute convex hull. These upper bounds depend upon the order of smoothness of the parameterization. We apply this result to dictionaries of ridge functions corresponding to shallow neural networks, and the resulting bounds improve upon existing results in many cases. Next, we provide a method for lower bounding the metric entropy and n-widths of variation spaces which contain certain classes of ridge functions. This result gives sharp lower bounds on the \(L^2\)-approximation rates, metric entropy, and n-widths of variation spaces corresponding to neural networks with a range of important activation functions, including ReLU\(^k\) activation functions and sigmoidal activation functions with bounded variation.
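As context for the abstract, the following is a minimal sketch of the standard definitions behind these statements, in notation of our choosing that may differ from the body of the paper. For the ReLU\(^k\) activation \(\sigma_k(t)=\max(0,t)^k\), the dictionary of ridge functions on a bounded domain \(\Omega\subset\mathbb{R}^d\) is

\[\mathbb{P}^d_k=\bigl\{\sigma_k(\omega\cdot x+b):\ \omega\in S^{d-1},\ b\in[c_1,c_2]\bigr\},\]

and the corresponding variation space consists of the functions with finite variation norm

\[\|f\|_{\mathcal{K}_1(\mathbb{D})}=\inf\bigl\{c>0:\ f\in c\,\overline{\mathrm{conv}}\,(\mathbb{D}\cup-\mathbb{D})\bigr\},\qquad \mathbb{D}=\mathbb{P}^d_k,\]

where the closure of the convex hull is taken in \(L^2(\Omega)\). The quantities estimated in the paper are the rates of nonlinear approximation by \(n\)-term linear combinations of dictionary elements, the metric entropy numbers

\[\epsilon_n(A)=\inf\bigl\{\epsilon>0:\ A\ \text{can be covered by}\ 2^n\ \text{balls of radius}\ \epsilon\bigr\},\]

and the Kolmogorov \(n\)-widths

\[d_n(A)_X=\inf_{\dim(Y_n)=n}\ \sup_{f\in A}\ \inf_{g\in Y_n}\|f-g\|_X,\]

all applied to the unit ball \(A\) of the variation norm.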



Acknowledgements

We would like to thank Professors Russel Caflisch, Ronald DeVore, Weinan E, Albert Cohen, Stephan Wojtowytsch, Jason Klusowski, and Lei Wu for helpful discussions. This work was supported by the Verne M. Willaman Chair Fund at the Pennsylvania State University and the National Science Foundation (Grant Nos. DMS-1819157 and DMS-2111387).

Author information

Corresponding author

Correspondence to Jonathan W. Siegel.

Additional information

Communicated by Albert Cohen.



About this article


Cite this article

Siegel, J.W., Xu, J. Sharp Bounds on the Approximation Rates, Metric Entropy, and n-Widths of Shallow Neural Networks. Found Comput Math 24, 481–537 (2024). https://doi.org/10.1007/s10208-022-09595-3
