Abstract

We study the computational cost of recovering a unit-norm sparse principal component \(x \in \mathbb {R}^n\) planted in a random matrix, in either the Wigner or Wishart spiked model (observing either \(W + \lambda xx^\top \) with W drawn from the Gaussian orthogonal ensemble, or N independent samples from \(\mathcal {N}(0, I_n + \beta xx^\top )\), respectively). Prior work has shown that when the signal-to-noise ratio (\(\lambda \) or \(\beta \sqrt{N/n}\), respectively) is a small constant and the fraction of nonzero entries in the planted vector is \(\Vert x\Vert _0 / n = \rho \), it is possible to recover x in polynomial time if \(\rho \lesssim 1/\sqrt{n}\). While it is possible to recover x in exponential time under the weaker condition \(\rho \ll 1\), it is believed that polynomial-time recovery is impossible unless \(\rho \lesssim 1/\sqrt{n}\). We investigate the precise amount of time required for recovery in the “possible but hard” regime \(1/\sqrt{n} \ll \rho \ll 1\) by exploring the power of subexponential-time algorithms, i.e., algorithms running in time \(\exp (n^\delta )\) for some constant \(\delta \in (0,1)\). For any \(1/\sqrt{n} \ll \rho \ll 1\), we give a recovery algorithm with runtime roughly \(\exp (\rho ^2 n)\), demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the \(\exp (\rho n)\)-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.

Notes

  1. We will also consider stronger notions of recovery: strong recovery is \(\langle {{\hat{x}}}, x \rangle ^2 \rightarrow 1\) as \(n \rightarrow \infty \) and exact recovery is \({{\hat{x}}} = x\) with probability \(1-o(1)\).

  2. We analyze our algorithms for a more general set of assumptions on x; see Definition 2.1.

  3. We use \(A \lesssim B\) to denote \(A \le CB\) for some constant C, and use \(A \ll B\) to denote \(A \le B/\textrm{polylog}(n)\).

References

  1. E. Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.

  2. A. A. Amini and M. J. Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. In International Symposium on Information Theory, pages 2454–2458. IEEE, 2008.

  3. J. Baik, G. Ben Arous, and S. Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005.

  4. J. Baik and J. W. Silverstein. Eigenvalues of large sample covariance matrices of spiked population models. Journal of multivariate analysis, 97(6):1382–1408, 2006.

  5. A. S. Bandeira, D. Kunisky, and A. S. Wein. Computational hardness of certifying bounds on constrained PCA problems. In 11th Innovations in Theoretical Computer Science Conference (ITCS), volume 151, page 78. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.

  6. J. Banks, C. Moore, J. Neeman, and P. Netrapalli. Information-theoretic thresholds for community detection in sparse networks. In Conference on Learning Theory, pages 383–416, 2016.

  7. J. Banks, C. Moore, R. Vershynin, N. Verzelen, and J. Xu. Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization. IEEE Transactions on Information Theory, 64(7):4872–4894, 2018.

  8. B. Barak, S. Hopkins, J. Kelner, P. K. Kothari, A. Moitra, and A. Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.

  9. F. Benaych-Georges and R. R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.

  10. Q. Berthet and P. Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on learning theory, pages 1046–1066. PMLR, 2013.

  11. Q. Berthet and P. Rigollet. Optimal detection of sparse principal components in high dimension. The Annals of Statistics, 41(4):1780–1815, 2013.

  12. V. Bhattiprolu, V. Guruswami, and E. Lee. Sum-of-squares certificates for maxima of random tensors on the sphere. arXiv:1605.00903, 2016.

  13. V. V. Bhattiprolu, M. Ghosh, V. Guruswami, E. Lee, and M. Tulsiani. Multiplicative approximations for polynomial optimization over the unit sphere. In Electronic Colloquium on Computational Complexity (ECCC), volume 23, page 1, 2016.

  14. M. Brennan and G. Bresler. Optimal average-case reductions to sparse PCA: From weak assumptions to strong hardness. In Conference on Learning Theory, pages 469–470. PMLR, 2019.

  15. M. Brennan and G. Bresler. Reducibility and statistical-computational gaps from secret leakage. In Conference on Learning Theory, pages 648–847. PMLR, 2020.

  16. M. Brennan, G. Bresler, and W. Huleihel. Reducibility and computational lower bounds for problems with planted sparse structure. In Conference On Learning Theory, pages 48–166. PMLR, 2018.

  17. G. Bresler, S. M. Park, and M. Persu. Sparse PCA from sparse linear regression. In Advances in Neural Information Processing Systems, pages 10942–10952, 2018.

  18. T. T. Cai, Z. Ma, and Y. Wu. Sparse PCA: Optimal rates and adaptive estimation. The Annals of Statistics, 41(6):3074–3110, 2013.

  19. M. Capitaine, C. Donati-Martin, and D. Féral. The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. The Annals of Probability, 37(1):1–47, 2009.

  20. A. d’Aspremont, L. E. Ghaoui, M. I. Jordan, and G. R. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. In Advances in neural information processing systems, pages 41–48, 2005.

  21. A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.

  22. A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Inference and phase transitions in the detection of modules in sparse networks. Physical Review Letters, 107(6):065701, 2011.

  23. Y. Deshpande, E. Abbe, and A. Montanari. Asymptotic mutual information for the binary stochastic block model. In 2016 IEEE International Symposium on Information Theory (ISIT), pages 185–189. IEEE, 2016.

  24. Y. Deshpande and A. Montanari. Information-theoretically optimal sparse PCA. In 2014 IEEE International Symposium on Information Theory, pages 2197–2201. IEEE, 2014.

  25. Y. Deshpande and A. Montanari. Sparse PCA via covariance thresholding. In Advances in Neural Information Processing Systems, pages 334–342, 2014.

  26. Y. Deshpande and A. Montanari. Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems. In Conference on Learning Theory, pages 523–562, 2015.

  27. M. Dia, N. Macris, F. Krzakala, T. Lesieur, and L. Zdeborová. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In Advances in Neural Information Processing Systems, pages 424–432, 2016.

  28. A. d’Aspremont, F. Bach, and L. E. Ghaoui. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9(Jul):1269–1294, 2008.

  29. A. El Alaoui and F. Krzakala. Estimation in the spiked Wigner model: A short proof of the replica formula. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 1874–1878. IEEE, 2018.

  30. A. El Alaoui, F. Krzakala, and M. Jordan. Fundamental limits of detection in the spiked Wigner model. The Annals of Statistics, 48(2):863–885, 2020.

  31. D. Féral and S. Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics, 272(1):185–228, 2007.

  32. G. Holtzman, A. Soffer, and D. Vilenchik. A greedy anytime algorithm for sparse PCA. In Conference on Learning Theory, pages 1939–1956. PMLR, 2020.

  33. S. Hopkins. Statistical Inference and the Sum of Squares Method. PhD thesis, Cornell University, 2018.

  34. S. B. Hopkins, P. K. Kothari, A. Potechin, P. Raghavendra, T. Schramm, and D. Steurer. The power of sum-of-squares for detecting hidden structures. In 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 720–731. IEEE, 2017.

  35. S. B. Hopkins, J. Shi, and D. Steurer. Tensor principal component analysis via sum-of-square proofs. In Conference on Learning Theory, pages 956–1006, 2015.

  36. S. B. Hopkins and D. Steurer. Efficient bayesian estimation from few samples: community detection and related problems. In 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 379–390. IEEE, 2017.

  37. A. Javanmard, A. Montanari, and F. Ricci-Tersenghi. Phase transitions in semidefinite relaxations. Proceedings of the National Academy of Sciences, 113(16):E2218–E2223, 2016.

  38. M. Jerrum. Large cliques elude the Metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.

  39. I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. The Annals of statistics, 29(2):295–327, 2001.

  40. I. M. Johnstone and A. Y. Lu. Sparse principal components analysis. Unpublished manuscript, 2004.

  41. I. M. Johnstone and A. Y. Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486):682–693, 2009.

  42. A. Knowles and J. Yin. The isotropic semicircle law and deformation of Wigner matrices. Communications on Pure and Applied Mathematics, 66(11):1663–1749, 2013.

  43. P. Koiran and A. Zouzias. Hidden cliques and the certification of the restricted isometry property. IEEE transactions on information theory, 60(8):4999–5006, 2014.

  44. R. Krauthgamer, B. Nadler, and D. Vilenchik. Do semidefinite relaxations solve sparse PCA up to the information limit? The Annals of Statistics, 43(3):1300–1322, 2015.

  45. F. Krzakala, J. Xu, and L. Zdeborová. Mutual information in rank-one matrix estimation. In 2016 IEEE Information Theory Workshop (ITW), pages 71–75. IEEE, 2016.

  46. D. Kunisky, A. S. Wein, and A. S. Bandeira. Notes on computational hardness of hypothesis testing: Predictions using the low-degree likelihood ratio. arXiv:1907.11636, 2019.

  47. B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302–1338, 2000.

  48. M. Lelarge and L. Miolane. Fundamental limits of symmetric low-rank matrix estimation. Probability Theory and Related Fields, 173(3-4):859–929, 2019.

  49. T. Lesieur, F. Krzakala, and L. Zdeborová. MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 680–687. IEEE, 2015.

  50. T. Lesieur, F. Krzakala, and L. Zdeborová. Phase transitions in sparse PCA. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 1635–1639. IEEE, 2015.

  51. T. Ma and A. Wigderson. Sum-of-squares lower bounds for sparse PCA. In Advances in Neural Information Processing Systems, pages 1612–1620, 2015.

  52. F. McSherry. Spectral partitioning of random graphs. In Proceedings 2001 IEEE International Conference on Cluster Computing, pages 529–537. IEEE, 2001.

  53. R. Meka, A. Potechin, and A. Wigderson. Sum-of-squares lower bounds for planted clique. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 87–96. ACM, 2015.

  54. L. Miolane. Fundamental limits of low-rank matrix estimation: the non-symmetric case. arXiv:1702.00473, 2017.

  55. L. Miolane. Phase transitions in spiked matrix estimation: information-theoretic analysis. arXiv:1806.04343, 2018.

  56. B. Moghaddam, Y. Weiss, and S. Avidan. Spectral bounds for sparse PCA: Exact and greedy algorithms. In Advances in neural information processing systems, pages 915–922, 2006.

  57. A. Montanari, D. Reichman, and O. Zeitouni. On the limitation of spectral methods: From the Gaussian hidden clique problem to rank-one perturbations of Gaussian tensors. In Advances in Neural Information Processing Systems, pages 217–225, 2015.

  58. C. Moore. The computer science and physics of community detection: Landscapes, phase transitions, and hardness. arXiv:1702.00467, 2017.

  59. B. Nadler. Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, 36(6):2791–2817, 2008.

  60. A. Onatski, M. J. Moreira, and M. Hallin. Asymptotic power of sphericity tests for high-dimensional data. The Annals of Statistics, 41(3):1204–1231, 2013.

  61. D. Paul. Asymptotics of the leading sample eigenvalues for a spiked covariance model. Preprint, 2004.

  62. D. Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, pages 1617–1642, 2007.

  63. D. Paul and I. M. Johnstone. Augmented sparse principal component analysis for high dimensional data. arXiv:1202.1242, 2012.

  64. S. Péché. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields, 134(1):127–173, 2006.

  65. A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra. Optimality and sub-optimality of PCA for spiked random matrices and synchronization. arXiv:1609.05573, 2016.

  66. A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra. Message-passing algorithms for synchronization problems over compact groups. Communications on Pure and Applied Mathematics, 71(11):2275–2322, 2018.

  67. A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra. Optimality and sub-optimality of PCA I: Spiked random matrix models. The Annals of Statistics, 46(5):2416–2451, 2018.

  68. A. Pizzo, D. Renfrew, and A. Soshnikov. On finite rank deformations of Wigner matrices. In Annales de l’IHP Probabilités et statistiques, volume 49, pages 64–94, 2013.

  69. P. Raghavendra, S. Rao, and T. Schramm. Strongly refuting random CSPs below the spectral threshold. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 121–131. ACM, 2017.

  70. P. Raghavendra, T. Schramm, and D. Steurer. High dimensional estimation via sum-of-squares proofs. In Proceedings of the International Congress of Mathematicians: Rio de Janeiro, pages 3389–3423. World Scientific, 2018.

  71. E. Richard and A. Montanari. A statistical model for tensor PCA. In Advances in Neural Information Processing Systems, pages 2897–2905, 2014.

  72. A. Singer. Angular synchronization by eigenvectors and semidefinite programming. Applied and computational harmonic analysis, 30(1):20–36, 2011.

  73. A. Singer and Y. Shkolnisky. Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming. SIAM journal on imaging sciences, 4(2):543–572, 2011.

  74. R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv:1011.3027, 2010.

  75. V. Vu. A simple SVD algorithm for finding hidden partitions. Combinatorics, Probability and Computing, 27(1):124–140, 2018.

  76. V. Vu and J. Lei. Minimax rates of estimation for sparse PCA in high dimensions. In Artificial intelligence and statistics, pages 1278–1286, 2012.

  77. T. Wang, Q. Berthet, and R. J. Samworth. Statistical and computational trade-offs in estimation of sparse principal components. The Annals of Statistics, 44(5):1896–1930, 2016.

  78. A. S. Wein, A. El Alaoui, and C. Moore. The Kikuchi hierarchy and tensor PCA. In 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 1446–1468. IEEE, 2019.

  79. D. M. Witten, R. Tibshirani, and T. Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515–534, 2009.

  80. A. Zhang and D. Xia. Tensor SVD: Statistical and computational limits. IEEE Transactions on Information Theory, 64(11):7311–7338, 2018.

  81. H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15(2):265–286, 2006.

Acknowledgements

We thank Samuel B. Hopkins, Philippe Rigollet, and Eliran Subag for helpful discussions.

Author information

Corresponding author

Correspondence to Alexander S. Wein.

Additional information

Communicated by Hans Munthe-Kaas.

Partially supported by NSF Grant DMS-1712730. Partially supported by NSF Grants DMS-1712730 and DMS-1719545. Partially supported by NSF Grant DMS-1712730 and by the Simons Collaboration on Algorithms and Geometry. Most of this work was done while ASB was with the Department of Mathematics at the Courant Institute of Mathematical Sciences, and the Center for Data Science, at New York University; and partially supported by NSF Grants DMS-1712730 and DMS-1719545, and by a Grant from the Sloan Foundation.

Appendices

The Wigner Model

1.1 Main Results

We now state our algorithms and results for the Wigner model. These are very similar to the Wishart case, so we omit some of the discussion.

Definition A.1

(Spiked Wigner model) The spiked Wigner model with parameters \(n\in \mathbb {N}_+\), \(\lambda \ge 0\), and planted signal \(x \in \mathbb {R}^n\) is defined as follows.

  • Under \(\mathbb {P}_n = \mathbb {P}_{n,\lambda }\), we observe the matrix \(Y = W + \lambda xx^\top \), where \(W\sim \textsf{GOE}(n)\).

  • Under \(\mathbb {Q}_n\), we observe the matrix \(Y \sim \textsf{GOE}(n)\).

Algorithm 3 (Wigner detection)

Remark A.2

(Runtime) As in the Wishart case (see Remark 2.5), the runtime is \(n^{O(\ell )}\). The same holds for Algorithm 4 below.
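For instance, if the test statistic is computed by enumerating \(\mathcal {I}_{n,\ell }\), as in the proof of Theorem A.3 below, then the number of candidate vectors is

$$\begin{aligned} |\mathcal {I}_{n,\ell }| = \left( {\begin{array}{c}n\\ \ell \end{array}}\right) 2^\ell \le (2n)^\ell = n^{O(\ell )}, \end{aligned}$$

and each quadratic form \(v^\top Y v\) takes only \(O(\ell ^2)\) arithmetic operations since v has \(\ell \) nonzero entries, for a total runtime of \(n^{O(\ell )}\).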

Theorem A.3

(Wigner detection) Consider the spiked Wigner model with an arbitrary \((\rho ,A)\)-sparse signal x. Let Y be drawn from either \(\mathbb {P}_n\) or \(\mathbb {Q}_n\), and let \(f_n\) be the output of Algorithm 3. Suppose

$$\begin{aligned} \rho \le \frac{\lambda ^2}{36A^4}\frac{1}{\log n}. \end{aligned}$$
(40)

Let \(\ell \) be any integer in the interval

$$\begin{aligned} \ell \in \left[ \frac{36 A^4}{\lambda ^2}\rho ^2 n\log n,\; \rho n\right] , \end{aligned}$$
(41)

which is nonempty due to (40). Then the total failure probability of Algorithm 3 satisfies

$$\begin{aligned} \mathbb {P}_n[f_n = {{\texttt {\textit{q}}}}] + \mathbb {Q}_n[f_n = {{\texttt {\textit{p}}}}] \le 2\exp \left( -\frac{\lambda ^2}{32A^4}\frac{\ell ^2}{\rho ^2 n}\right) \le 2n^{-9 \ell /8}, \end{aligned}$$

where the last inequality follows from (41).

Remark A.4

Since our lower bounds are against the class of low-degree algorithms, it is natural to ask whether our algorithms fall into this class. While our test statistic T is not a polynomial function of Y, we can instead take as a proxy the degree-2k polynomial \(P(Y) = \sum _{v \in \mathcal {I}_{n,\ell }} (v^\top Y v)^{2k}\) for some choice of k. Our analysis can be adapted to show that P can be used to solve strong detection under essentially the same conditions as Theorem A.3, provided \(k \gtrsim \ell \log n\). Note that (up to log factors) this matches the correspondence between runtime and degree in Conjecture 1.5.

Algorithm 4 (Wigner support and sign recovery)

For technical reasons, our first step is to fictitiously “split” the data into two independent copies \(Y'\) and \(Y''\): draw a fresh \({{\tilde{W}}} \sim \textsf{GOE}(n)\), independent of everything else, and set \(Y' = (Y+{{\tilde{W}}})/\sqrt{2}\) and \(Y'' = (Y-{{\tilde{W}}})/\sqrt{2}\). Note that

$$\begin{aligned} Y' = \frac{\lambda }{\sqrt{2}}xx^{\top }+ \frac{W+{{\tilde{W}}}}{\sqrt{2}}\quad \text {and}\quad Y'' = \frac{\lambda }{\sqrt{2}}xx^{\top }+ \frac{W-{{\tilde{W}}}}{\sqrt{2}}. \end{aligned}$$

Since \(W' {:}{=}\frac{W+{{\tilde{W}}}}{\sqrt{2}}\) and \(W'' {:}{=}\frac{W-{{\tilde{W}}}}{\sqrt{2}}\) are independent \(\textsf{GOE}(n)\) matrices, \(Y'\) and \(Y''\) are distributed as independent observations drawn from \(\mathbb {P}_n\) with the same planted signal x and with effective signal-to-noise ratio \(\bar{\lambda } = \lambda /\sqrt{2}\).
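Independence of \(Y'\) and \(Y''\) is a standard Gaussian fact: the pair \((W',W'')\) is jointly Gaussian, and for each entry

$$\begin{aligned} \mathbb {E}\left[ W'_{ij}W''_{ij}\right] = \frac{1}{2}\left( \mathbb {E}\left[ W_{ij}^2\right] -\mathbb {E}\left[ {{\tilde{W}}}_{ij}^2\right] \right) = 0, \end{aligned}$$

while covariances between distinct entries vanish because the entries of W and \({{\tilde{W}}}\) are independent across positions. Hence \(W'\) and \(W''\) are independent, and each is again distributed as \(\textsf{GOE}(n)\) since the entrywise variances are preserved.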

Theorem A.5

(Wigner support and sign recovery) Consider the planted spiked Wigner model \(\mathbb {P}_n\) with an arbitrary \((\rho ,A)\)-sparse signal x. Suppose

$$\begin{aligned} \rho \le \frac{\lambda ^2}{338A^4}\frac{1}{\log n}. \end{aligned}$$
(42)

Let \(\ell \) be any integer in the interval

$$\begin{aligned} \ell \in \left[ \frac{338 A^4}{\lambda ^2}\rho ^2 n\log n,\; \rho n\right] , \end{aligned}$$
(43)

which is nonempty due to (42). Then the failure probability of Algorithm 4 satisfies

$$\begin{aligned} 1-\mathbb {P}_n\left[ \textrm{supp}({\bar{x}}) = \textrm{supp}(x),\ \textrm{sign}({\bar{x}}) = \pm \textrm{sign}(x)\right] \le 4\exp \left( -\frac{\lambda ^2}{288A^4}\frac{\ell }{\rho ^2 n}\right) \le 4n^{-169/144}, \end{aligned}$$

where the last inequality follows from (43).

As in the Wishart case, once we have recovered the support, there is a standard polynomial-time spectral method to estimate x.

Theorem A.6

(Wigner recovery) Consider the planted spiked Wigner model \(\mathbb {P}_n\) with an arbitrary \((\rho ,A)\)-sparse signal x. Suppose we have access (e.g., via Algorithm 4) to \(\mathcal {I}= \textrm{supp}(x) \subset [n]\). Write \(P_{\mathcal {I}} = \sum _{i\in \mathcal {I}}e_i e_i^\top \) and \(Y_{\mathcal {I}} = P_{\mathcal {I}}YP_{\mathcal {I}}^\top \). Let \({\tilde{x}}\) denote the unit-norm eigenvector corresponding to the maximum eigenvalue of \(Y_{\mathcal {I}}\). Then for any \(\epsilon \in (\frac{4\sqrt{2\rho }}{\lambda },1)\),

$$\begin{aligned} \mathbb {P}_n\left[ \langle {\tilde{x}},x\rangle ^2 \le 1-\epsilon \right] \le 4\exp \left[ -\frac{n}{16}\left( \lambda \epsilon -4\sqrt{2\rho }\right) ^2\right] . \end{aligned}$$
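Once \(\mathcal {I}\) is known, the estimator \({\tilde{x}}\) is a few lines of linear algebra. A minimal NumPy sketch (the function name and interface are ours, for illustration only):

```python
import numpy as np

def spectral_estimate(Y, support):
    """Leading eigenvector of the principal submatrix Y_I, embedded back in R^n."""
    sub = Y[np.ix_(support, support)]        # Y restricted to the recovered support
    eigvals, eigvecs = np.linalg.eigh(sub)   # eigh returns eigenvalues in ascending order
    x_tilde = np.zeros(Y.shape[0])
    x_tilde[support] = eigvecs[:, -1]        # unit-norm eigenvector of the largest eigenvalue
    return x_tilde
```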

Remark A.7

In the regime we are interested in, \(n \rightarrow \infty \) with (42) satisfied, so that \(\sqrt{\rho }/\lambda \rightarrow 0\). In this case, the conclusion of Theorem A.6 gives \(\langle {{\tilde{x}}},x \rangle ^2 > 1 - o(1)\) with high probability, upon choosing for example \(\epsilon = \frac{8\sqrt{2\rho }}{\lambda }\).
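Concretely, with \(\epsilon = \frac{8\sqrt{2\rho }}{\lambda }\) the bound of Theorem A.6 becomes

$$\begin{aligned} \mathbb {P}_n\left[ \langle {\tilde{x}},x\rangle ^2 \le 1-\epsilon \right] \le 4\exp \left[ -\frac{n}{16}\left( 8\sqrt{2\rho }-4\sqrt{2\rho }\right) ^2\right] = 4\exp \left( -2\rho n\right) , \end{aligned}$$

which is \(o(1)\) as long as \(\rho n \rightarrow \infty \), as it does throughout the regime \(\rho \gg 1/\sqrt{n}\) of interest.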

We also have the following results on the behavior of the low-degree likelihood ratio.

Theorem A.8

(Boundedness of LDLR for large \(\rho \)) Under the spiked Wigner model with prior \(\mathcal {X}= \mathcal {X}_n^\rho \), suppose \(D_n = o(n)\). If one of the following holds for sufficiently large n:

  1. (a)

    \(\limsup _{n\rightarrow \infty }\lambda _n < 1\) and

    $$\begin{aligned} \rho _n\ge \max \left( 1,\sqrt{\frac{1}{6\log (1/\lambda _n)}}\right) \sqrt{\frac{D_n}{n}}\text {, or} \end{aligned}$$
    (44)
  2. (b)

    \(\limsup _{n\rightarrow \infty }\lambda _n < 1/\sqrt{3}\) and

    $$\begin{aligned} \rho _n\ge \lambda _n\sqrt{\frac{D_n}{n}}, \end{aligned}$$
    (45)

then, as \(n\rightarrow \infty \), \(\Vert L_{n,\lambda ,\mathcal {X}}^{\le D}\Vert = O(1)\).

Theorem A.9

(Divergence of LDLR for small \(\rho \)) Under the spiked Wigner model with prior \(\mathcal {X}= \mathcal {X}_n^\rho \), suppose \(D_n = \omega (1)\) and \(D_n = o(n)\). If one of the following holds:

  1. (a)

    \(\liminf _{n\rightarrow \infty }\lambda _n > 1\), or

  2. (b)

    \(\limsup _{n\rightarrow \infty }\lambda _n < 1\), \(|\log \lambda _n| = o(\sqrt{D_n})\) and for sufficiently large n,

    $$\begin{aligned} \rho _n < C\lambda _n\log ^{-2}(1/\lambda _n)\sqrt{\frac{D_n}{n}} \end{aligned}$$

    where C is an absolute constant,

then, as \(n\rightarrow \infty \), \(\Vert L_{n,\lambda ,\mathcal {X}}^{\le D}\Vert = \omega (1)\).

1.2 Proofs for Subexponential-Time Algorithms

Proof of Theorem A.3 (Detection)

For simplicity we denote \(t = \frac{\lambda \ell ^2}{2A^2\rho n}\). Under \({\mathbb {P}}_n\), when a candidate \({{\bar{v}}}\in \mathcal {I}_{n,\ell }\) correctly guesses \(\ell \) entries in the support of x with the correct signs (which requires \(\ell \le \rho n\)),

$$\begin{aligned} {{{\bar{v}}}}^\top Y {{\bar{v}}} = {{{\bar{v}}}}^\top W{{\bar{v}}} + \lambda \langle {{\bar{v}}},x\rangle ^2, \end{aligned}$$

where \({{{\bar{v}}}}^\top W{{\bar{v}}} \sim {\mathcal {N}}(0,2\ell ^2/n)\). Note that

$$\begin{aligned} \lambda \langle {{\bar{v}}},x\rangle ^2 \ge \frac{\lambda \ell ^2}{A^2\rho n}=2t. \end{aligned}$$

Therefore, a standard Gaussian tail bound gives

$$\begin{aligned} \mathbb {P}_n\left[ T< t\right]&\le \mathbb {P}_n\left[ {{{\bar{v}}}}^\top Y{{\bar{v}}} < t\right] \\ {}&\le \Pr \left[ {\mathcal {N}}(0,2\ell ^2/n) > t\right] \\&\le \exp \left( -\frac{n}{4\ell ^2} \left( \frac{\lambda \ell ^2}{2A^2\rho n}\right) ^2\right) \\ {}&= \exp \left( -\frac{\lambda ^2}{16A^4}\frac{\ell ^2}{\rho ^2 n}\right) . \end{aligned}$$

Under \({\mathbb {Q}}_n\), for each fixed \(v \in {\mathcal {I}}_{n,\ell }\), we have

$$\begin{aligned} v^\top Y v \sim {\mathcal {N}}(0,2\ell ^2/n). \end{aligned}$$

By the same tail bound,

$$\begin{aligned} \mathbb {Q}_n\left[ v^\top Y v \ge t\right] \le \exp \left( -\frac{nt^2}{4\ell ^2}\right) . \end{aligned}$$

Now, by a union bound over \(v \in {\mathcal {I}}_{n,\ell }\),

$$\begin{aligned} \mathbb {Q}_n\left[ T \ge t\right]&\le |{\mathcal {I}}_{n,\ell }| \exp \left( -\frac{nt^2}{4\ell ^2}\right) \\&= \left( {\begin{array}{c}n\\ \ell \end{array}}\right) 2^\ell \exp \left( -\frac{nt^2}{4\ell ^2}\right) \\&\le \exp \left( \ell \log (2n) -\frac{nt^2}{4\ell ^2}\right) . \end{aligned}$$

Under the condition

$$\begin{aligned} \frac{nt^2}{8\ell ^2}\ge \ell \log (2n)\ \Leftarrow \ \rho < \frac{\lambda }{6A^2}{\sqrt{\frac{\ell }{n\log n}}}, \end{aligned}$$

which holds for every \(\ell \) in the interval (41) once n is sufficiently large, we have

$$\begin{aligned} \mathbb {Q}_n\left[ T \ge t\right] \le \exp \left( -\frac{nt^2}{8\ell ^2}\right) = \exp \left( -\frac{\lambda ^2}{32A^4}\frac{\ell ^2}{\rho ^2 n}\right) . \end{aligned}$$

Therefore, by thresholding T at t, under the condition

$$\begin{aligned} \frac{\ell }{n}\le \rho \le \frac{\lambda }{6A^2} \sqrt{\frac{\ell }{n \log n}}, \end{aligned}$$
(46)

we can distinguish \(\mathbb {P}_n\) and \(\mathbb {Q}_n\) with total failure probability at most

$$\begin{aligned} \mathbb {P}_n\left[ T < t\right] + \mathbb {Q}_n\left[ T \ge t\right]&\le \exp \left( -\frac{\lambda ^2}{16A^4}\frac{\ell ^2}{\rho ^2 n}\right) +\exp \left( -\frac{\lambda ^2}{32A^4}\frac{\ell ^2}{\rho ^2 n}\right) \\ {}&\le 2\exp \left( -\frac{\lambda ^2}{32A^4}\frac{\ell ^2}{\rho ^2 n}\right) , \end{aligned}$$

completing the proof. \(\square \)
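For concreteness, the test just analyzed can be run by brute force. Below is a minimal NumPy sketch, under the assumption (consistent with the counting \(|\mathcal {I}_{n,\ell }| = \left( {\begin{array}{c}n\\ \ell \end{array}}\right) 2^\ell \) used above) that \(\mathcal {I}_{n,\ell }\) is the set of vectors in \(\{-1,0,+1\}^n\) with exactly \(\ell \) nonzero entries and that Algorithm 3 thresholds \(T = \max _{v\in \mathcal {I}_{n,\ell }} v^\top Y v\) at t; the function name and toy parameters are ours.

```python
import itertools
import numpy as np

def detection_statistic(Y, ell):
    """Brute-force T = max over v in I_{n,ell} of v^T Y v, taking I_{n,ell} to be
    the set of vectors in {-1, 0, +1}^n with exactly ell nonzero entries."""
    n = Y.shape[0]
    T = -np.inf
    for support in itertools.combinations(range(n), ell):
        idx = list(support)
        Y_sub = Y[np.ix_(idx, idx)]                    # only the chosen coordinates matter
        for signs in itertools.product([-1.0, 1.0], repeat=ell):
            v = np.array(signs)
            T = max(T, float(v @ Y_sub @ v))
    return T

# Toy run (tiny n only: the search has binom(n, ell) * 2^ell candidates).
rng = np.random.default_rng(0)
n, ell, lam, A, rho = 12, 3, 2.0, 1.0, 0.25
x = np.zeros(n)
supp = rng.choice(n, size=int(rho * n), replace=False)
x[supp] = rng.choice([-1.0, 1.0], size=len(supp)) / np.sqrt(rho * n)  # unit norm
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)           # GOE(n), as in the proof of Theorem A.6
Y = W + lam * np.outer(x, x)             # observation under P_n
t = lam * ell**2 / (2 * A**2 * rho * n)  # threshold t from the proof above
print("declare p" if detection_statistic(Y, ell) >= t else "declare q")
```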

Proof of Theorem A.5 (Support and Sign Recovery)

First, we show that \(v^*\) has significant overlap with the support of x. From the analysis of the detection algorithm, provided (46) holds, with probability at least \(1-2\exp \left( -\frac{\bar{\lambda }^2}{32A^4}\frac{\ell ^2}{\rho ^2 n}\right) \) we have

$$\begin{aligned} \frac{\bar{\lambda }\ell ^2}{2A^2\rho n} \le {v^*}^\top Y' v^* = \bar{\lambda } \langle v^*,x \rangle ^2 + {v^*}^\top W' v^*. \end{aligned}$$

where \({v^*}^\top W' v^* \sim \mathcal {N}(0,2\ell ^2/n)\). Therefore, for n sufficiently large,

$$\begin{aligned}&\mathbb {P}_n\left[ \langle v^*,x \rangle ^2 \ge \frac{\ell ^2}{4A^2\rho n}\right] \\ {}&\quad \ge \left( 1-2\exp \left( -\frac{\bar{\lambda }^2}{32A^4}\frac{\ell ^2}{\rho ^2 n}\right) \right) \left( 1- \Pr \left[ \mathcal {N}(0,2\ell ^2/n) \ge \frac{\bar{\lambda }\ell ^2}{4A^2\rho n}\right] \right) \\&\quad \ge 1-2\exp \left( -\frac{\bar{\lambda }^2}{32A^4}\frac{\ell ^2}{\rho ^2 n}\right) -\exp \left( -\frac{\bar{\lambda }^2}{64A^4}\frac{\ell ^2}{\rho ^2 n}\right) \\&\quad \ge 1-3\exp \left( -\frac{\bar{\lambda }^2}{64A^4}\frac{\ell ^2}{\rho ^2 n}\right) . \end{aligned}$$

We now fix \(v^*\) satisfying the above lower bound on \(\langle v^*,x \rangle ^2\). From this point onward, we will only use the second copy \(Y''\) of our data; it is important here that \(Y''\) is independent of \(v^*\). We will show that x is successfully recovered by thresholding the entries of \(z = Y''v^*\). Entrywise, we have

$$\begin{aligned} z_i = \bar{\lambda } x_i \langle v^*,x \rangle + e_i^\top W''v^*. \end{aligned}$$

For all \(i \in \textrm{supp}(x)\),

$$\begin{aligned} |\bar{\lambda } x_i \langle v^*,x \rangle | \ge \bar{\lambda } \frac{1}{A\sqrt{\rho n}}\cdot \frac{\ell }{2A\sqrt{\rho n}} = \frac{\bar{\lambda } \ell }{2A^2\rho n}. \end{aligned}$$

For simplicity we denote \(s = \frac{\bar{\lambda }\ell }{2A^2\rho n}\) and \(\mu = \frac{1}{3}\). Note that for all \(i\in [n]\), \(e_i^\top W'' v^* \sim {\mathcal {N}}(0,\Vert v^*\Vert ^2/n) = {\mathcal {N}}(0,\ell /n)\) and therefore

$$\begin{aligned} \mathbb {P}_n\left[ |e_i^\top W'' v^*| \ge \mu s\right] \le 2 \exp \left( -\frac{n\mu ^2 s^2}{2\ell }\right) . \end{aligned}$$
(47)

By a union bound over all \(i \in [n]\),

$$\begin{aligned} \mathbb {P}_n\left[ |e_i^\top W'' v^*| \le \mu s \text { for all } i\right]&\ge 1 - 2n \exp \left( -\frac{n\mu ^2 s^2}{2\ell }\right) \\&\ge 1 - \exp \left( \log (2n) -\frac{n\mu ^2 s^2}{2\ell }\right) \\&\ge 1 - \exp \left( -\frac{n\mu ^2 s^2}{4\ell }\right) \\&= 1-\exp \left( -\frac{\bar{\lambda }^2}{144A^4}\frac{\ell }{\rho ^2 n}\right) \end{aligned}$$

under the condition

$$\begin{aligned} \frac{n\mu ^2 s^2}{4\ell }\ge \log (2n)\ \Leftarrow \ \rho \le \frac{\bar{\lambda }}{13A^2} \sqrt{\frac{\ell }{n \log n}} = \frac{\lambda }{13\sqrt{2}A^2} \sqrt{\frac{\ell }{n \log n}}, \end{aligned}$$

which, combined with (46), is equivalent to membership in the interval for \(\ell \) that we are considering per (43). Therefore, with probability at least

$$\begin{aligned} 1-3\exp \left( -\frac{\bar{\lambda }^2}{64A^4}\frac{\ell ^2}{\rho ^2 n}\right) -\exp \left( -\frac{\bar{\lambda }^2}{144A^4}\frac{\ell }{\rho ^2 n}\right) \ge 1-4\exp \left( -\frac{\lambda ^2}{288A^4}\frac{\ell }{\rho ^2 n}\right) \end{aligned}$$

for all \(j\in [n]\),

$$\begin{aligned} j\in \textrm{supp}(x) \text { if and only if } |z_j|\ge \frac{s}{2} \end{aligned}$$

and

$$\begin{aligned} \textrm{sign}(z_j) = \textrm{sign}(x_j \langle v^*,x\rangle ). \end{aligned}$$

Thus, we find that thresholding the entries of z at s/2 successfully recovers the support and signs of x, completing the proof. \(\square \)
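A minimal NumPy sketch of this thresholding step (naming ours; \(v^*\) is the maximizer obtained from the first copy \(Y'\), and \(Y''\) is the second, independent copy):

```python
import numpy as np

def recover_support_and_signs(Y2, v_star, lam_bar, A, rho):
    """Threshold z = Y'' v* at s/2, where s = lam_bar * ell / (2 A^2 rho n)."""
    n = Y2.shape[0]
    ell = int(np.count_nonzero(v_star))
    z = Y2 @ v_star
    s = lam_bar * ell / (2 * A**2 * rho * n)
    support = np.flatnonzero(np.abs(z) >= s / 2)
    signs = np.sign(z[support])   # equals sign(x_i <v*, x>), i.e. sign(x) up to one global flip
    return support, signs
```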

Proof of Theorem A.6 (Full Recovery)

Since \(Y_{\mathcal {I}}{\tilde{x}} = \lambda _{\max }(Y_{\mathcal {I}}){\tilde{x}}\), we must have \(\textrm{supp}({\tilde{x}})\subset \mathcal {I}\). Denote \(W_{\mathcal {I}} = P_{\mathcal {I}}WP_{\mathcal {I}}^\top \) and \({\bar{W}}_{\mathcal {I}}\) the \(\rho n\times \rho n\) submatrix of \(W_{\mathcal {I}}\) with rows and columns indexed by \(\mathcal {I}\) (the only nonzero rows and columns). Now, the variational description of the leading eigenvector yields

$$\begin{aligned} {\tilde{x}}^\top W_{\mathcal {I}} {\tilde{x}}+\lambda \langle {\tilde{x}},x\rangle ^2 = {\tilde{x}}^\top Y_{\mathcal {I}}{\tilde{x}} \ge x^\top Y_{\mathcal {I}} x = x^\top W_{\mathcal {I}}x +\lambda . \end{aligned}$$

Therefore,

$$\begin{aligned} \langle {\tilde{x}},x\rangle ^2 \ge 1- \frac{1}{\lambda }({\tilde{x}}^\top W_{\mathcal {I}} {\tilde{x}}-x^\top W_{\mathcal {I}} x) \ge 1-\frac{1}{\lambda }(\lambda _{\max }({\bar{W}}_{\mathcal {I}})+\lambda _{\max }(-{\bar{W}}_{\mathcal {I}})). \end{aligned}$$

Note that \({\bar{W}}_{\mathcal {I}}\) has the same law as \(({\bar{G}}+{\bar{G}}^\top ) / \sqrt{2n}\), where \({\bar{G}}\) is a \(\rho n\times \rho n\) matrix whose entries are independent standard normal random variables. Now, for any \(\epsilon > \frac{4\sqrt{2\rho }}{\lambda }\), we have \(\frac{\sqrt{2n}\lambda \epsilon }{4} > 2\sqrt{\rho n}\), so a standard singular value estimate for Gaussian matrices (see [74], Corollary 5.35) gives

$$\begin{aligned} \mathbb {P}_n\left[ \langle {\tilde{x}},x\rangle ^2 \le 1-\epsilon \right]&\le \Pr \left[ \lambda _{\max }({\bar{W}}_{\mathcal {I}})+\lambda _{\max }(-{\bar{W}}_{\mathcal {I}}) \ge \lambda \epsilon \right] \\&\le 2\Pr \left[ \lambda _{\max }({\bar{W}}_{\mathcal {I}}) \ge \frac{\lambda \epsilon }{2}\right] \le 2\Pr \left[ \sigma _{\max }({\bar{G}}) \ge \frac{\sqrt{2n}\lambda \epsilon }{4}\right] \\&\le 4\exp \left[ -\frac{n}{16}\left( \lambda \epsilon -4\sqrt{2\rho }\right) ^2\right] , \end{aligned}$$

which concludes the proof. \(\square \)

1.3 Proofs for Low-Degree Likelihood Ratio Bounds

Our starting point is the following formula for the Wigner LDLR.

Lemma A.10

(D-LDLR for spiked Wigner model [46]) Let \(L_{n,\lambda ,\mathcal {X}}^{\le D}\) denote the degree-D likelihood ratio for the spiked Wigner model with parameters \(n,\lambda \) and spike prior \(\mathcal {X}\). Then,

$$\begin{aligned} \Vert L_{n,\lambda ,\mathcal {X}}^{\le D}\Vert ^2 = \mathop {\mathbb {E}}_{v^{(1)}, v^{(2)} \sim \mathcal {X}_n}\left[ \sum _{d = 0}^D \frac{1}{d!}\left( \frac{n}{2}\lambda ^2\langle v^{(1)}, v^{(2)} \rangle ^2\right) ^d\right] \end{aligned}$$
(48)

where \(v^{(1)},v^{(2)}\) are drawn independently from \(\mathcal {X}_n\).
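Summing the series in (48) over all \(d \ge 0\), rather than truncating at D, gives

$$\begin{aligned} \mathop {\mathbb {E}}_{v^{(1)}, v^{(2)} \sim \mathcal {X}_n}\exp \left( \frac{n}{2}\lambda ^2\langle v^{(1)}, v^{(2)} \rangle ^2\right) , \end{aligned}$$

the quantity controlled in the classical (non-low-degree) second moment method; the low-degree analysis asks how much of this expectation is already captured by the first D terms of the exponential series.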

In this section, we use the bounds on \(A_d\) (Lemmas 4.3, 4.4 and 4.5) to prove the upper bound (Theorem A.8) and the lower bound (Theorem A.9) on the Wigner LDLR (48).
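A single elementary estimate recurs throughout these calculations: by Stirling's formula, the normalized central binomial coefficient satisfies

$$\begin{aligned} \frac{(2d)!}{4^d (d!)^2} = 4^{-d}\left( {\begin{array}{c}2d\\ d\end{array}}\right) = \Theta \left( d^{-1/2}\right) , \end{aligned}$$

which is what converts the factorial prefactors below into the \(1/\sqrt{d}\) terms.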

Proof of Theorem A.8(a)

We only work with those n for which (44) holds. Let \(\mu = -\log \lambda \). Note that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{D_n}{n}&= 0, \\ \liminf _{n\rightarrow \infty }\left( -\frac{1}{2}\log \lambda _n\right) = -\frac{1}{2}\log \left( \limsup _{n\rightarrow \infty }\lambda _n\right)&> 0. \end{aligned}$$

For large enough n that \(\frac{D_n}{n} < -\frac{1}{2}\log \lambda _n\), applying Lemma 4.3 in the expression of (48) yields

$$\begin{aligned} \begin{aligned} \Vert L_{n,\lambda ,\mathcal {X}}^{\le D_n}\Vert ^2&= \mathop {\mathbb {E}}_{v^{(1)},v^{(2)}\sim \mathcal {X}_n^\rho }\left[ \sum _{d = 0}^{D_n} \frac{1}{d!}\left( \frac{n}{2}\lambda ^2\right) ^d \langle v^{(1)},v^{(2)} \rangle ^{2d}\right] \\&\lesssim \sum _{d = 1}^{D_n}\frac{1}{d!}\left( \frac{n}{2}\lambda ^2\right) ^d(n\rho )^{-2d} \cdot 2d e^{\mu d+d^2/n}\left( {\begin{array}{c}n\\ d\end{array}}\right) \frac{(2d)!}{2^d}\rho ^{2d}\\&\lesssim \sum _{d = 1}^{D_n}\frac{d(2d)!}{4^d(d!)^2}(e^{\mu +D_n/n} \lambda ^2)^d\\&\lesssim \sum _{d = 1}^{D_n}\frac{1}{\sqrt{d}}(\lambda ^{1/2})^d \\&= O(1), \end{aligned} \end{aligned}$$

where the last step uses the assumption \(\limsup _{n\rightarrow \infty }\lambda _n < 1\). \(\square \)

Proof of Theorem A.8(b)

Since \(\limsup _{n\rightarrow \infty }\lambda _n < 1/\sqrt{3}\), for sufficiently large n we have \(\lambda _n\le 1/\sqrt{3}\). Now, Theorem A.8(b) immediately follows from Lemma 4.4 (taking \(\mu = \lambda \)), since for n large enough that \(\frac{D_n}{n} < 0.001\), we have

$$\begin{aligned} \begin{aligned} \Vert L_{n,\lambda ,\mathcal {X}}^{\le D_n}\Vert ^2&\lesssim \sum _{d = 11}^{D_n}\frac{1}{d!}\left( \frac{n}{2}\lambda ^2\right) ^d(n\rho )^{-2d} \cdot \sqrt{d}e^{d^2/n}\left( \frac{11e}{30}\right) ^{d/2}\lambda ^{-2d}\left( {\begin{array}{c}n\\ d\end{array}}\right) \frac{(2d)!}{2^d}\rho ^{2d}\\&\lesssim \sum _{d = 11}^{\infty }\frac{\sqrt{d}(2d)!}{4^d(d!)^2}e^{d^2/n}\left( \frac{11e}{30}\right) ^{d/2}\\&\lesssim \sum _{d = 11}^{\infty }\left( e^{D_n/n}\sqrt{\frac{11e}{30}}\right) ^d \\&\lesssim \sum _{d = 11}^{\infty }\left( e^{0.001}\sqrt{\frac{11e}{30}}\right) ^d \\&=O(1), \end{aligned} \end{aligned}$$

completing the proof. \(\square \)

Proof of Theorem A.9(a)

Substituting (35) and (36) into (48) yields

$$\begin{aligned} \Vert L_{n,\lambda ,\mathcal {X}}^{\le D_n}\Vert _2^2&\ge \sum _{d = 1}^{D_n} \frac{1}{d!}\left( \frac{n}{2}\lambda ^2\right) ^d\cdot n^{-2d}\left( {\begin{array}{c}n\\ d\end{array}}\right) \frac{(2d)!}{2^d}\\&\gtrsim \sum _{d = 1}^{D_n} \frac{(2d)!}{4^d(d!)^2}\lambda ^{2d}e^{-dD_n/n}\\ {}&\gtrsim \sum _{d = 1}^{D_n} \frac{1}{\sqrt{d}}\left( \lambda ^2 e^{-D_n/n}\right) ^d \\&\ge \sum _{d = 1}^{D_n} \frac{1}{\sqrt{d}} \\ {}&= \omega (1), \end{aligned}$$

since \(D_n = \omega (1)\), \(\liminf _{n\rightarrow \infty }\lambda _n > 1\) and \(e^{-D_n/n}\rightarrow 1\). \(\square \)

Lemma A.11

Suppose \(\omega (1) \le D_n \le o(n)\). If there exists a sequence of positive integers \(w_n = o(\sqrt{D_n})\) such that

$$\begin{aligned} \liminf _{n\rightarrow \infty }\ 2{\lambda }_n^2 \left( \frac{D_n}{ne\rho _n^2}\right) ^{1-\frac{1}{w_n}}\left( \frac{w_n}{(2w_n)!}\right) ^{\frac{1}{w_n}} > 1 \end{aligned}$$
(49)

then \(\Vert L_{n,\lambda ,\mathcal {X}}^{\le D_n}\Vert _2^2 \rightarrow \infty \) as \(n\rightarrow \infty \).

Proof

If (49) holds, we can choose an \(\epsilon > 0\) such that for sufficiently large n,

$$\begin{aligned} 2\lambda _n^2 \left( \frac{D_n}{ne\rho _n^2}\right) ^{1-\frac{1}{w_n}}\left( \frac{w_n}{(2w_n)!}\right) ^{\frac{1}{w_n}} > 1+\epsilon . \end{aligned}$$

Let n satisfy the above inequality. Pick \(\mu \in (0,1)\) such that

$$\begin{aligned} \mu ^{1-\frac{1}{w_n}}(1+\epsilon ) > 1. \end{aligned}$$

In the sum (48) we only consider those \(d > \mu D_n\) that are multiples of \(w_n\). For each of them, Lemma 4.5 gives

$$\begin{aligned} \begin{aligned}&\frac{1}{d!}\left( \frac{n}{2}\lambda _n^2\right) ^d {\mathbb {E}}\langle v^{(1)},v^{(2)} \rangle ^{2d} \\&\quad \gtrsim \frac{1}{d!}\left( \frac{n}{2}\lambda _n^2\right) ^d\cdot n^{-2d}\left( {\begin{array}{c}n\\ d\end{array}}\right) \frac{(2d)!}{2^d}\ \left[ 2\left( \frac{d}{ne\rho ^2}\right) ^{1-\frac{1}{w_n}}\left( \frac{w_n}{(2w_n)!}\right) ^{\frac{1}{w_n}}\right] ^d\\&\quad \ge \frac{(2d)!}{4^d(d!)^2}\cdot \frac{n(n-1)\cdots (n-d+1)}{n^d} \cdot \left[ 2\lambda ^2\left( \frac{\mu D_n}{ne\rho ^2}\right) ^{1-\frac{1}{w_n}}\left( \frac{w_n}{(2w_n)!}\right) ^{\frac{1}{w_n}}\right] ^d\\&\quad \gtrsim \frac{1}{\sqrt{d}}. \end{aligned} \end{aligned}$$

Therefore,

$$\begin{aligned} \begin{aligned} \Vert L_{n,\lambda ,\mathcal {X}}^{\le D_n}\Vert _2^2&\gtrsim \sum _{\begin{array}{c} \mu D_n< d < D_n \\ w_n\ |\ d \end{array}}\frac{1}{\sqrt{d}} \\ {}&\gtrsim \frac{1}{\sqrt{w_n}}\left( \sqrt{\frac{D_n}{w_n}}-\sqrt{\frac{\mu D_n}{w_n}}\right) \\ {}&= (1-\sqrt{\mu })\frac{\sqrt{D_n}}{w_n} \\ {}&= \omega (1), \end{aligned} \end{aligned}$$

completing the proof. \(\square \)

Proof of Theorem A.9(b)

For sufficiently large n, in Lemma A.11 we choose the positive integer

$$\begin{aligned} w_n = \lceil \log (1/\lambda _n) \rceil , \end{aligned}$$

which is \(o(\sqrt{D_n})\). The divergence of \(\Vert L_{n,\lambda ,\mathcal {X}}^{\le D_n}\Vert _2^2\) follows from the condition (49), which is implied by the following sufficient condition: for sufficiently large n,

$$\begin{aligned} \rho _n < 0.99\frac{1}{\sqrt{e}}\left( \frac{w_n\cdot 2^{w_n}}{(2w_n)!}\right) ^{1/(w_n-1)} \sqrt{\frac{D_n}{n}}\lambda _n^{w_n/(w_n-1)}. \end{aligned}$$
(50)

As in the proof of Lemma 4.7, notice that

$$\begin{aligned} \frac{1}{\sqrt{e}}\left( \frac{w_n\cdot 2^{w_n}}{(2w_n)!}\right) ^{\frac{1}{w_n-1}} = \Theta (w_n^{-2}) = \Theta (\log ^{-2} (1/\lambda _n)) \qquad \text {and}\qquad \lambda _n^{w_n/(w_n-1)} = \lambda _n\cdot \lambda _n^{1/(\lceil \log (1/\lambda _n) \rceil -1)} = \Theta (\lambda _n). \end{aligned}$$

Thus there exists an absolute constant C such that, if

$$\begin{aligned} \rho _n < C\sqrt{\frac{D_n}{n}}\lambda _n\log ^{-2}(1/\lambda _n), \end{aligned}$$

then (50) is satisfied and the divergence of \(\Vert L_{n,\lambda ,\mathcal {X}}^{\le D_n}\Vert _2^2\) follows from Lemma A.11. \(\square \)

Chernoff Bounds

In this section, we present two Chernoff-type concentration inequalities used in our proofs.

Lemma B.1

(Local Chernoff bound for Gaussian inner products) Let \(u^{(1)},u^{(2)}\in {\mathbb {R}}^N\) be independent samples from \(\mathcal {N}(0,I_N)\). Then, for any \(0 < t \le N/2\),

$$\begin{aligned} \Pr \left[ |\langle u^{(1)},u^{(2)}\rangle | \ge t\right] \le 2\exp \left( -\frac{t^2}{4N}\right) . \end{aligned}$$

Proof

Since by symmetry \(\langle u^{(1)},u^{(2)}\rangle \) and \(-\langle u^{(1)},u^{(2)}\rangle \) have the same distribution, it suffices to bound \(\Pr \left[ \langle u^{(1)},u^{(2)}\rangle \ge t\right] \) for \(0 < t\le N/2\). By Markov’s inequality on the moment generating function, for any \(\mu > 0\),

$$\begin{aligned} \Pr \left[ \langle u^{(1)},u^{(2)}\rangle \ge t\right] \le \frac{\mathbb {E}e^{\mu \langle u^{(1)},u^{(2)}\rangle }}{e^{\mu t}} = e^{-\mu t}(\mathbb {E}e^{\mu x_1x_2})^N, \end{aligned}$$

where \(x_1,x_2\) are independent samples from \(\mathcal {N}(0,1)\). We compute

$$\begin{aligned} \mathbb {E}e^{\mu x_1x_2} = \frac{1}{2\pi }\iint _{\mathbb {R}^2}\exp \left( -\frac{x^2}{2}+\mu xy -\frac{y^2}{2}\right) \textrm{d}x\,\textrm{d}y = (1-\mu ^2)^{-\frac{1}{2}}. \end{aligned}$$
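One quick way to verify this identity: conditioning on \(x_2\) and applying the Gaussian moment generating function twice,

$$\begin{aligned} \mathbb {E}e^{\mu x_1x_2} = \mathbb {E}\left[ \mathbb {E}\left[ e^{\mu x_1 x_2}\mid x_2\right] \right] = \mathbb {E}\left[ e^{\mu ^2x_2^2/2}\right] = (1-\mu ^2)^{-\frac{1}{2}}, \end{aligned}$$

where the last step uses \(\mathbb {E}e^{s\chi _1^2} = (1-2s)^{-1/2}\) for \(s < 1/2\), here with \(s = \mu ^2/2\).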

Take \(\mu = t/N \in (0,\frac{1}{2}]\). Note that \(1-z \ge e^{-3z/2}\) on \(z\in (0,\frac{1}{4}]\), and so \(1-\mu ^2 \ge e^{-3\mu ^2/2}\). Hence

$$\begin{aligned} \Pr \left[ \langle u^{(1)},u^{(2)}\rangle \ge t\right] \le \exp \left( -\frac{t^2}{N}\right) \cdot \exp \left( \frac{3N\mu ^2}{4}\right) = \exp \left( -\frac{t^2}{4N}\right) , \end{aligned}$$

and the result follows. \(\square \)

The following result may be found in [47].

Lemma B.2

(Chernoff bound for \(\chi ^2\) distribution) For all \(0< z < 1\),

$$\begin{aligned} \frac{1}{k}\log \Pr \left[ \chi _k^2 \le zk\right] \le \frac{1}{2}(1-z+\log z). \end{aligned}$$

Similarly, for all \(z > 1\),

$$\begin{aligned} \frac{1}{k}\log \Pr \left[ \chi _k^2 \ge zk\right] \le \frac{1}{2}(1-z+\log z). \end{aligned}$$
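Both bounds follow from a one-line Chernoff computation. For the upper tail, for any \(0< s < 1/2\),

$$\begin{aligned} \Pr \left[ \chi _k^2 \ge zk\right] \le e^{-szk}\,\mathbb {E}e^{s\chi _k^2} = e^{-szk}(1-2s)^{-k/2}, \end{aligned}$$

and optimizing over s (the minimizer is \(s = \frac{1}{2}(1-1/z)\), which lies in \((0,1/2)\) for \(z>1\)) gives exactly \(\exp \left( \frac{k}{2}(1-z+\log z)\right) \); the lower tail is analogous, taking \(s<0\).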

Corollary B.3

For all \(0 < t \le 1/2\),

$$\begin{aligned} \Pr \left[ |\chi _k^2 -k| \ge kt\right] \le 2\exp \left( -\frac{kt^2}{6}\right) . \end{aligned}$$

Proof

It is easy to check that for \(t\in (0,1/2]\),

$$\begin{aligned} t+\log (1-t)&\le -\frac{t^2}{3} \\ -t+\log (1+t)&\le -\frac{t^2}{3}. \end{aligned}$$

Therefore, by Lemma B.2,

$$\begin{aligned} \frac{1}{k}\log \Pr \left[ \chi _k^2 \ge (1+t)k\right]&\le \frac{1}{2}\left( -t+\log (1+t)\right) \le -\frac{t^2}{6}, \\ \frac{1}{k}\log \Pr \left[ \chi _k^2 \le (1-t)k\right]&\le \frac{1}{2}\left( t+\log (1-t)\right) \le -\frac{t^2}{6}, \end{aligned}$$

and summing the two tail probabilities gives \(\Pr \left[ |\chi _k^2 -k| \ge kt\right] \le 2\exp (-kt^2/6)\),

completing the proof. \(\square \)

About this article

Cite this article

Ding, Y., Kunisky, D., Wein, A.S. et al. Subexponential-Time Algorithms for Sparse PCA. Found Comput Math (2023). https://doi.org/10.1007/s10208-023-09603-0
