Subexponential-Time Algorithms for Sparse PCA
Foundations of Computational Mathematics (IF 3), Pub Date: 2023-01-19, DOI: 10.1007/s10208-023-09603-0
Yunzi Ding, Dmitriy Kunisky, Alexander S. Wein, Afonso S. Bandeira

We study the computational cost of recovering a unit-norm sparse principal component \(x \in \mathbb {R}^n\) planted in a random matrix, in either the Wigner or Wishart spiked model (observing either \(W + \lambda xx^\top \) with W drawn from the Gaussian orthogonal ensemble, or N independent samples from \(\mathcal {N}(0, I_n + \beta xx^\top )\), respectively). Prior work has shown that when the signal-to-noise ratio (\(\lambda \) or \(\beta \sqrt{N/n}\), respectively) is a small constant and the fraction of nonzero entries in the planted vector is \(\Vert x\Vert _0 / n = \rho \), it is possible to recover x in polynomial time if \(\rho \lesssim 1/\sqrt{n}\). While it is possible to recover x in exponential time under the weaker condition \(\rho \ll 1\), it is believed that polynomial-time recovery is impossible unless \(\rho \lesssim 1/\sqrt{n}\). We investigate the precise amount of time required for recovery in the “possible but hard” regime \(1/\sqrt{n} \ll \rho \ll 1\) by exploring the power of subexponential-time algorithms, i.e., algorithms running in time \(\exp (n^\delta )\) for some constant \(\delta \in (0,1)\). For any \(1/\sqrt{n} \ll \rho \ll 1\), we give a recovery algorithm with runtime roughly \(\exp (\rho ^2 n)\), demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the \(\exp (\rho n)\)-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.
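
To make the Wishart setting and the polynomial-time baseline concrete, here is a minimal Python sketch. It is not the authors' subexponential-time algorithm. It only samples from \(\mathcal {N}(0, I_n + \beta xx^\top )\) and applies a simplified form of diagonal thresholding. The function names, the choice of a planted vector with entries \(\pm 1/\sqrt{k}\), and the assumption that the sparsity \(k = \rho n\) is known (so the top \(k\) empirical variances are kept instead of applying a variance threshold) are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def sample_spiked_wishart(n, N, k, beta, rng):
    """Draw N samples from N(0, I_n + beta * x x^T) for a k-sparse unit-norm x.

    Assumption for illustration: nonzero entries of x are +-1/sqrt(k).
    """
    support = rng.choice(n, size=k, replace=False)
    x = np.zeros(n)
    x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    # y = z + sqrt(beta) * g * x with z ~ N(0, I_n), g ~ N(0, 1)
    # has covariance I_n + beta * x x^T.
    z = rng.standard_normal((N, n))
    g = rng.standard_normal((N, 1))
    Y = z + np.sqrt(beta) * g * x
    return Y, x


def diagonal_thresholding(Y, k):
    """Simplified diagonal thresholding with known sparsity k (an assumption).

    Keep the k coordinates with the largest empirical variance, then take the
    top eigenvector of the sample covariance restricted to those coordinates.
    """
    N, n = Y.shape
    variances = (Y ** 2).mean(axis=0)
    support_hat = np.sort(np.argsort(variances)[-k:])
    sub_cov = Y[:, support_hat].T @ Y[:, support_hat] / N
    _, eigvecs = np.linalg.eigh(sub_cov)  # eigenvalues in ascending order
    x_hat = np.zeros(n)
    x_hat[support_hat] = eigvecs[:, -1]
    return x_hat, support_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y, x = sample_spiked_wishart(n=200, N=2000, k=5, beta=5.0, rng=rng)
    x_hat, support_hat = diagonal_thresholding(Y, k=5)
    # The sign of an eigenvector is arbitrary, so compare up to sign.
    print("overlap |<x_hat, x>| =", abs(x_hat @ x))
```

The example parameters put the sketch in an easy regime (very sparse \(x\), many samples), where coordinates on the support have variance \(1 + \beta /k\) and are well separated from the rest. In the "possible but hard" regime \(1/\sqrt{n} \ll \rho \ll 1\) this per-coordinate test is not expected to succeed, which is the gap the paper's family of algorithms addresses.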




Updated: 2023-01-19