Numerical Analysis for Convergence of a Sample-Wise Backpropagation Method for Training Stochastic Neural Networks
SIAM Journal on Numerical Analysis (IF 2.9), Pub Date: 2024-03-01, DOI: 10.1137/22m1523765
Richard Archibald, Feng Bao, Yanzhao Cao, Hui Sun
SIAM Journal on Numerical Analysis, Volume 62, Issue 2, Page 593-621, April 2024.
Abstract. The aim of this paper is to carry out convergence analysis and algorithm implementation of a novel sample-wise backpropagation method for training a class of stochastic neural networks (SNNs). A preliminary discussion of this SNN framework was first introduced in [Archibald et al., Discrete Contin. Dyn. Syst. Ser. S, 15 (2022), pp. 2807–2835]. The structure of the SNN is formulated as a discretization of a stochastic differential equation (SDE). A stochastic optimal control framework is introduced to model the training procedure, and a sample-wise approximation scheme for the adjoint backward SDE is applied to improve the efficiency of the stochastic optimal control solver, which is equivalent to backpropagation for training the SNN. The convergence analysis is derived by introducing a novel joint conditional expectation for the gradient process. Under a convexity assumption, our result indicates that the number of SNN training steps should be proportional to the square of the number of layers. In the implementation of the sample-based SNN algorithm on the benchmark MNIST dataset, we adopt a convolutional neural network (CNN) architecture and demonstrate that our sample-based SNN algorithm is more robust than the conventional CNN.
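As a minimal illustration of the formulation summarized in the abstract (the notation below is our own and not necessarily the authors'): if $X_n$ denotes the state of layer $n$, an SNN built as an SDE discretization could take the Euler–Maruyama form

\[
X_{n+1} = X_n + f(X_n, u_n)\,\Delta t + \sigma(X_n)\,\sqrt{\Delta t}\,\omega_n, \qquad \omega_n \sim \mathcal{N}(0, I),
\]

where $u_n$ collects the trainable parameters of layer $n$ (the control), $f$ and $\sigma$ are illustrative drift and diffusion maps, and $\omega_n$ is layer-wise Gaussian noise. Training is then posed as the stochastic optimal control problem

\[
\min_{u_0, \dots, u_{N-1}} \mathbb{E}\big[\Phi(X_N)\big]
\]

for a terminal loss $\Phi$, whose gradient is carried backward by an adjoint backward SDE. On our reading of the abstract, the sample-wise scheme approximates this adjoint equation with single-sample estimates in place of full conditional expectations, which is what makes the optimal control solver play the role of backpropagation.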


Updated: 2024-03-01