On the Momentum Term in Gradient Descent Learning Algorithms
Ning Qian, Neural Networks, 1999, 12:145-151.
Abstract
A momentum term is usually included in the simulations of
connectionist learning algorithms. Although it is well known that
such a term greatly improves the speed of learning, there have been
few rigorous studies of its mechanisms. In this paper, I show that
in the limit of continuous time, the momentum parameter is analogous
to the mass of Newtonian particles that move through a viscous medium
in a conservative force field. The behavior of the system near a
local minimum is equivalent to a set of coupled and damped harmonic
oscillators. The momentum term improves the speed of convergence by
bringing some eigen components of the system closer to critical
damping. Similar results can be obtained for the discrete time case
used in computer simulations. In particular, I derive the bounds for
convergence on learning-rate and momentum parameters, and
demonstrate that the momentum term can increase the range of learning rates
over which the system converges. The optimal condition for convergence
is also analyzed.
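
The claim that momentum widens the range of convergent learning rates can be checked numerically with a small sketch. The setup below is an assumption made for illustration, not taken from the paper: a one-dimensional quadratic error E(w) = 0.5*k*w^2, the common update dw <- -eps*grad + p*dw, and the hypothetical function names run_momentum_gd and converges. For this quadratic case, the standard stability condition for the update is 0 < eps*k < 2*(1 + p) with |p| < 1, so a learning rate that diverges without momentum may converge with it.

```python
def run_momentum_gd(k, eps, p, w0=1.0, steps=200):
    """Minimize E(w) = 0.5*k*w**2 by gradient descent with momentum.

    k   : curvature (eigenvalue) of the quadratic error surface
    eps : learning rate
    p   : momentum parameter (p = 0 gives plain gradient descent)
    Returns the weight after `steps` updates.
    """
    w, dw = w0, 0.0
    for _ in range(steps):
        grad = k * w                # dE/dw for the quadratic error
        dw = -eps * grad + p * dw   # momentum update of the weight change
        w += dw
    return w


def converges(k, eps, p, tol=1e-6):
    """Illustrative check: did the weight end up near the minimum at w = 0?"""
    return abs(run_momentum_gd(k, eps, p)) < tol


if __name__ == "__main__":
    k = 1.0
    # eps*k = 2.5 exceeds the plain gradient descent limit of 2,
    # but lies inside 2*(1 + p) = 3 when p = 0.5.
    print("no momentum, eps=2.5:", converges(k, eps=2.5, p=0.0))
    print("p=0.5,       eps=2.5:", converges(k, eps=2.5, p=0.5))
```

With k = 1 and eps = 2.5, the run without momentum multiplies the weight by -1.5 at every step and diverges, while the run with p = 0.5 follows a damped oscillation toward the minimum.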