Speaker: Dmitry Yarotsky (Steklov Institute of Mathematics)
Time: 2026-03-26 16:00–17:00
Venue: Zoom ID: 810 6791 0505 (Password: 765324)
Abstract:
It is well known, both in practice and in theory, that momentum can accelerate gradient descent (GD). In ill-conditioned problems with power-law spectral data, momentum with a suitable schedule doubles the convergence exponent. However, this approach to acceleration fails for stochastic GD (SGD): for any fixed batch size, the optimization diverges.
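For illustration only (not taken from the talk or the paper): a minimal sketch comparing plain GD with classical heavy-ball momentum on a quadratic whose Hessian has power-law eigenvalues $\lambda_k = k^{-a}$. The decay exponent a, step size lr, and momentum coefficient beta are illustrative choices, and this is the standard heavy-ball method, not the Corner Gradient Descent introduced below.

import numpy as np

n, a = 200, 1.0                      # problem size and spectral decay exponent (illustrative)
lam = np.arange(1, n + 1) ** (-a)    # power-law eigenvalues of the Hessian
x0 = np.ones(n)                      # initial point; the optimum is x = 0

def gd(steps, lr):
    x = x0.copy()
    for _ in range(steps):
        x -= lr * lam * x            # gradient of 0.5 * sum(lam * x**2)
    return 0.5 * np.sum(lam * x**2)

def heavy_ball(steps, lr, beta):
    x, v = x0.copy(), np.zeros(n)
    for _ in range(steps):
        v = beta * v - lr * lam * x  # momentum buffer
        x += v
    return 0.5 * np.sum(lam * x**2)

print("GD final loss:        ", gd(2000, lr=1.0))
print("heavy-ball final loss:", heavy_ball(2000, lr=1.0, beta=0.9))

On such a spectrum the small eigenvalues dominate the tail of the loss, and the momentum run drives it down markedly faster than plain GD, which is the (non-stochastic) acceleration the abstract refers to.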
We show, however, that acceleration of SGD can be achieved by what we call Corner Gradient Descent. The key idea is to extend GD with linear memory and to identify different such extensions with different contours in the complex plane. Corner algorithms correspond to contours having a corner with external angle $\theta\pi$ for some $1 < \theta < 2$. It turns out that such algorithms accelerate the convergence exponent of non-stochastic GD by the factor $\theta$. In the stochastic case the effect is more complex and is described by a phase diagram; in one of its regions the acceleration factor can be made arbitrarily close to 2.
Publication: https://openreview.net/forum?id=nOXCfIdhD9
Bio:
Dmitry Yarotsky obtained his PhD in mathematics from Moscow State University in 2002, and has since worked at the Institute for Information Transmission Problems, the Dublin Institute for Advanced Studies, Munich University, Skoltech, and the Steklov Institute of Mathematics. His interests cover a wide range of topics in applied mathematics, from mathematical physics to data analysis and optimization. His current focus is on rigorous results concerning the expressiveness of neural networks and gradient-based optimization.
Join Zoom Meeting
https://us02web.zoom.us/j/81067910505?pwd=ReSIxXyA90zOTSA0zYuKzl2GZmacda.1
Meeting ID: 810 6791 0505
Passcode: 765324
