Name: Math. of Deep Learning Seminar: Mikhail (Misha) Belkin
Start: 2021-11-09T11:00:00-05:00
End: 2021-11-09T12:00:00-05:00
Location: No location set

Description

Title: The Polyak-Lojasiewicz condition as a framework for over-parameterized optimization and its application to deep learning

Abstract: The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. In this talk I will discuss some general mathematical principles allowing for efficient optimization in over-parameterized nonlinear systems, a setting that includes deep neural networks. I will argue that the optimization problems corresponding to these systems are not convex, even locally, but instead satisfy the Polyak-Lojasiewicz (PL) condition on most of the parameter space, allowing for efficient optimization by gradient descent or SGD. I will connect the PL condition of these systems to the condition number associated with the tangent kernel and show how a nonlinear theory for those systems parallels classical analyses of over-parameterized linear equations.
As a separate related development, I will discuss a perspective on the remarkable, recently discovered phenomenon of transition to linearity (constancy of the neural tangent kernel, or NTK) in certain classes of large neural networks. I will show how this transition to linearity results from the scaling of the Hessian with the size of the network controlled by certain functional norms. Combining these ideas, I will show how the transition to linearity can be used to demonstrate the PL condition and convergence for a general class of wide neural networks. Finally, I will comment on systems that are "almost" over-parameterized, which appear to be common in practice.

Based on joint work with Chaoyue Liu and Libin Zhu.

If you would like to attend, please email crampersad@flatironinstitute.org for the Zoom link.