Abstract: Empirical risk minimization (ERM) is the dominant paradigm in statistical learning. Optimizing the empirical risk of neural networks is a highly non-convex optimization problem but, despite this, it is routinely solved to optimality or near optimality using first-order methods such as stochastic gradient descent. It has recently been argued that overparametrization plays a key role in explaining this puzzle: overparametrized models are simple to optimize, achieving vanishing or nearly vanishing training error. Surprisingly, the overparametrized models learnt by gradient-based methods appear to have good generalization properties. I will present recent results on these phenomena both in linear models that are directly motivated by the analysis of two-layer neural networks, and in some simple nonlinear models.
(Based on joint work with Yiqiao Zhong and Kangjie Zhou)
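As a concrete illustration of the optimization phenomenon the abstract describes, the following is a minimal sketch, assuming PyTorch, of ERM with plain SGD on an overparametrized two-layer network; the data, architecture, and hyperparameters are illustrative choices, not taken from the talk. With many more hidden units than training samples, SGD typically drives the training error to near zero even on arbitrary labels.

```python
# Hedged sketch: overparametrized two-layer net fit by SGD (illustrative only).
import torch

torch.manual_seed(0)

n, d, width = 50, 10, 2000              # n samples, input dim d, width >> n
X = torch.randn(n, d)
y = torch.randn(n, 1)                   # arbitrary labels; memorization is the point

model = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)   # empirical risk
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.2e}")        # typically close to 0
```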