Speaker: Umut Simsekli (ENS/ INRIA)
Bio: Umut Simsekli is a Permanent Research Faculty member (equivalent to Associate Professor) at INRIA Paris (SIERRA team) and the Computer Science Department of École Normale Supérieure de Paris. He received his PhD from the Computer Engineering Department of Bogaziçi University in 2015. Before his current post, he was an associate professor at Télécom Paris, Institut Polytechnique de Paris, for four years, and he spent one year as a visiting faculty member in the Department of Statistics at the University of Oxford. His research field is best described as Mathematical Machine Learning, focusing on algorithmic aspects of statistical learning theory. He is a laureate of an ERC Starting Grant 2021 with the project “DYNASTY: Dynamics-Aware Theory of Deep Learning”. He has served as an area chair at NeurIPS, ICML, COLT, and AISTATS, and was an elected member of the IEEE Audio and Acoustic Signal Processing Technical Committee from 2019 to 2022.
Title: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
Abstract: Understanding generalization in deep learning has been one of the major challenges in statistical learning theory over the last decade. While recent work has shown that both the dataset and the training algorithm must be taken into account to obtain meaningful generalization bounds, it is still not clear theoretically which properties of the data and the algorithm determine generalization performance. In this talk, we will approach this problem from a dynamical systems perspective and represent stochastic optimization algorithms as random iterated function systems (IFS). Such IFSs are well studied in the dynamical systems literature: under mild assumptions, they can be shown to be ergodic, with an invariant measure that is often supported on sets with a fractal structure. As the main contribution, we will prove that the generalization error of a stochastic optimization algorithm can be bounded in terms of the ‘complexity’ of the fractal structure underlying its invariant measure. Then, by leveraging results from dynamical systems theory, we will show that the generalization error can be explicitly linked to the choice of algorithm (e.g., stochastic gradient descent, SGD), the algorithm's hyperparameters (e.g., step size, batch size), and the geometry of the problem (e.g., the Hessian of the loss). We will further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms (e.g., SGD and preconditioned variants), and obtain analytical estimates for our bound. For modern neural networks, we will discuss an "efficient" algorithm to compute the developed bound and support our theory with various experiments on neural networks.
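To make the IFS viewpoint concrete, here is a minimal sketch (an illustration assumed for this summary, not the talk's actual construction): for SGD on least-squares linear regression, each random minibatch S induces an affine map w ↦ w − η · (1/|S|) · Xₛᵀ(Xₛw − yₛ), and one training run is a random composition of such maps. The dimensions, step size, and data below are arbitrary choices for illustration.

```python
import numpy as np

# Sketch: SGD on linear regression viewed as a random iterated
# function system (IFS). Each minibatch index set S defines a map
# f_S(w) = w - eta * grad of the minibatch least-squares loss;
# an SGD trajectory is a random composition of the maps f_S.

rng = np.random.default_rng(0)
n, d, eta, batch = 200, 5, 0.1, 16  # illustrative hyperparameters

X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

def ifs_map(w, idx):
    """The affine map f_S associated with minibatch S (one SGD step)."""
    Xb, yb = X[idx], y[idx]
    return w - eta * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(d)
for _ in range(2000):
    idx = rng.choice(n, size=batch, replace=False)  # pick a random map
    w = ifs_map(w, idx)                             # apply f_S

# After many compositions, w fluctuates near the invariant measure
# of the IFS, concentrated around the least-squares solution.
err = np.linalg.norm(w - w_true)
print(err)
```

Under the contraction conditions the abstract alludes to, the distribution of `w` converges to the invariant measure of this IFS; the talk's bound relates generalization to the fractal complexity of that measure's support.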
Joint work with Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gurbuzbalaban, and Lingjiong Zhu.