Dear CCN Colleagues,
Please welcome guest speaker Francesca Mignacco of the Institute of Theoretical Physics, Paris-Saclay University, a candidate for a Flatiron Research Fellow position here at Flatiron, to our CCN Seminar. She will be presenting "Statistical physics insights on stochastic gradient descent." Abstract below!
Abstract: Artificial neural networks (ANNs) trained with gradient-based algorithms have achieved impressive performance in a variety of applications. In particular, the stochastic gradient descent (SGD) algorithm has proved surprisingly efficient at navigating high-dimensional loss landscapes. However, this practical success remains largely unexplained by theory. A general consensus has emerged that the answer requires a detailed description of the trajectory traversed during training. This task is highly nontrivial for at least two reasons. First, the high dimension of the parameter space in which ANNs operate defies standard mathematical techniques. Second, SGD navigates a non-convex loss landscape following out-of-equilibrium dynamics with complicated state-dependent noise. In this talk, I will consider prototypical learning problems that are amenable to an exact characterisation. I will show how methods from spin glass theory can be used to derive a low-dimensional description of the network performance and of the learning dynamics of gradient-based algorithms, including multi-pass SGD. I will discuss how different sources of algorithmic noise affect network performance in a benchmark high-dimensional non-convex task (sign retrieval), and how to characterise SGD noise via an effective fluctuation-dissipation relation that holds at stationarity.
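For colleagues who would like a concrete picture of the algorithm in the title, below is a minimal illustrative sketch (in Python/NumPy) of multi-pass mini-batch SGD on a toy planted non-convex problem. The data model, dimensions, learning rate, and batch size are all assumptions chosen for illustration, not the setting of the talk; the point is simply that drawing a fresh mini-batch at each step injects the state-dependent noise the abstract refers to, while "multi-pass" means the same finite dataset is revisited every epoch.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy planted model (illustrative only): recover w_star from y = (x . w_star)^2.
    d, n = 50, 2000                        # dimension, number of samples
    w_star = rng.standard_normal(d) / np.sqrt(d)
    X = rng.standard_normal((n, d))
    y = (X @ w_star) ** 2                  # non-convex regression targets

    def grad(w, Xb, yb):
        """Gradient of 0.5 * mean(((Xb w)^2 - yb)^2) on a mini-batch."""
        z = Xb @ w
        return Xb.T @ ((z ** 2 - yb) * z) * (2.0 / len(yb))

    w = rng.standard_normal(d) / np.sqrt(d)    # random initialisation
    lr, batch = 0.01, 32
    for epoch in range(200):                   # multi-pass: reuse the data each epoch
        perm = rng.permutation(n)
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            w -= lr * grad(w, X[idx], y[idx])  # mini-batch sampling = algorithmic noise

    overlap = abs(w @ w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))
    print(f"overlap with planted signal: {overlap:.3f}")

The batch size here acts as a noise dial: smaller batches mean noisier gradient estimates, which is one of the sources of algorithmic noise the talk will examine.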
Due to limited space, please feel welcome to tune in remotely; Zoom credentials are in the calendar invite.