Speaker: Fabian Schaipp (TUM)
Title: On the Training of Machine Learning Models: Less Tuning with Adaptive Learning Rates
Abstract: In the first part of the talk, we give an introduction to the most widely used optimization algorithms for training machine learning models, such as SGD, SGD with momentum, and Adam.
However, we motivate these algorithms from a slightly different perspective, called model-based stochastic optimization. The central idea is to build an approximate model of the loss at each iteration and to minimize this model with a proximal step.
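As a rough illustration (in our notation, not necessarily the speaker's): given a sampled loss with value $f(x_k; s_k)$ and (sub)gradient $g_k$ at the current iterate $x_k$, the model-based update is

    $x_{k+1} = \operatorname{argmin}_x \; m_k(x) + \tfrac{1}{2\alpha_k}\,\|x - x_k\|^2$,

and choosing the plain linear model $m_k(x) = f(x_k; s_k) + \langle g_k, x - x_k \rangle$ turns this proximal step into the familiar SGD update $x_{k+1} = x_k - \alpha_k g_k$.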
One advantage of the model-based viewpoint is that additional problem structure can easily be incorporated; for example, we can make use of the fact that loss functions are typically non-negative by truncating the model at zero.
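To make the truncation idea concrete, here is a minimal sketch (ours, in Python, not code from the talk): minimizing the truncated linear model $\max(f(x_k; s_k) + \langle g_k, x - x_k \rangle, 0)$ together with a proximal term yields an SGD step whose learning rate is capped by a Polyak-type quantity.

    import numpy as np

    def truncated_model_step(x, loss, grad, alpha):
        """Proximal step on the truncated linear model of a non-negative loss.

        Illustrative sketch only: `loss` and `grad` are the sampled loss value
        and (sub)gradient at x, and `alpha` is the user-chosen step size.
        """
        grad_norm_sq = np.dot(grad, grad)
        if grad_norm_sq == 0.0:
            return x  # zero gradient: the model is already minimal at x
        # Truncation at zero caps the step by the Polyak-type quantity loss / ||grad||^2.
        step_size = min(alpha, loss / grad_norm_sq)
        return x - step_size * grad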
Based on these insights, the second part of the talk combines the ideas of momentum and truncation to arrive at a new method, which we call MoMo. MoMo can be seen as an adaptive learning rate for SGD with momentum; in fact, we can derive a MoMo version of any momentum method, most importantly MoMo-Adam. By construction, the adaptive learning rates of MoMo reduce the amount of learning-rate tuning, which we demonstrate through numerous deep learning experiments.
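To give a flavour of what such an adaptive learning rate looks like, the following is a simplified sketch (ours, not the exact MoMo update presented in the talk): SGD with momentum, where the step size is additionally capped by a Polyak-type term built from momentum averages of the sampled losses and gradients, assuming a known lower bound f_star on the loss (zero for non-negative losses).

    import numpy as np

    def momo_like_step(x, loss, grad, state, alpha=1.0, beta=0.9, f_star=0.0):
        """SGD-with-momentum step with a Polyak-type cap on the learning rate.

        Simplified illustration of the idea behind MoMo-style adaptive learning
        rates, not the exact method; `f_star` is an assumed lower bound on the loss.
        """
        # Exponential moving averages of the sampled gradient and loss value.
        state["d"] = beta * state.get("d", np.zeros_like(grad)) + (1 - beta) * grad
        state["f_bar"] = beta * state.get("f_bar", 0.0) + (1 - beta) * loss
        d = state["d"]
        d_norm_sq = np.dot(d, d)
        if d_norm_sq == 0.0:
            return x  # no descent direction available at this iterate
        # Adaptive learning rate: user step size alpha, capped by a Polyak-type term.
        lr = min(alpha, max(state["f_bar"] - f_star, 0.0) / d_norm_sq)
        return x - lr * d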