Description
Chair: Kyunghyun Cho
The field of AI has advanced at unprecedented speed over the past few years, driven by the rise of large-scale, self-supervised pre-trained models (a.k.a. “foundation models”) such as GPT-3, GPT-4, ChatGPT, Chinchilla, LLaMA, CLIP, DALL-E, Stable Diffusion and many others. The impressive few-shot generalization capabilities of such models on a very wide range of novel tasks appear to emerge primarily from the drastic increase in the size of the models, training data and compute resources. Thus, predicting how a model’s performance and other important characteristics (e.g., robustness and truthfulness) scale with data, model size and compute has become a rapidly growing area of research in the past couple of years. Neural scaling laws serve as “investment tools”: they suggest the optimal allocation of compute resources across various aspects of the training process (e.g., the ratio of model size to data size), and they help compare different architectures and algorithms by predicting which ones will stand the test of time as larger compute resources become available. Last but not least, accurate prediction of emerging behaviors in large-scale AI systems is essential from an AI safety perspective. In this talk, we will present a brief history of neural scaling laws, with a focus on our recent Broken Neural Scaling Laws, which generalize previously observed scaling laws to more complex behaviors, metrics, and settings.
While the recent advances in foundation models are truly exciting, they also pose a new challenge to academic and non-profit AI research organizations, which historically have had no access to the level of compute resources available in industry. This motivated us, a rapidly growing international collaboration across several universities and non-profit organizations, including U of Montreal/Mila, LAION, EleutherAI, and many others, to join forces and initiate an effort towards developing common objectives and tools for advancing the field of open-source foundation models, in order to avoid the concentration of state-of-the-art AI in a small set of large companies and to facilitate the democratization of AI. We will give an overview of our recent efforts to obtain large compute resources (e.g., the Summit supercomputer) and of the ongoing large-scale projects we are working on.