CCM Colloquium: Edouard Oyallon (CCM)

3rd Floor Classroom/3-Flatiron Institute (162 5th Avenue)

Description

Title: Reducing communications in decentralized learning via randomization and asynchrony

Decentralized learning is a paradigm in which distant nodes collaboratively train a machine learning model without a central node orchestrating computations and communications. While typically applied to internet-connected collaborations of computers, the paradigm also extends to cluster computing. Scaling decentralized training to a large number of computing nodes requires careful communication management. My approach uses randomization and asynchrony to minimize communication overhead. I'll give a brief introduction to the field and describe A2CiD2, a principled algorithm that significantly reduces the communication cost of training deep neural networks on clusters. Notably, A2CiD2 achieves substantial reductions in communication cost on ImageNet with 64 asynchronous workers (A100 GPUs), at nearly no additional expense.
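As background for the communication pattern the talk builds on, here is a minimal sketch of randomized pairwise gossip averaging in a simulated decentralized setting. It is an illustration of the general idea, not the A2CiD2 algorithm itself; the number of workers, the local quadratic objectives, and all hyperparameters are assumptions chosen for the example.

```python
import numpy as np

# Minimal sketch: decentralized gradient descent with randomized pairwise
# gossip. Each worker i optimizes a local quadratic f_i(x) = 0.5*||x - t_i||^2,
# so the global optimum is the mean of the t_i. Instead of a synchronous
# all-reduce, one randomly chosen pair of workers averages parameters per
# step, standing in for asynchronous point-to-point communication.
# (Illustrative toy setup, not the method from the referenced paper.)

rng = np.random.default_rng(0)
n_workers, dim, lr, n_steps = 8, 4, 0.1, 500

targets = rng.normal(size=(n_workers, dim))   # local optima t_i
params = rng.normal(size=(n_workers, dim))    # each worker's model copy

for step in range(n_steps):
    # Local gradient step on each worker (gradient of its own quadratic).
    params -= lr * (params - targets)

    # Randomized pairwise gossip: one random pair exchanges and averages.
    i, j = rng.choice(n_workers, size=2, replace=False)
    avg = 0.5 * (params[i] + params[j])
    params[i], params[j] = avg, avg

consensus = params.mean(axis=0)
print("distance to global optimum:",
      np.linalg.norm(consensus - targets.mean(axis=0)))
print("max disagreement across workers:",
      np.linalg.norm(params - consensus, axis=1).max())
```

Each step moves only one pair of parameter vectors over the (simulated) network, rather than all of them, which is the communication saving that randomization and asynchrony aim for; the talk's contribution concerns how to do this at scale without degrading convergence.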

Nabli A., Belilovsky E. and Oyallon E., "A2CiD2: Accelerating Asynchronous Communication in Decentralized Deep Learning," NeurIPS 2023. https://hal.science/hal-04124318/
