Paul Janson
masked auto encoder
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
We demonstrate that infinite learning rate schedules consistently outperform the widely used repeated cosine decay schedule for continual pre-training under distribution shifts, across both vision and language models, providing a more effective alternative for large-scale self-supervised learning without catastrophic forgetting.
Paul Janson, Vaibhav Singh, Paria Mehrbod, Adam Ibrahim, Irina Rish, Eugene Belilovsky, Benjamin Therien
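To make the contrast between the two schedules concrete, here is a minimal sketch of their general shapes. It assumes a linear warmup for both. For the repeated-cosine baseline, it assumes a cosine decay that restarts for every task. For the infinite schedule, it assumes a warmup, then a cosine cooldown to a constant learning rate, then an open-ended constant plateau, then a short linear anneal to a minimum learning rate. All function names, phase lengths, and learning rate values are hypothetical illustration choices, not the paper's exact configuration or released code.

```python
import math


def warmup_lr(step: int, warmup: int, lr_max: float) -> float:
    """Linear warmup: ramps from lr_max / warmup up to lr_max over `warmup` steps."""
    return lr_max * (step + 1) / warmup


def cosine_decay(step: int, total: int, lr_max: float, lr_min: float, warmup: int) -> float:
    """Linear warmup, then cosine decay from lr_max to lr_min, reached at step `total`."""
    if step < warmup:
        return warmup_lr(step, warmup, lr_max)
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def repeated_cosine(step: int, task_len: int, lr_max: float, lr_min: float, warmup: int) -> float:
    """Baseline (hypothetical form): re-warm and re-decay the cosine schedule for every new task."""
    return cosine_decay(step % task_len, task_len, lr_max, lr_min, warmup)


def infinite_schedule(
    step: int,
    lr_max: float,
    lr_const: float,
    lr_min: float,
    warmup: int,
    cooldown: int,
    anneal_start: int,
    anneal_len: int,
) -> float:
    """Assumed shape: warmup, cosine cooldown to lr_const, constant plateau, linear anneal to lr_min.

    Assumes anneal_start >= warmup + cooldown. The plateau can be extended across
    any number of tasks; the anneal is applied only when a final checkpoint is wanted.
    """
    if step < warmup:
        return warmup_lr(step, warmup, lr_max)
    if step < warmup + cooldown:
        # Cosine cooldown from lr_max down to the constant plateau value.
        progress = (step - warmup) / cooldown
        return lr_const + 0.5 * (lr_max - lr_const) * (1.0 + math.cos(math.pi * progress))
    if step < anneal_start:
        # Constant plateau: no re-warming when a new task (distribution shift) arrives.
        return lr_const
    # Short linear anneal from lr_const to lr_min, clamped at lr_min afterwards.
    progress = min(1.0, (step - anneal_start) / max(1, anneal_len))
    return lr_const + (lr_min - lr_const) * progress
```

In this sketch, `repeated_cosine` jumps back up to a high learning rate at every task boundary (the step where `step % task_len` wraps to zero), whereas `infinite_schedule` stays on its constant plateau across task boundaries and only decays once `anneal_start` is reached.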