Paul Janson
Publications
PyLO: Towards Accessible Learned Optimizers in PyTorch
A PyTorch library that makes learned optimizers accessible to the broader ML community. It provides CUDA-accelerated implementations with substantial speedups (a 5x improvement for ViT training) and integrates seamlessly with existing PyTorch workflows, enabling practical application of learned optimization to real-world large-scale tasks.
Paul Janson
,
Benjamin Therien
,
Quentin Anthony
,
Xiaolong Huang
,
Abhinav Moudgil
,
Eugene Belilovsky
PDF
Cite
Code
Post
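The core idea can be sketched in plain Python: a learned optimizer exposes the same step interface as a hand-designed one, which is what lets it drop into an existing training loop. The classes and the toy "learned" update rule below are illustrative assumptions, not PyLO's actual API.

```python
class SGD:
    """Hand-designed baseline: a fixed update rule with a tuned learning rate."""
    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, params, grads):
        return [p - self.lr * g for p, g in zip(params, grads)]


class LearnedOptimizer:
    """Stand-in for a learned optimizer: the update is computed from
    per-parameter statistics (here just momentum), and the constants below
    play the role of meta-learned parameters. Purely illustrative."""
    def __init__(self, weights=(0.08, 0.9)):
        self.step_scale, self.beta = weights
        self.momentum = None

    def step(self, params, grads):
        if self.momentum is None:
            self.momentum = [0.0] * len(params)
        # Accumulate a feature of the gradient history...
        self.momentum = [self.beta * m + g for m, g in zip(self.momentum, grads)]
        # ...and emit an update as a function of that feature.
        return [p - self.step_scale * m for p, m in zip(params, self.momentum)]


def train(optimizer, steps=100):
    """Minimize f(p) = p^2; either optimizer can drive the same loop,
    which is the drop-in property a library like PyLO relies on."""
    params = [1.0]
    for _ in range(steps):
        grads = [2 * p for p in params]  # gradient of p^2
        params = optimizer.step(params, grads)
    return params[0]
```

Both optimizers converge on this toy problem; the point is the shared `step()` contract, not the particular rule.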
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
We demonstrate that infinite learning rate schedules consistently outperform the widely used repeated cosine decay for continual pre-training under distribution shifts, across both vision and language models, providing a more effective alternative for large-scale self-supervised learning without catastrophic forgetting.
Paul Janson
,
Vaibhav Singh
,
Paria Mehrbod
,
Adam Ibrahim
,
Irina Rish
,
Eugene Belilovsky
,
Benjamin Therien
PDF
Cite
Code
Post
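The schedule family compared here can be sketched as a warmup, a decay to a constant plateau that extends indefinitely as new data arrives, and an optional short anneal before a checkpoint. The phase lengths and learning-rate values below are illustrative assumptions, not the paper's hyperparameters.

```python
import math

def infinite_lr(step, warmup=100, decay=400, max_lr=3e-4, const_lr=1e-4,
                anneal_start=None, anneal_len=200, min_lr=1e-5):
    """Illustrative "infinite" learning rate schedule (all constants assumed)."""
    # Phase 1: linear warmup to the peak learning rate.
    if step < warmup:
        return max_lr * (step + 1) / warmup
    # Phase 4: optional short linear anneal to min_lr before a checkpoint.
    if anneal_start is not None and step >= anneal_start:
        t = min((step - anneal_start) / anneal_len, 1.0)
        return const_lr + (min_lr - const_lr) * t
    # Phase 2: cosine decay from the peak down to the plateau value.
    if step < warmup + decay:
        t = (step - warmup) / decay
        return const_lr + 0.5 * (max_lr - const_lr) * (1 + math.cos(math.pi * t))
    # Phase 3: constant plateau -- open-ended, so continual pre-training on a
    # new distribution never has to rewind or repeat a decay cycle, unlike
    # repeated cosine schedules.
    return const_lr
```

Unlike repeated cosine decay, the plateau never forces the learning rate back up to its peak when training continues, which is the property tied to avoiding catastrophic forgetting under distribution shift.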