Paul Janson
Publications
Learned Subspace Compression for Communication-Efficient Pipeline Parallelism
MAPL treats inter-stage activation compression in pipeline parallelism as a learnable orthogonal projection under Stiefel manifold constraints, letting each stage adapt its own task-optimal subspace. Combined with factorized anchor embeddings and residual vector quantization, it achieves high compression with negligible performance loss across LLaMA models from 150M to 1B parameters.
Paul Janson, Edouard Oyallon, Eugene Belilovsky
PDF
Cite
PyLO: Towards Accessible Learned Optimizers in PyTorch
A PyTorch library that makes learned optimizers accessible to the broader ML community. Its CUDA-accelerated implementations deliver substantial speedups (a 5x improvement for ViT training), and it integrates seamlessly with existing PyTorch workflows, enabling practical application of learned optimization to real-world large-scale tasks.
Paul Janson, Benjamin Therien, Quentin Anthony, Xiaolong Huang, Abhinav Moudgil, Eugene Belilovsky
PDF
Cite
Code
Post
Stabilizing Native Low-Rank LLM Pretraining
We identify the uncontrolled growth of weight-update spectral norms as the key instability in training natively low-rank LLMs, and introduce Spectron, a lightweight spectral renormalization technique that enables stable end-to-end factorized pretraining with compute-optimal scaling laws and improved inference efficiency.
Paul Janson, Edouard Oyallon, Eugene Belilovsky
PDF
Cite
Code
Post
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
We demonstrate that infinite learning rate schedules consistently outperform widely-used repeated cosine decay for continual pre-training under distribution shifts across both vision and language models, providing a more effective alternative for large-scale self-supervised learning without catastrophic forgetting.
Paul Janson, Vaibhav Singh, Paria Mehrbod, Adam Ibrahim, Irina Rish, Eugene Belilovsky, Benjamin Therien
PDF
Cite
Code
Post
Towards motion from video diffusion models
This study investigates the capabilities of video diffusion models in generating human motion from text prompts, revealing their strengths in common motions and limitations in rare or complex movements.
Paul Janson, Tiberiu Popa, Eugene Belilovsky
PDF
Cite
Continual zero-shot learning through semantically guided generative random walk
Learning novel concepts, remembering previous knowledge, and adapting it to future tasks occur simultaneously throughout a human’s …
Wenxuan Zhang, Paul Janson, Divyansh Jha, Kai Yi, Ivan Skorokhodov, Mohammed Elhoseiny
PDF
Cite
Code
Overcoming Generic Knowledge Loss with Selective Parameter Update
Adding knowledge to a model without destroying its generalization, by fine-tuning a small set of parameters.
Wenxuan Zhang, Paul Janson, Rahaf Aljundi, Mohammed Elhoseiny
PDF
Cite
Post
Domain Aware Zero shot learning
Continual zero-shot learning involves learning seen classes incrementally while improving the ability to recognize unseen or …
Kai Yi, Paul Janson, Wenxuan Zhang, Mohammed Elhoseiny
PDF
Cite
A Simple baseline that questions the use of pre-trained models in continual learning
A baseline that performs better without training in continual learning benchmarks
Paul Janson, Wenxuan Zhang, Rahaf Aljundi, Mohammed Elhoseiny
PDF
Cite
Code