Paul Janson
large language models
Learned Subspace Compression for Communication-Efficient Pipeline Parallelism
MAPL treats inter-stage activation compression in pipeline parallelism as a learnable orthogonal projection constrained to the Stiefel manifold, letting each stage learn its own task-optimal subspace. Combined with factorized anchor embeddings and residual vector quantization, it achieves high compression rates with negligible performance loss across LLaMA models ranging from 150M to 1B parameters.
Paul Janson, Edouard Oyallon, Eugene Belilovsky
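A minimal sketch of the projection idea, assuming a stage boundary that passes activations `h` of width `d` and a rank-`k` basis `u` with orthonormal columns (a point on the Stiefel manifold). The sending stage transmits `h @ u`, the receiving stage reconstructs with `u.T`, and `u` is updated by a Riemannian gradient step followed by a QR retraction. All function names are hypothetical. The sketch substitutes a plain reconstruction loss for MAPL's task-driven objective and omits the anchor embeddings and residual vector quantization.

```python
import numpy as np

def qr_retraction(x):
    """Map a d x k matrix back onto the Stiefel manifold (orthonormal columns) via QR."""
    q, r = np.linalg.qr(x)
    # Fix column signs so diag(R) > 0, which makes the retraction unique.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs

def riemannian_grad(u, egrad):
    """Project a Euclidean gradient onto the tangent space of the Stiefel manifold at u."""
    sym = 0.5 * (u.T @ egrad + egrad.T @ u)
    return egrad - u @ sym

def compress(h, u):
    # (batch, d) -> (batch, k): the payload sent across the stage boundary.
    return h @ u

def decompress(z, u):
    # (batch, k) -> (batch, d): reconstruction on the receiving stage.
    return z @ u.T

def fit_subspace(h, k, steps=200, lr=0.1, seed=0):
    """Learn u by minimizing 0.5 * ||h u u^T - h||_F^2 / batch (a stand-in for a task loss)."""
    rng = np.random.default_rng(seed)
    u = qr_retraction(rng.standard_normal((h.shape[1], k)))
    for _ in range(steps):
        err = decompress(compress(h, u), u) - h  # reconstruction residual
        # Euclidean gradient of the loss above with respect to u.
        egrad = (h.T @ err @ u + err.T @ h @ u) / h.shape[0]
        # Step along the manifold, then retract to keep u orthonormal.
        u = qr_retraction(u - lr * riemannian_grad(u, egrad))
    return u
```

Under this reconstruction loss the optimum is the span of the top-k principal directions of the activations, so the toy version amounts to online PCA; a task-driven objective need not select that subspace.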
Stabilizing Native Low-Rank LLM Pretraining
We identify the uncontrolled growth of weight-update spectral norms as the key instability in training natively low-rank LLMs, and introduce Spectron, a lightweight spectral renormalization technique that enables stable end-to-end factorized pretraining with compute-optimal scaling laws and improved inference efficiency.
Paul Janson, Edouard Oyallon, Eugene Belilovsky
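The abstract does not spell out how Spectron renormalizes updates, so the sketch below is not Spectron. It only illustrates the general idea of controlling update spectral norms in a factorized layer `W = A @ B`, assuming plain gradient steps on the factors and a hypothetical per-factor cap `max_norm`. Each factor's update is rescaled when its power-iteration estimate of the spectral norm exceeds the cap. Because `(A + dA)(B + dB) - AB = dA B + A dB + dA dB`, bounding `||dA||_2` and `||dB||_2` also bounds the spectral norm of the effective update to `W` by `||dA|| ||B|| + ||A|| ||dB|| + ||dA|| ||dB||`.

```python
import numpy as np

def spectral_norm(m, iters=50, seed=0):
    """Estimate the largest singular value of m with power iteration (never overestimates)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(m.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = m @ v
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:
            return 0.0  # m is zero (or v fell in its null space)
        u /= norm_u
        v = m.T @ u
        v /= np.linalg.norm(v)
    return float(np.linalg.norm(m @ v))

def renormalized_step(a, b, grad_a, grad_b, lr, max_norm):
    """One gradient step on W = a @ b, capping each factor update's spectral norm at max_norm."""
    new_factors = []
    for factor, grad in ((a, grad_a), (b, grad_b)):
        update = -lr * grad
        sigma = spectral_norm(update)
        if sigma > max_norm:
            update *= max_norm / sigma  # rescale so the estimated norm equals the cap
        new_factors.append(factor + update)
    return new_factors[0], new_factors[1]
```

Power iteration keeps the check cheap because it needs only matrix-vector products; since the factors have at most `r` columns or rows, an exact `np.linalg.norm(update, 2)` is also affordable when `r` is small.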