Paul Janson
efficiency
Stabilizing Native Low-Rank LLM Pretraining
We identify the uncontrolled growth of weight-update spectral norms as the key instability in training natively low-rank LLMs, and introduce Spectron, a lightweight spectral renormalization technique that enables stable end-to-end factorized pretraining with compute-optimal scaling laws and improved inference efficiency.
Paul Janson, Edouard Oyallon, Eugene Belilovsky
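
The summary above refers to the spectral norm of a weight update in a factorized (low-rank) layer. As a rough illustration only, the sketch below estimates that norm by power iteration and rescales an update whose norm exceeds a cap. It is not the Spectron procedure, which this page does not describe; the function names (`matmul`, `spectral_norm`, `rescale_update`) and the `max_norm` cap are hypothetical choices made for this example.

```python
import math
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def spectral_norm(M, iters=100, seed=0):
    """Estimate the largest singular value of M by power iteration on M^T M."""
    rng = random.Random(seed)
    rows, cols = len(M), len(M[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    for _ in range(iters):
        u = [sum(m * x for m, x in zip(row, v)) for row in M]                 # u = M v
        w = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = M^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # M is (numerically) the zero matrix
        v = [x / norm for x in w]
    u = [sum(m * x for m, x in zip(row, v)) for row in M]
    return math.sqrt(sum(x * x for x in u))  # ||M v|| for unit v


def rescale_update(delta, max_norm):
    """Scale delta down so its spectral norm is at most max_norm (assumed cap)."""
    sigma = spectral_norm(delta)
    if sigma <= max_norm:
        return delta
    scale = max_norm / sigma
    return [[scale * x for x in row] for row in delta]


# A rank-1 factorized update: delta = A @ B with A (2x1) and B (1x2).
A = [[1.0], [2.0]]
B = [[3.0, 4.0]]
delta = matmul(A, B)
capped = rescale_update(delta, max_norm=1.0)
```

For a rank-1 product like this one, the spectral norm equals the product of the two factor norms, which gives a quick sanity check on the estimate.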