Stabilizing Native Low-Rank LLM Pretraining

We identify the uncontrolled growth of weight-update spectral norms as the key instability in training natively low-rank LLMs, and introduce Spectron, a lightweight spectral renormalization technique that enables stable, end-to-end pretraining of factorized models, with compute-optimal scaling laws and improved inference efficiency.
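The abstract does not spell out how Spectron renormalizes updates. The NumPy sketch below only illustrates the quantity it targets. It assumes a factorized layer W = A @ B trained through its factors, measures the spectral norm (largest singular value) of the induced update ΔW, and applies a hypothetical rescaling rule that halves both factor updates until that norm falls below a threshold. The function names, the threshold `max_norm`, and the halving rule are illustrative assumptions, not the authors' Spectron method.

```python
import numpy as np


def update_spectral_norm(a, b, da, db):
    """Spectral norm of the change in W = A @ B when A -> A + dA and B -> B + dB.

    (A + dA)(B + dB) - AB = dA @ B + A @ dB + dA @ dB
    """
    dw = da @ b + a @ db + da @ db
    return float(np.linalg.norm(dw, 2))  # ord=2 on a matrix: largest singular value


def renormalize_factor_updates(a, b, da, db, max_norm):
    """Hypothetical rule (not Spectron): shrink dA and dB by a common factor
    until the induced update to W has spectral norm <= max_norm."""
    assert max_norm > 0, "loop terminates only for a positive threshold"
    scale = 1.0
    while update_spectral_norm(a, b, scale * da, scale * db) > max_norm:
        scale *= 0.5  # dW -> 0 as scale -> 0, so this eventually stops
    return scale * da, scale * db, scale


# Usage: a rank-8 factorization of a 64 x 32 weight with deliberately large steps.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 32, 8
a = rng.standard_normal((d_out, r)) / np.sqrt(r)
b = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
da = 0.5 * rng.standard_normal(a.shape)
db = 0.5 * rng.standard_normal(b.shape)

before = update_spectral_norm(a, b, da, db)
da2, db2, s = renormalize_factor_updates(a, b, da, db, max_norm=0.1)
after = update_spectral_norm(a, b, da2, db2)
print(f"||dW||_2: {before:.3f} -> {after:.3f} (scale={s})")
```

Halving is a deliberately blunt choice; it only shows that the constraint can be enforced, because ΔW shrinks to zero with the step scale. Note that ΔW contains the second-order term dA @ dB, so its norm does not scale linearly with the step size, which makes controlling updates in a factorized layer less direct than in a dense one.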

Paul Janson, Edouard Oyallon, Eugene Belilovsky