Stabilizing Native Low-Rank LLM Pretraining

Paul Janson, Edouard Oyallon, Eugene Belilovsky

May, 2026

Abstract

Training large language models using exclusively low-rank factorized weights promises substantial savings in memory and inference cost, but remains plagued by instability. We identify uncontrolled growth in the spectral norm of weight matrix updates as the primary source of this instability when training natively factorized models end-to-end. To address it, we introduce Spectron, a spectral renormalization technique with orthogonalization that dynamically constrains weight updates based on the current spectral norms of the factors. Spectron enables stable end-to-end factorized training with minimal computational overhead. We further establish compute-optimal scaling laws for natively low-rank models, demonstrating predictable power-law behavior and improved inference efficiency compared to dense models.

Type

Conference paper

Publication

In International Conference on Machine Learning (ICML) 2026

Stabilizing Native Low-Rank LLM Pretraining

Abstract

Paul Janson

PhD Student