Published April 21, 2026 at 4:00 AM
Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training
Publisher summary · verbatim
arXiv:2509.11983v2 Announce Type: replace

Abstract: Neural network (NN) training is inherently a large-scale matrix optimization problem, yet the matrix structure of NN parameters has long been overlooked. Recently, the optimizer Muon \citep{jordanmuon}, which explicitly exploits this structure, has […]
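The abstract is cut off mid-sentence, but Muon's core step, which the paper builds on, is public: instead of applying the raw gradient (or momentum) matrix, Muon replaces it with an approximate orthogonalization computed by a Newton–Schulz iteration. A minimal NumPy sketch, with the quintic coefficients and `steps=5` default taken from the Muon reference implementation (the function name here is illustrative, not from the paper):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize a matrix G, i.e. push its
    singular values toward 1 while keeping its singular vectors."""
    # Scale so all singular values are <= 1 (needed for convergence).
    X = G / (np.linalg.norm(G) + 1e-7)
    # Tuned quintic coefficients from the Muon reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    for _ in range(steps):
        A = X @ X.T
        # Quintic update: X <- a*X + b*(XX^T)X + c*(XX^T)^2 X,
        # applied to the singular values elementwise.
        X = a * X + (b * A + c * A @ A) @ X
    return X
```

After a few iterations the singular values land near 1 (the tuned coefficients trade exactness for speed, so they oscillate in a band around 1 rather than converging exactly), which is what "orthogonalization" means in Muon's update rule.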
Originally published on arXiv ↗