Muown: Row-Norm Control for Muon Optimization

Source: arxiv.org

Publisher summary (verbatim)

arXiv:2605.10797v1. Abstract: Muon has emerged as a strong competitor to AdamW for language model pre-training, yet its behavior at scale is sensitive to weight decay. Recent work has observed that, for Muon without decoupled weight decay, the spectral norm of weight matrices drift