arXiv
Published May 12, 2026 at 4:00 AM
Muown: Row-Norm Control for Muon Optimization
Publisher summary (verbatim)
arXiv:2605.10797v1 (Announce Type: new)
Abstract: Muon has emerged as a strong competitor to AdamW for language model pre-training, yet its behavior at scale is sensitive to weight decay. Recent work has observed that, for Muon without decoupled weight decay, the spectral norm of weight matrices drift
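The abstract turns on the difference between Muon with and without decoupled weight decay. The NumPy toy below is a minimal sketch of that contrast. It is not the paper's Muown method, because the summary is cut off before describing it. The quintic Newton-Schulz coefficients (3.4445, -4.7750, 2.0315) follow the publicly released Muon reference code. Everything else is a hypothetical choice made only for illustration:

- the function names `newton_schulz`, `muon_step` and `final_spectral_norm`;
- the hyperparameters (lr=0.02, momentum=0.95, weight_decay=0.5);
- the synthetic random gradients.

The sketch also omits the reference code's shape-dependent update scaling.

```python
import numpy as np


def newton_schulz(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately orthogonalize g: push its singular values toward 1
    using the quintic Newton-Schulz iteration from the Muon reference code."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)  # Frobenius scaling puts every singular value <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # work in the wide orientation so the Gram matrix is the smaller one
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        # Maps each singular value s to a*s + b*s**3 + c*s**5
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(w, grad, buf, lr=0.02, momentum=0.95, weight_decay=0.0):
    """One Muon-style step (hypothetical helper, one common momentum variant)."""
    buf = momentum * buf + grad                # heavy-ball momentum buffer
    update = grad + momentum * buf             # Nesterov-style lookahead
    w = (1.0 - lr * weight_decay) * w          # decoupled weight decay (no-op when 0)
    w = w - lr * newton_schulz(update)         # orthogonalized step
    return w, buf


def final_spectral_norm(weight_decay, steps=400, shape=(64, 32), seed=0):
    """Spectral norm of a toy weight matrix after `steps` updates driven by
    synthetic Gaussian gradients (NOT gradients of any real loss)."""
    rng = np.random.default_rng(seed)
    w = 0.02 * rng.standard_normal(shape)
    buf = np.zeros(shape)
    for _ in range(steps):
        grad = rng.standard_normal(shape)
        w, buf = muon_step(w, grad, buf, weight_decay=weight_decay)
    return float(np.linalg.norm(w, ord=2))  # ord=2 on a matrix = largest singular value


if __name__ == "__main__":
    print("no weight decay:   ", final_spectral_norm(0.0))
    print("decoupled wd = 0.5:", final_spectral_norm(0.5))
```

Both runs use the same seed, so they see the same gradient sequence. The orthogonalized steps do not depend on w, so the two runs differ only in the decay factor.

The decoupled form gives a simple bound. Write ‖·‖ for the spectral norm, η for lr and λ for weight_decay. From W_{t+1} = (1 - ηλ)W_t - ηO_t, the triangle inequality gives ‖W_t‖ ≤ (1 - ηλ)^t ‖W_0‖ + max‖O_t‖ / λ. Without the decay term, no such ceiling exists and the orthogonalized steps can keep accumulating. This toy says nothing about how the paper measures or controls the drift.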
Originally published on arXiv.