The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold

Source

arxiv.org

Publisher summary (verbatim)

arXiv:2511.01938v3. Abstract: Grokking is a puzzling phenomenon in neural networks where full generalization occurs only after a substantial delay following the complete memorization of the training data. Previous research has linked this delayed generalization to represe
