arxiv
Published June 5, 2026 at 4:00 AM
Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction
Publisher summary (verbatim)
arXiv:2606.05863v1 Announce Type: new Abstract: Grokking suggests that fitting the training data and learning a simple underlying rule may occur on different time scales. We formalize this phenomenon by separating the fast decay of the classification loss from the slower simplification of the learne
Originally published on arXiv