Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

Source

arxiv.orgfull article ↗

Read on arxiv

Publisher summary· verbatim

arXiv:2606.05863v1 Announce Type: cross Abstract: Grokking suggests that fitting the training data and learning a simple underlying rule may occur on different time scales. We formalize this phenomenon by separating the fast decay of the classification loss from the slower simplification of the lear

Stay posted· Newsletter

A 5-min weekly brief — top movers, price watch, story of the week.

Discussion

No replies yet. Be first.

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

Related coverage

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

Related coverage