arxiv · April 6, 2026 at 4:00 AM · 1 min read
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
arXiv:2602.18523v3 Announce Type: replace
Abstract: Grokking -- the abrupt transition from memorization to generalization long after near-zero training loss -- has been studied mainly in single-task settings. We extend geometric analysis to multi-task modular arithmetic, training shared-trunk Transf