Grokking as Dimensional Phase Transition in Neural Networks
Abstract: Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a \textit{dimensional phase transition}: the effective dimensionality~$D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at the onset of generalization, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects \textbf{gradient field geometry}, not network architecture: synthetic i.i.d.\ Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess arising from backpropagation-induced correlations. The grokking-localized $D(t)$ crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.

Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Adaptation and Self-Organizing Systems (nlin.AO)
Cite as: arXiv:2604.04655 [cs.LG] (or arXiv:2604.04655v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2604.04655
Submission history: From: Ping Wang. [v1] Mon, 6 Apr 2026 13:05:27 UTC (176 KB)
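The abstract's baseline of $D \approx 1$ for i.i.d. Gaussian gradients matches the classic diffusion exponent of an uncorrelated random walk: mean-squared displacement grows linearly in time. A minimal sketch of that baseline (not the paper's actual estimator; the function name and setup are hypothetical) fits the MSD scaling exponent of cumulative-sum "gradient" walks:

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_exponent(increments):
    """Fit the MSD scaling exponent D of cumulative-sum walks.

    increments: (n_walks, n_steps) array of per-step samples.
    Returns the slope of log MSD(t) vs log t; D = 1 is normal diffusion,
    D < 1 sub-diffusive, D > 1 super-diffusive.
    """
    walks = np.cumsum(increments, axis=1)           # position after t steps
    t = np.arange(1, increments.shape[1] + 1)
    msd = np.mean(walks ** 2, axis=0)               # average over walks
    slope, _ = np.polyfit(np.log(t), np.log(msd), 1)
    return slope

# i.i.d. Gaussian increments: uncorrelated steps give D close to 1.
D = diffusion_exponent(rng.normal(size=(500, 1000)))

# Fully persistent increments (the same step repeated) are ballistic: D close to 2.
D_ballistic = diffusion_exponent(np.repeat(rng.normal(size=(500, 1)), 1000, axis=1))
```

Correlated increments, such as those produced by backpropagation through shared weights, bend this exponent away from 1; that departure is the "dimensional excess" the abstract attributes to real training.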