Model Detail
grok-2
—How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison
arXiv:2510.26899v5 Announce Type: replace-cross Abstract: The launch of Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia, aiming to produce "truthful" entries using the Grok large language m
Grokking as Dimensional Phase Transition in Neural Networks
arXiv:2604.04655v1 Announce Type: new Abstract: Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a \text
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
arXiv:2602.18523v3 Announce Type: replace Abstract: Grokking -- the abrupt transition from memorization to generalization long after near-zero training loss -- has been studied mainly in single-task settings. We extend geometric analysis to multi-task modular arithmetic, training shared-trunk Transf
Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking
arXiv:2602.16746v3 Announce Type: replace Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention
Early-Warning Signals of Grokking via Loss-Landscape Geometry
arXiv:2602.16967v3 Announce Type: replace Abstract: Grokking -- the abrupt transition from memorization to generalization after prolonged training -- has been linked to confinement on low-dimensional execution manifolds in modular arithmetic. Whether this mechanism extends beyond arithmetic remains
Grokking From Abstraction to Intelligence
arXiv:2603.29262v1 Announce Type: new Abstract: Grokking in modular arithmetic has established itself as the quintessential fruit fly experiment, serving as a critical domain for investigating the mechanistic origins of model generalization. Despite its significance, existing research remains narrow