Model Detail
xAI: Grok 3
Grokking of Diffusion Models: Case Study on Modular Addition
arXiv:2604.17673v1. Announce Type: new. Abstract: Despite their empirical success, how diffusion models generalize remains poorly understood from a mechanistic perspective. We demonstrate that diffusion models trained with flow-matching objectives exhibit grokking: delayed generalization after overfit…
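In the grokking literature generally, a modular-addition task is a full lookup table: every pair (a, b) is labelled with (a + b) mod p, and only a fraction of the table is used for training while the rest is held out to measure generalization. A minimal sketch of that data setup, assuming a random train/test split. The function name, the default p = 97 and train_frac = 0.3 are illustrative choices, not settings taken from this paper.

```python
import random


def modular_addition_dataset(p=97, train_frac=0.3, seed=0):
    """Hypothetical setup: all pairs (a, b) labelled (a + b) mod p,
    shuffled and split into a training set and a held-out test set."""
    pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    return pairs[:n_train], pairs[n_train:]


train, test = modular_addition_dataset(p=7, train_frac=0.5)
print(len(train), len(test))  # 24 25
```

Grokking is then reported when accuracy on `test` jumps long after accuracy on `train` has saturated; the small p = 7 call is only a sanity check of the split sizes.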
Dimensional Criticality at Grokking Across MLPs and Transformers
arXiv:2604.16431v1. Announce Type: new. Abstract: Abrupt transitions between distinct dynamical regimes are a hallmark of complex systems. Grokking in deep neural networks provides a striking example: an abrupt transition from memorization to generalization long after training accuracy saturates…
Spectral Entropy Collapse as an Empirical Signature of Delayed Generalisation in Grokking
arXiv:2604.13123v1. Announce Type: cross. Abstract: Grokking (delayed generalisation long after memorisation) lacks a predictive mechanistic explanation. We identify the normalised spectral entropy $\tilde{H}(t)$ of the representation covariance as a scalar order parameter for this transition, val…
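The abstract is cut off before it defines $\tilde{H}(t)$. A common way to define a normalised spectral entropy is to treat the eigenvalues of the representation covariance as a probability distribution and divide its Shannon entropy by log d, so the result lies in [0, 1]. The sketch below assumes that definition; the function name and the eps threshold are illustrative, and the paper's exact normalisation may differ.

```python
import numpy as np


def normalized_spectral_entropy(reps: np.ndarray, eps: float = 1e-12) -> float:
    """Assumed definition: Shannon entropy of the normalised eigenvalue
    spectrum of the representation covariance, divided by log(d).

    reps: array of shape (n_samples, d) holding hidden representations.
    Returns ~1 for an isotropic spectrum, ~0 when variance collapses
    onto a single direction.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(reps.shape[0] - 1, 1)
    # Covariance is symmetric PSD; clip tiny negative round-off to zero.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = eigvals / (eigvals.sum() + eps)
    p = p[p > eps]  # drop zero modes, i.e. treat 0 * log 0 as 0
    entropy = -np.sum(p * np.log(p))
    d = reps.shape[1]
    return float(entropy / np.log(d)) if d > 1 else 0.0


rng = np.random.default_rng(0)
isotropic = rng.standard_normal((2000, 8))                     # spread over all 8 directions
collapsed = np.outer(rng.standard_normal(2000), np.ones(8))    # rank-1 representations
print(normalized_spectral_entropy(isotropic))  # close to 1
print(normalized_spectral_entropy(collapsed))  # close to 0
```

Tracked over training checkpoints, a falling value would indicate variance concentrating in fewer directions; whether and when that precedes the generalisation jump is the paper's empirical claim, not something this sketch establishes.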

Grok’s sexual deepfakes almost got it banned from Apple’s App Store. Almost.
Apple quietly threatened to kick Elon Musk's AI app, Grok, from its App Store in January over its failure to curb the surge of nonconsensual sexual deepfakes flooding X, according to NBC News. It was a muted show of force from one of tech's most powerful gatekeepers, made behind closed doors even as…
How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison
arXiv:2510.26899v5. Announce Type: replace-cross. Abstract: The launch of Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia, aiming to produce "truthful" entries using the Grok large language m…
Grokking as Dimensional Phase Transition in Neural Networks
arXiv:2604.04655v1. Announce Type: new. Abstract: Neural network grokking (the abrupt memorization-to-generalization transition) challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a…