Model Detail
xAI: Grok 4.20 Multi-Agent
xAI: Grok 4.20 Multi-Agent is a multimodal model released by xAI. It accepts text, image, and file inputs and produces text output (text+image+file->text).
xAI: Grok 4.20 Multi-Agent reports a Chatbot Arena Elo of 1,472 across 32,658 votes. Other benchmark slots are still empty in our dataset, so this single figure is best read as a partial picture rather than a full evaluation.
xAI: Grok 4.20 Multi-Agent is priced at $2/M input tokens and $6/M output tokens. The model offers a 2,000K-token (roughly 2 million token) context window, which matters when sizing it for prompt-heavy or latency-sensitive workloads. Pricing in this range is the working middle of the API market: neither the cheapest nor the most expensive option per token. Because output tokens cost three times as much as input tokens, cost-fit is usually a function of how much output you generate.
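The listed prices can be turned into a rough per-request estimate. The sketch below is illustrative only: the function names are hypothetical, the rates are copied from the listing above, and it assumes that 2000K means exactly 2,000,000 tokens and that billing is linear in token count with no caching discounts or minimum charges.

```python
# Rates taken from the listing above (USD per million tokens).
INPUT_USD_PER_M = 2.0
OUTPUT_USD_PER_M = 6.0

# Assumption: the "2000K-token" window is exactly 2,000,000 tokens.
CONTEXT_WINDOW_TOKENS = 2_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming simple linear billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    return input_cost + output_cost


def fits_context(input_tokens: int) -> bool:
    """Check whether a prompt fits in the assumed context window."""
    return 0 <= input_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # A 100K-token prompt with a 5K-token answer.
    cost = estimate_cost_usd(100_000, 5_000)
    print(f"estimated cost: ${cost:.2f}")
    print("fits in context:", fits_context(100_000))
```

Under these assumptions a 100K-token prompt with a 5K-token answer comes to about $0.23, most of it input cost. The balance shifts toward output cost once the response is longer than a third of the prompt.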
The published knowledge cutoff is 2025-09-01, so newer events will not be reflected in zero-shot answers without retrieval.
xAI: Grok 4.20 Multi-Agent is best suited to mixed text-and-image reasoning tasks such as document understanding, and to long-context tasks such as full-codebase analysis or book-length summarization (up to 2,000K tokens). Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.
Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction
arXiv:2606.05863v1 Announce Type: new Abstract: Grokking suggests that fitting the training data and learning a simple underlying rule may occur on different time scales. We formalize this phenomenon by separating the fast decay of the classification loss from the slower simplification of the learne
Low-Rank Decay for Grokking in Scale-Invariant Transformers: A Spectral-Geometric View
arXiv:2606.04405v1 Announce Type: cross Abstract: Modern Transformer architectures frequently employ normalization mechanisms such as RMSNorm and Query-Key Normalization, making parts of the model approximately scale-invariant with respect to weight magnitudes. In this regime, standard Frobenius-nor
Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs
arXiv:2606.00050v1 Announce Type: new Abstract: We present Grokers, an architecture for building persistent, structured comprehension of typed knowledge graphs through bottom-up inductive traversal of dependency subgraphs. Unlike retrieval-augmented generation (RAG), which pays full comprehension co
The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold
arXiv:2511.01938v3 Announce Type: replace-cross Abstract: Grokking is a puzzling phenomenon in neural networks where full generalization occurs only after a substantial delay following the complete memorization of the training data. Previous research has linked this delayed generalization to represe
A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization
arXiv:2606.00230v1 Announce Type: new Abstract: Grokking, the phenomenon in which neural networks generalize long after fitting their training data, has been studied in supervised settings on many epochs. LLM pre-training instead involves next-token prediction over an unlabeled corpus, with limited
To Grok Grokking: Provable Grokking in Ridge Regression
arXiv:2601.19791v3 Announce Type: replace Abstract: We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay.