DataBubble

Model Detail

xAI: Grok 4.20 Multi-Agent

Provider: xAI

Category: multimodal

DB Score: 35.2

Day: +0.0% · Week: +0.0% · Month: +0.0%

Overview

xAI: Grok 4.20 Multi-Agent is a multimodal model released by xAI. It accepts text, image, and file inputs and produces text output (text+image+file->text).

Performance

xAI: Grok 4.20 Multi-Agent reports a Chatbot Arena ELO of 1,472 across 32,658 votes. Other benchmark slots are still empty in our dataset, so this single figure is best read as a partial picture rather than a full evaluation.

Pricing & Throughput

xAI: Grok 4.20 Multi-Agent is priced at $2 per million input tokens and $6 per million output tokens. The model offers a 2000K-token context window (roughly 2 million tokens), which matters when sizing it for prompt-heavy or latency-sensitive workloads. This pricing sits in the working middle of the API market, neither the cheapest nor the most expensive option per token. Because output tokens cost three times as much as input tokens, cost-fit usually depends on how much output you generate.
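To make the pricing arithmetic concrete, here is a minimal sketch that estimates the cost of a single request at the listed rates. The function names and the example token counts are illustrative assumptions, not part of any xAI SDK, and actual billing may differ from this simple per-token calculation.

```python
# Rates taken from the listing above: USD per one million tokens.
INPUT_USD_PER_M = 2.0
OUTPUT_USD_PER_M = 6.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def output_cost_share(input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of the estimated cost that comes from output tokens."""
    total = estimate_cost_usd(input_tokens, output_tokens)
    if total == 0:
        return 0.0
    return (output_tokens * OUTPUT_USD_PER_M / 1_000_000) / total


if __name__ == "__main__":
    # Hypothetical workload: a 100K-token prompt with a 10K-token answer.
    print(f"cost: ${estimate_cost_usd(100_000, 10_000):.2f}")
    print(f"output share: {output_cost_share(100_000, 10_000):.0%}")
```

Under these assumptions, equal input and output volumes put three quarters of the cost on the output side, which is why output length tends to dominate the budget.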

Technical

The published knowledge cutoff is 2025-09-01, so newer events will not be reflected in zero-shot answers without retrieval.

Use Cases

xAI: Grok 4.20 Multi-Agent is best suited to mixed text-and-image reasoning tasks such as document understanding, and to long-context tasks such as full-codebase analysis or book-length summarization (2000K tokens). Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.
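For long-context planning, a rough pre-check of whether a document fits the listed window can be sketched as follows. This assumes "2000K" means 2,000,000 tokens and uses a crude 4-characters-per-token heuristic instead of the model's real tokenizer, so treat the result as an estimate only. The names and the reserved output budget are hypothetical.

```python
# "2000K" from the listing, read here as 2,000,000 tokens (assumption).
CONTEXT_WINDOW_TOKENS = 2_000_000
# Assumption: rough average for English text; a real tokenizer will differ.
CHARS_PER_TOKEN = 4


def approx_tokens(text: str) -> int:
    """Approximate the token count by character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_output_tokens: int = 8_000) -> bool:
    """Check whether the prompt plus a reserved output budget fits the window."""
    return approx_tokens(text) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    sample = "word " * 200_000  # about 1,000,000 characters
    print(approx_tokens(sample), fits_in_context(sample))
```

Reserving part of the window for output is a design choice: a prompt that fills the whole window leaves no room for the answer.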

Pricing

Input ($/M tokens): 2

Output ($/M tokens): 6

Context Window: 2000K

Research Paper

arXiv: 2411.01134

Arena & Community

Arena ELO: 1,472

Arena Votes: 32,658

Model Info

Modality: text+image+file->text

Knowledge Cutoff: 2025-09-01

Citations: 1 (0 influential)

Related Models

Qwen · 22.5M downloads

gemma-4-26B-A4B-it

Google · 11.9M downloads