DataBubble

Model Detail

gemma-3-4b-it

Provider: Google · Category: multimodal · Pipeline: image-text-to-text · Parameters: 4B

DB Score

1.0

Downloads

2.3M

Likes

Day

+0.0%

Week

+0.0%

Month

+3.8%

Overview

gemma-3-4b-it is a 4B-parameter multimodal model released by Google. It is registered under the image-text-to-text pipeline tag on Hugging Face, accepts text and image inputs and produces text output (text+image->text), and is released under the gemma license.

Pricing & Throughput

gemma-3-4b-it is priced at $0.04/M input tokens and $0.08/M output tokens. The model offers a 131K-token context window, which matters when sizing it for prompt-heavy or latency-sensitive workloads. At this input rate the model sits in the commodity tier and is suitable for high-volume workloads where per-call cost dominates the decision.
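As a rough illustration of how the listed rates translate into a budget, the sketch below estimates spend for a batch of identical calls. The helper name `estimate_cost_usd`, the reading of "131K" as 131,000 tokens, and the assumption that input and output tokens share the context window are illustrative choices, not details from the listing. Image inputs are not modeled.

```python
from decimal import Decimal

# Rates as listed on this page (USD per million tokens).
INPUT_USD_PER_M = Decimal("0.04")
OUTPUT_USD_PER_M = Decimal("0.08")
# Assumption: "131K" is read as 131,000 tokens; the exact limit may differ.
CONTEXT_WINDOW_TOKENS = 131_000


def estimate_cost_usd(input_tokens: int, output_tokens: int, calls: int = 1) -> Decimal:
    """Estimate total spend for `calls` identical requests (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0 or calls < 0:
        raise ValueError("token and call counts must be non-negative")
    # Assumption: prompt and completion tokens both count against the window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request would exceed the assumed context window")
    per_call = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / Decimal(1_000_000)
    return per_call * calls


if __name__ == "__main__":
    # One million calls, each with 2,000 input and 500 output tokens.
    print(estimate_cost_usd(2_000, 500, calls=1_000_000))
```

Under these assumptions, one million calls at 2,000 input and 500 output tokens each work out to 120 USD. `Decimal` is used so that small per-token rates do not pick up binary floating-point rounding error.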

Technical

gemma-3-4b-it ships with 4B parameters. The published knowledge cutoff is 2024-08-31, so newer events will not be reflected in zero-shot answers without retrieval. Total weight footprint is approximately 4.3 GB, which sets the floor when planning local-inference VRAM; activations and the KV cache need additional memory on top. Access is gated on Hugging Face under the gemma license, which means a manual approval step before weights can be downloaded.
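A minimal sketch of a VRAM sanity check built on the listed weight footprint follows. The helper names and the flat 20% headroom for activations and KV cache are placeholder assumptions, not measured values; real requirements depend on precision, prompt length, and image inputs.

```python
# Weight footprint as listed on this page, in gigabytes.
WEIGHTS_GB = 4.3


def required_vram_gb(weights_gb: float, headroom_fraction: float = 0.2) -> float:
    """Weights plus a flat allowance for activations and KV cache.

    The 20% default is a placeholder assumption, not a measured value;
    long prompts and image inputs can need considerably more.
    """
    if weights_gb <= 0 or headroom_fraction < 0:
        raise ValueError("weights_gb must be positive and headroom non-negative")
    return weights_gb * (1.0 + headroom_fraction)


def fits_in_vram(
    vram_gb: float,
    weights_gb: float = WEIGHTS_GB,
    headroom_fraction: float = 0.2,
) -> bool:
    """Return True if the estimated requirement fits in the given VRAM."""
    return required_vram_gb(weights_gb, headroom_fraction) <= vram_gb


if __name__ == "__main__":
    for card_gb in (4.0, 6.0, 8.0):
        print(card_gb, fits_in_vram(card_gb))
```

Raising `headroom_fraction` is the simplest way to model long-context use, where the KV cache can become a large share of total memory.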

Trending Signal

Downloads of gemma-3-4b-it have moved +3.8% over the trailing thirty days. That is a slight uptrend over the month, while the day and week figures are flat at +0.0%. These numbers are a signal, not a guarantee: download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
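A signed percentage change like the thirty-day figure can be computed and labeled as in the sketch below. The function names and the 1% "flat" band are assumptions for illustration; the page does not state how it derives or classifies its trend figures.

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from `previous` to `current`."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


def describe_trend(pct: float, flat_band: float = 1.0) -> str:
    """Label a percentage change; `flat_band` is an assumed threshold in percent."""
    if pct > flat_band:
        return "up"
    if pct < -flat_band:
        return "down"
    return "flat"


if __name__ == "__main__":
    # Hypothetical counts chosen to reproduce a +3.8% move.
    change = pct_change(1_000, 1_038)
    print(round(change, 1), describe_trend(change))
```

Keeping the sign in the label logic avoids describing a positive change as a decline.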

Use Cases

gemma-3-4b-it is best suited to mixed text-and-image reasoning tasks such as document understanding, and to high-volume batch jobs where per-call cost dominates the budget. Treat this as a starting point rather than a benchmark verdict: the right deployment depends on an evaluation suite that mirrors your workload.

Pricing

Input ($/M tokens): $0.04

Output ($/M tokens): $0.08

Context Window: 131K

Research Paper

arXiv: 2403.08295

Model Info

License: gemma

Modality: text+image->text

Knowledge Cutoff: 2024-08-31

Citations: 1,149 (137 influential)

Recent news

Trivial Prompt Reframing Bypasses Safety Guardrails in Google's MedGemma-4B

arXiv:2607.09804v1 Announce Type: cross Abstract: Open-weight medical language models are increasingly used as the base of patient-facing and clinician-support applications. Their model cards prohibit specific behaviors -- recommending exact drug dosages, issuing definitive diagnoses, prescribing tr

huggingface · 20d ago

Hugging Face and Cerebras bring Gemma 4 to real-time voice AI

arxiv · 21d ago

DistilledGemma: Balanced Efficiency-Accuracy for Person-Place Relation Extraction from Multilingual Historical Articles

arXiv:2606.29130v1 Announce Type: new Abstract: We present DistilledGemma, an efficient and accurate system for the HIPE-2026 shared task on person-place relation extraction from multilingual historical newspaper articles in English, German, and French. Our approach adopts a three-stage knowledge di

arxiv · neutral · 31d ago

How Transparent is DiffusionGemma?

arXiv:2606.20560v1 Announce Type: cross Abstract: LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous

arxiv · 36d ago

Neither Parallel Nor Sequential: How DiffusionGemma Actually Commits Tokens

arXiv:2606.14620v1 Announce Type: new Abstract: Open diffusion language models are marketed as parallel, non-autoregressive decoders, yet the order in which a shipped checkpoint actually commits its tokens is almost never measured. We instrument DiffusionGemma 26B, a masked discrete-diffusion mixtur

arxiv · 48d ago

Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines

arXiv:2605.25645v2 Announce Type: replace-cross Abstract: We present the first end-to-end demonstration of fine-tuning and serving Google's Gemma 4 31B model on TPU hardware, providing an empirical comparison of TPU and GPU platforms for large language model adaptation. Using LoRA on a Google TPU v5

Related Models

gemma-4-26B-A4B-it

Google · 13.6M downloads

gemma-4-31B-it

Google · 12.3M downloads

Qwen3-VL-2B-Instruct

Qwen · 22.5M downloads
