GLM-OCR

Provider: zai-org
Category: multimodal
Pipeline: image-text-to-text

DB Score: 4.1
Downloads: 3.4M
Likes:
Day: -1.5%
Week: +13.2%
Month: +0.0%

Overview

GLM-OCR is a multimodal model with 663M parameters released by zai-org. It is registered under the image-text-to-text pipeline tag on Hugging Face and distributed under the permissive MIT license.

Technical

GLM-OCR has 663M parameters. The weights total approximately 1.3 GB, which is consistent with 16-bit storage at 2 bytes per parameter. Treat that figure as a floor when planning local-inference VRAM, since activations and runtime buffers add to it. The MIT license is permissive: it allows commercial deployment and derivative work without per-seat fees, provided the copyright and license notice are retained in copies.
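The relationship between parameter count and weight footprint can be sketched as a simple multiplication. The snippet below is a rough estimate only: the dtype table is an illustrative assumption, not a published specification of how the GLM-OCR checkpoint is stored, and the function name is hypothetical.

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# The dtype table is an illustrative assumption, not a spec for GLM-OCR.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_footprint_gb(num_params: int, dtype: str = "bfloat16") -> float:
    """Return the size of the weights alone in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 663_000_000  # 663M parameters, as listed on this page

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_footprint_gb(NUM_PARAMS, dtype):.2f} GB")
```

The 16-bit rows come out near the 1.3 GB quoted above. Actual VRAM use during inference will be higher, because this estimate covers the weights only.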

Trending Signal

Downloads of GLM-OCR moved -1.5% over the past 24 hours and +13.2% over the trailing seven days. The trend is mildly positive, consistent with a model that is being picked up incrementally rather than going viral. These numbers are a signal, not a guarantee: week-over-week download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
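Percentages like the ones above are ordinary relative changes between two download counts. A minimal sketch follows, assuming the site compares a current count with the count from the previous window. The raw counts used here are made-up illustrative values chosen to reproduce the displayed percentages; they are not real download figures, and the function names are hypothetical.

```python
def pct_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    if previous == 0:
        raise ValueError("previous count must be non-zero")
    return (current - previous) / previous * 100.0


def format_change(previous: float, current: float) -> str:
    """Format the change with an explicit sign and one decimal place."""
    return f"{pct_change(previous, current):+.1f}%"


# Hypothetical counts, chosen only to match the percentages shown on the page.
print("Day: ", format_change(10_000, 9_850))
print("Week:", format_change(100_000, 113_200))
```

Because the result depends entirely on the two counts being compared, a short burst of automated traffic in either window shifts the percentage without any change in real adoption.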

Use Cases

GLM-OCR is best suited to mixed text-and-image tasks such as document understanding. Treat this as a starting point rather than a benchmark verdict: the right deployment choice usually depends on an evaluation suite that mirrors your workload.
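One way to build a workload-specific check is to score model transcriptions against your own reference text. The sketch below uses character error rate (edit distance divided by reference length), a common OCR metric. It is an assumption about how such an evaluation might look, not a method described by the model authors, and the sample strings are invented.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: the model output confuses the digit 0 with the letter O.
reference = "Invoice 1042"
hypothesis = "Invoice 1O42"
print(f"CER: {character_error_rate(reference, hypothesis):.3f}")
```

Averaging this score over a sample of documents drawn from your own pipeline gives a more relevant comparison than a generic leaderboard number.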

Research Paper

arXiv: 2603.10910

Model Info

License: MIT

Citations: 1,623 (173 influential)

Recent news

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

arXiv:2604.02947v1. Abstract: Computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments. Unlike chat systems, they maintain state across interactions and translate intermediate outputs into concrete actions. …

Related Models

GLM-5.2-FP8

zai-org · 2.9M downloads

GLM-4.7-Flash

zai-org · 2.6M downloads