Model Detail
GLM-OCR
GLM-OCR is a multimodal model with 663M parameters released by zai-org. It is registered under the image-text-to-text pipeline tag on Hugging Face and distributed under the permissive MIT license.
At 663M parameters, the total weight footprint is approximately 1.3 GB, which is consistent with 16-bit weights at about 2 bytes per parameter. This is the baseline figure when planning local-inference VRAM; activations and runtime buffers add to it. The MIT license is permissive, allowing commercial deployment and derivative work without per-seat fees, provided the copyright and license notice is retained in copies of the software.
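The footprint figure can be reproduced with simple arithmetic. The sketch below is illustrative only: the function name and the precision table are assumptions introduced here, and the source does not state which precision the published weights use. At 2 bytes per parameter the result is about 1.3 GB, matching the figure above.

```python
def weight_footprint_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 663_000_000  # 663M parameters, as stated for GLM-OCR

# Bytes per parameter for common precisions. These are generic options,
# not confirmed storage formats for GLM-OCR.
PRECISIONS = {"fp32": 4, "bf16/fp16": 2, "int8": 1}

for name, nbytes in PRECISIONS.items():
    print(f"{name}: {weight_footprint_gb(NUM_PARAMS, nbytes):.2f} GB")
```

This covers weights only. Actual VRAM use during inference is higher, since activations, image features, and any key-value cache also need memory.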
GLM-OCR is best suited to mixed text-and-image reasoning tasks such as document understanding. Treat this as a starting point rather than a benchmark verdict: the right deployment choice usually depends on an evaluation suite that mirrors your workload.
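One way to build a workload-specific check is to score model output against your own ground-truth transcriptions. The sketch below computes character error rate (CER), a common OCR metric; it is an assumption that CER suits your workload, and the function names are hypothetical helpers, not part of GLM-OCR or any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        # Convention chosen here: empty reference scores 0 only on exact match.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical sample pair: ground truth vs. model transcription.
print(cer("Invoice 2024", "Invoice 2O24"))
```

Averaging this score over a sample of your own documents gives a more relevant signal than a general-purpose leaderboard number.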