GLM-4.7-Flash
GLM-4.7-Flash is a large language model with 15.6B parameters released by zai-org. It is registered under the text-generation pipeline tag on Hugging Face, takes text input and produces text output, and is distributed under the permissive MIT license.
GLM-4.7-Flash is priced at $0.06 per million input tokens and $0.40 per million output tokens. It offers a 203K-token context window, which matters when sizing it for prompt-heavy workloads and for latency-sensitive ones, where long prompts add time-to-first-token. At this input rate the model sits in the commodity tier and suits high-volume workloads where per-call cost dominates the decision.
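Because output tokens cost roughly 6.7× as much as input tokens at these rates, the input/output mix of a workload drives the bill. The sketch below estimates batch cost from the listed prices; the workload shape (10,000 calls of 2,000 input and 500 output tokens) is a hypothetical example, and real bills may differ with provider-specific rounding or caching discounts.

```python
# Per-million-token prices as listed on this page (USD).
INPUT_PRICE_PER_M = 0.06
OUTPUT_PRICE_PER_M = 0.40

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at the listed per-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Hypothetical batch: 10,000 calls, each 2,000 input and 500 output tokens.
calls = 10_000
cost = estimate_cost_usd(calls * 2_000, calls * 500)
print(f"${cost:.2f}")  # $3.20
```

In this example the 5M output tokens cost $2.00 while the 20M input tokens cost only $1.20, which is why trimming response length is usually the first lever for cost control.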
At 15.6B parameters stored in 16-bit precision (BF16/FP16, 2 bytes per parameter), the weights occupy approximately 31.2 GB. That is the baseline figure for planning local-inference VRAM, before adding KV cache and activation memory. The MIT license is permissive: it allows commercial deployment and derivative works without per-seat fees, provided the copyright and license notice is retained.
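The 31.2 GB figure follows from parameters × bytes per parameter. The sketch below repeats that arithmetic for other common storage precisions; the precision table is an assumption for illustration, the results cover weights only, and the 4-bit estimate ignores the small overhead of quantization scales.

```python
PARAMS = 15.6e9  # parameter count stated on this page

# Bytes per parameter for common weight storage formats (assumed, weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_footprint_gb(params: float, precision: str) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_footprint_gb(PARAMS, precision):.1f} GB")
```

The bf16 row reproduces the 31.2 GB figure; an 8-bit or 4-bit quantized copy would need roughly 15.6 GB or 7.8 GB for weights, with KV cache added on top at long context lengths.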
Downloads of GLM-4.7-Flash have moved -17.5% over the trailing seven days and -51.0% over the trailing thirty days. That is a clear downtrend, consistent with normal cooling as newer models compete for the same workloads. Treat these numbers as a signal, not a guarantee: download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
GLM-4.7-Flash is best suited to general-purpose chat and instruction-following workloads, high-volume batch jobs where per-call cost dominates the budget, and long-context tasks such as full-codebase analysis or book-length summarization within its 203K-token window. Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your specific workload.
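For long-context tasks, a quick pre-flight check is whether the prompt plus the reserved response budget fits in one window, and if not, how many chunks are needed. The sketch below assumes, as is typical for decoder-only models, that generated tokens share the window with the prompt; `CONTEXT_WINDOW` approximates the "203K" figure on this page, and the token counts in the example are hypothetical.

```python
import math

# Approximation of the "203K" window listed on this page; read the exact
# limit from the model's config or the provider's docs before relying on it.
CONTEXT_WINDOW = 203_000

def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the reserved output budget fits in one window."""
    return prompt_tokens + max_output_tokens <= context_window

def chunks_needed(total_tokens: int, max_output_tokens: int,
                  overhead_tokens: int = 0,
                  context_window: int = CONTEXT_WINDOW) -> int:
    """Chunks required to pass total_tokens through the window, reserving room
    for fixed instructions (overhead_tokens) and the model's output."""
    per_chunk = context_window - max_output_tokens - overhead_tokens
    if per_chunk <= 0:
        raise ValueError("output and overhead reservations exceed the window")
    return math.ceil(total_tokens / per_chunk)

# Hypothetical inputs: a 180K-token codebase dump and a 250K-token book,
# each with 8K tokens reserved for the response.
print(fits_in_context(180_000, 8_000))  # True
print(chunks_needed(250_000, 8_000))    # 2
```

Token counts should come from the model's own tokenizer; character-based estimates can drift enough to push a borderline prompt over the limit.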