DataBubble

Qwen2.5-VL-7B-Instruct

Provider: Qwen · Category: multimodal · Pipeline: image-text-to-text · Parameters: 7B

DB Score

1.2

Downloads

9.5M

Likes

Day

+0.0%

Week

+0.0%

Month

+42.4%

Overview

Qwen2.5-VL-7B-Instruct is a 7B-parameter multimodal model released by Qwen. It is registered under the image-text-to-text pipeline tag on Hugging Face and distributed under the permissive apache-2.0 license.

Technical

Qwen2.5-VL-7B-Instruct ships with 7B parameters. The total weight footprint is approximately 8.3 GB, which sets the floor when planning VRAM for local inference; activations and the KV cache need additional memory on top of the weights. The apache-2.0 license is permissive: it allows commercial deployment and derivative work without per-seat fees, though attribution requirements still apply.
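As a rough illustration of how parameter count and numeric precision translate into weight size, here is a minimal sketch. It assumes the nominal 7B parameter count listed on this page and the usual byte widths per dtype. The function name and dtype table are hypothetical, and real checkpoints can differ because of extra components, mixed precision, or quantization overhead.

```python
# Back-of-the-envelope weight-size estimate (illustrative assumptions only).
# Byte widths are the standard storage sizes for each numeric format.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_weight_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    nominal_params = 7e9  # assumption: the nominal "7B" figure from the listing
    for dtype in BYTES_PER_PARAM:
        size_gb = estimate_weight_gb(nominal_params, dtype)
        print(f"{dtype:>8}: ~{size_gb:.1f} GB")
```

Comparing these estimates with a listed footprint can suggest which precision a given download corresponds to. A runtime budget should add headroom beyond the weights themselves.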

Trending Signal

Downloads of Qwen2.5-VL-7B-Instruct have moved +42.4% over the trailing thirty days. That is a sizable monthly gain, while the day and week figures are flat at +0.0%. These numbers are a signal, not a guarantee: download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
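For readers who want to reproduce this kind of figure from their own download snapshots, the sketch below shows one common way to compute a signed percentage change. The download counts are made-up illustrative values, the function names are hypothetical, and the exact formula and windowing this page uses are not stated, so treat this as an assumption.

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from previous to current; positive means growth."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return (current - previous) / previous * 100.0


def format_trend(previous: float, current: float) -> str:
    """Format the change with an explicit sign and one decimal place."""
    return f"{pct_change(previous, current):+.1f}%"


if __name__ == "__main__":
    # Hypothetical snapshot counts, chosen only to illustrate the arithmetic.
    print(format_trend(1000, 1424))  # growth over the window
    print(format_trend(1000, 1000))  # flat window
    print(format_trend(1000, 900))   # decline over the window
```

A positive result means downloads grew over the window and a negative one means they fell, so the sign alone is enough to label a trend as up or down.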

Use Cases

Qwen2.5-VL-7B-Instruct is best suited to mixed text-and-image reasoning tasks such as document understanding. Treat this as a starting point rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.

Research Paper

arXiv: 2309.00071

Model Info

License: apache-2.0

Citations: 2,239 (268 influential)

Recent news

System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5

arXiv:2606.12392v1 (cross-listed). Abstract: Recently, large language models (LLMs) have achieved promising progress in the fields of classical Chinese translation and the generation of classical poetry. However, domain-specific research on precise translation and affective-semantic understandi

arxiv · neutral · 98d ago

Tuning Qwen2.5-VL to Improve Its Web Interaction Skills

arXiv:2604.09571v1 (cross-listed). Abstract: Recent advances in vision-language models (VLMs) have sparked growing interest in using them to automate web tasks, yet their feasibility as independent agents that reason and act purely from visual input remains underexplored. We investigate this se

Related Models

Qwen3-0.6B

Qwen · 25.3M downloads