UI-TARS-1.5-7B

Provider: ByteDance-Seed
Category: multimodal
Pipeline: image-text-to-text
Parameters: 7B

DB Score: 0.0
Downloads: 31K
Likes: 547
Day: +0.0%
Week: +0.0%
Month: +0.0%

Overview

UI-TARS-1.5-7B is a 7B-parameter multimodal model released by ByteDance-Seed. It is registered under the image-text-to-text pipeline tag on Hugging Face, takes text and image inputs and produces text output (text+image->text), and is distributed under the permissive apache-2.0 license.

Pricing & Throughput

UI-TARS-1.5-7B is priced at $0.10 per million input tokens and $0.20 per million output tokens, and offers a 128K-token context window. The window size matters for prompt-heavy workloads, since long prompts raise both per-call cost and latency. At this input rate the model sits in the commodity tier, which suits high-volume workloads where per-call cost dominates the decision.
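To make the per-call cost concrete, here is a minimal Python sketch that applies the listed rates to a hypothetical workload. The token counts and daily call volume are invented for illustration, and reading "128K" as 128,000 tokens shared between prompt and completion is an assumption, not something the listing states.

```python
# Rates come from the listing; everything about the workload is hypothetical.
INPUT_USD_PER_M = 0.10    # $ per million input tokens
OUTPUT_USD_PER_M = 0.20   # $ per million output tokens
CONTEXT_WINDOW = 128_000  # assumption: "128K" means 128,000 tokens in total


def call_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call, rejecting calls that exceed the window."""
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request does not fit in the assumed context window")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical workload: 2,000 prompt tokens and 500 completion tokens per call.
per_call = call_cost_usd(2_000, 500)
daily = per_call * 100_000  # hypothetical volume of 100,000 calls per day
print(f"per call: ${per_call:.6f}, per day: ${daily:.2f}")
```

Image inputs are usually billed as tokens as well, so a real estimate would need the provider's image-to-token accounting, which the listing does not give.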

Technical

UI-TARS-1.5-7B ships with 7B parameters. The published knowledge cutoff is 2025-01-31, so later events will not be reflected in zero-shot answers without retrieval. The total weight footprint is approximately 8.3 GB. Treat that figure as the floor when planning local-inference VRAM, since activations and the KV cache need memory on top of the weights. The apache-2.0 license is permissive: it allows commercial deployment and derivative work without per-seat fees, provided redistributions keep the license text and attribution notices.
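As a rough sanity check on the footprint figure, weight size can be approximated as parameter count times bytes per parameter. The sketch below assumes a nominal 7e9 parameters and common storage precisions. Neither the exact parameter count nor the precision of the published checkpoint is stated in the listing, so the output is an estimate only.

```python
# Assumed storage widths for common precisions (bytes per parameter).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

NOMINAL_PARAMS = 7e9      # assumption: "7B" read as exactly 7 billion
LISTED_FOOTPRINT_GB = 8.3  # figure quoted in the listing


def weight_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for precision, width in BYTES_PER_PARAM.items():
    print(f"{precision:>10}: {weight_gb(NOMINAL_PARAMS, width):.1f} GB")

# Bytes per parameter implied by the listed footprint under the nominal count.
implied = LISTED_FOOTPRINT_GB * 1e9 / NOMINAL_PARAMS
print(f"implied bytes per parameter: {implied:.2f}")
```

Under these assumptions the listed 8.3 GB falls between the int8 and 16-bit estimates, so it is worth confirming the actual checkpoint size and precision before provisioning hardware.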

Use Cases

UI-TARS-1.5-7B is best suited to mixed text-and-image reasoning tasks such as document understanding, and to high-volume batch jobs where per-call cost dominates the budget. Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.

Pricing

Input ($/M tokens): $0.1
Output ($/M tokens): $0.2
Context Window: 128K

Research Paper

arXiv: 2501.12326

Model Info

License: apache-2.0
Modality: text+image->text