Model Detail
Qwen: Qwen-Turbo
Qwen: Qwen-Turbo is a large language model released by Qwen. It is a text-to-text model: it accepts text input and produces text output.
Qwen: Qwen-Turbo is priced at $0.05/M input tokens and $0.20/M output tokens. Operationally, the model offers a 129K-token context window, which matters when sizing it for prompt-heavy or latency-sensitive workloads. At this input rate the model sits in the commodity tier and is suitable for high-volume workloads where per-call cost dominates the decision.
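To make the pricing concrete, the following is a minimal sketch of a cost estimate at the listed rates. The function names, the reading of "129K" as 129,000 tokens, and the assumption that input and output tokens share one context budget are illustrative choices, not details confirmed by this page.

```python
# Rates as listed on this page, in USD per million tokens.
INPUT_USD_PER_M = 0.05
OUTPUT_USD_PER_M = 0.20

# Assumption: "129K" is read as 129,000 tokens.
CONTEXT_WINDOW_TOKENS = 129_000


def estimate_cost_usd(input_tokens: int, output_tokens: int, calls: int = 1) -> float:
    """Estimate spend for `calls` requests of the given per-call token sizes."""
    per_call = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return per_call * calls


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Rough check, assuming prompt and completion share one context budget."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical workload: one million calls, 2,000 tokens in and 500 out.
    total = estimate_cost_usd(2_000, 500, calls=1_000_000)
    print(f"Estimated spend: ${total:,.2f}")
    print("Fits context:", fits_context(2_000, 500))
```

For this hypothetical workload the estimate comes to about $200, split evenly between input and output. Output tokens cost four times as much per token, so the completion length can weigh on the budget as much as the prompt size.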
The published knowledge cutoff is 2025-03-31, so newer events will not be reflected in zero-shot answers without retrieval.
Qwen: Qwen-Turbo is best suited to general-purpose chat and instruction-following workloads, and to high-volume batch jobs where per-call cost dominates the budget. Treat this as a starting point rather than a benchmark verdict: the right deployment depends on an evaluation suite that mirrors your workload.
Qwen-Image-Flash: Beyond Objective Design
arXiv:2606.03746v2 Announce Type: replace-cross Abstract: Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary pers
LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification
arXiv:2606.00647v1 Announce Type: cross Abstract: Detecting psychological defense mechanisms in conversational text remains a challenging clinical NLP problem. For the PsyDefDetect 2026 shared task (nine-class utterance classification evaluated via macro F1), our team LinguIUTics achieves a macro F1
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
arXiv:2605.30280v2 Announce Type: replace-cross Abstract: Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In t
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
arXiv:2605.08738v2 Announce Type: replace-cross Abstract: Structured pruning and knowledge distillation (KD) are typical techniques for compressing large language models, but it remains unclear how they should be applied at pretraining scale, especially to recent mixture-of-experts (MoE) models. In