Qwen-Image-Edit-2509 news

26 articles mentioning Qwen-Image-Edit-2509

arxiv19h ago

Qwen-Image-Flash: Beyond Objective Design

arXiv:2606.03746v2 Announce Type: replace-cross Abstract: Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary pers

arxiv3d ago

LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification

arXiv:2606.00647v1 Announce Type: cross Abstract: Detecting psychological defense mechanisms in conversational text remains a challenging clinical NLP problem. For the PsyDefDetect 2026 shared task (nine-class utterance classification evaluated via macro F1), our team LinguIUTics achieves a macro F1

arxiv3d ago

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

arXiv:2605.30280v2 Announce Type: replace-cross Abstract: Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In t

arxivMay 19

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

arXiv:2605.08738v2 Announce Type: replace-cross Abstract: Structured pruning and knowledge distillation (KD) are typical techniques for compressing large language models, but it remains unclear how they should be applied at pretraining scale, especially to recent mixture-of-experts (MoE) models. In

arxivMay 15

Procedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Models

arXiv:2605.11907v2 Announce Type: replace Abstract: We measure procedural-skill SFT contribution across three Qwen3.5 dense scales (0.8B, 2B, 4B) on a 200-task / 40-skill holdout, with Claude Haiku 4.5 as a frontier reference. The corpus is 353 rows of (task + procedural-skill block, Opus chain-of-t

arxivMay 14

Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models

arXiv:2605.11887v1 Announce Type: new Abstract: Large language models have achieved remarkable capabilities across diverse tasks, yet their internal decision-making processes remain largely opaque, limiting our ability to inspect, control, and systematically improve them. This opacity motivates a gr

arxivMay 13

Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding

arXiv:2605.10296v1 Announce Type: cross Abstract: We participated in the Fifth UNLP shared task on multi-domain document understanding, where systems must answer Ukrainian multiple-choice questions from PDF collections and localize the supporting document and page. We propose a retrieval-augmented p

arxivMay 11

MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning

arXiv:2605.07269v1 Announce Type: new Abstract: Indirect prompt injection remains a persistent weakness in retrieval-augmented and tool-using LLM systems, and the problem becomes harder to characterise in multilingual settings. We present MIPIAD, a defense framework evaluated on English and Bangla t

arxivMay 11

Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding

arXiv:2605.07141v1 Announce Type: cross Abstract: Open-world referring segmentation requires grounding unconstrained language expressions to precise pixel-level regions. Existing multimodal large language models (MLLMs) exhibit strong open-world visual grounding, but their outputs remain limited to

huggingfaceMay 8

CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models

arxivMay 7

Making Knowledge Accessible: Divergent Readability-Accuracy Strategies of Mistral and QWen in Biomedical Text Simplification

arXiv:2511.05080v4 Announce Type: replace Abstract: The growing public demand for accessible biomedical information calls for scalable text simplification. While large language models (LLMs) offer solutions, they too struggle with balancing improved readability against preservation of meaning. This

arxivApr 24

AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use

arXiv:2604.21590v1 Announce Type: new Abstract: Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agen

arxivApr 22

Qwen3.5-Omni Technical Report

arXiv:2604.15804v2 Announce Type: replace Abstract: In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By

arxivApr 21

Revisiting Change VQA in Remote Sensing with Structured and Native Multimodal Qwen Models

arXiv:2604.18429v1 Announce Type: cross Abstract: Change visual question answering (Change VQA) addresses the problem of answering natural-language questions about semantic changes between bi-temporal remote sensing (RS) images. Although vision-language models (VLMs) have recently been studied for t

arxivApr 17

Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali

arXiv:2604.14171v1 Announce Type: new Abstract: Romanized Nepali, the Nepali language written in the Latin alphabet, is the dominant medium for informal digital communication in Nepal, yet it remains critically underresourced in the landscape of Large Language Models (LLMs). This study presents a sy

arxivApr 17

QU-NLP at ArchEHR-QA 2026: Two-Stage QLoRA Fine-Tuning of Qwen3-4B for Patient-Oriented Clinical Question Answering and Evidence Sentence Alignment

arXiv:2604.14175v1 Announce Type: new Abstract: We present a unified system addressing both Subtask 3 (answer generation) and Subtask 4 (evidence sentence alignment) of the ArchEHR-QA Shared Task. For Subtask 3, we apply two-stage Quantised Low-Rank Adaptation (QLoRA) to Qwen3-4B loaded in 4-bit NF4

#research #natural language processing #question answering

arxivApr 14

Tuning Qwen2.5-VL to Improve Its Web Interaction Skills

arXiv:2604.09571v1 Announce Type: cross Abstract: Recent advances in vision-language models (VLMs) have sparked growing interest in using them to automate web tasks, yet their feasibility as independent agents that reason and act purely from visual input remains underexplored. We investigate this se

arxivApr 10

Robustness Risk of Conversational Retrieval: Identifying and Mitigating Noise Sensitivity in Qwen3-Embedding Model

arXiv:2604.06176v1 Announce Type: cross Abstract: We present an empirical study of embedding-based retrieval under realistic conversational settings, where queries are short, dialogue-like, and weakly specified, and retrieval corpora contain structured conversational artifacts. Focusing on Qwen3-emb

arxivApr 9

Gemma 4, Phi-4, and Qwen3: Accuracy-Efficiency Tradeoffs in Dense and MoE Reasoning Language Models

arXiv:2604.07035v1 Announce Type: new Abstract: Mixture-of-experts (MoE) language models are often expected to offer better quality-efficiency tradeoffs than dense models because only a subset of parameters is activated per token, but the practical value of that advantage depends on end-to-end behav

arxivMar 31

KazByte: Adapting Qwen models to Kazakh via Byte-level Adapter

arXiv:2603.27859v1 Announce Type: new Abstract: Large language models fragment Kazakh text into many more tokens than equivalent English text, because their tokenizers were built for high-resource languages. This tokenizer tax inflates compute, shortens the effective context window, and weakens the

arxivMar 30

GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding

arXiv:2603.25841v1 Announce Type: cross Abstract: Current multimodal large language models (MLLMs) cannot effectively utilize eye-gaze information for video understanding, even when gaze cues are supplied via visual overlays or text descriptions. We introduce GazeQwen, a parameter efficient approach

huggingfaceSep 29

Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models

huggingfaceApr 30

The 4 Things Qwen-3’s Chat Template Teaches Us

arxivMay 16bullish

Krause Synchronization Transformers

arXiv:2602.11534v3 Announce Type: replace-cross Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor

#transformers #attention #efficiency

arxivMay 14bullish

Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

arXiv:2604.10720v2 Announce Type: replace Abstract: Artificial students -- models that simulate how learners act and respond within educational systems -- are a promising tool for evaluating tutoring strategies and feedback mechanisms at scale. However, most existing approaches rely on prompting lar

#open-source #education #programming

arxivApr 18

Correcting Suppressed Log-Probabilities in Language Models with Post-Transformer Adapters

arXiv:2604.14174v1 Announce Type: cross Abstract: Alignment-tuned language models frequently suppress factual log-probabilities on politically sensitive topics despite retaining the knowledge in their hidden representations. We show that a 786K-parameter (approximately 0.02% of the base model) post-

#language models #adapter #censorship