arXiv — March 30, 2026
GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding
arXiv:2603.25841v1 Announce Type: cross
Abstract: Current multimodal large language models (MLLMs) cannot effectively use eye-gaze information for video understanding, even when gaze cues are supplied as visual overlays or text descriptions. We introduce GazeQwen, a parameter-efficient approach
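The abstract is truncated before the method is described, so the paper's actual mechanism is unknown here. As a purely illustrative sketch, one common parameter-efficient way to condition hidden states on a side signal such as gaze is FiLM-style modulation: a small projection of the gaze features produces a per-channel scale and shift applied to the model's hidden states. All names and dimensions below are assumptions, not GazeQwen's implementation.

```python
import numpy as np

# Hypothetical FiLM-style gaze conditioning; NOT the paper's actual method.
rng = np.random.default_rng(0)

d_gaze, d_hidden = 4, 8  # arbitrary illustrative dimensions
# Small "learned" projections from gaze features to scale/shift offsets.
W_gamma = rng.normal(size=(d_gaze, d_hidden)) * 0.01
W_beta = rng.normal(size=(d_gaze, d_hidden)) * 0.01

def modulate(hidden, gaze):
    """Scale-and-shift hidden states conditioned on gaze features."""
    gamma = gaze @ W_gamma   # per-channel scale offset
    beta = gaze @ W_beta     # per-channel shift
    # Near-identity when gaze features are small, so the base model
    # is only lightly perturbed -- the usual parameter-efficient pattern.
    return (1.0 + gamma) * hidden + beta

hidden = rng.normal(size=(3, d_hidden))  # three token hidden states
gaze = rng.normal(size=(1, d_gaze))      # one gaze embedding, broadcast over tokens
out = modulate(hidden, gaze)
print(out.shape)  # (3, 8)
```

Note that with zero gaze input the modulation reduces to the identity, which is a typical design choice for lightweight adapters: the pretrained model's behavior is recovered when the side signal is absent.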