arXiv
Published June 18, 2026 at 4:00 AM
Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
Publisher summary (verbatim)
arXiv:2606.19120v1 (announce type: new)
Abstract: On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to multimodal large language …
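To make the "dense token-level targets" idea concrete, here is a minimal sketch of a per-token distillation loss computed over a student's own rollout. It rests on stated assumptions that are not from the summary. The excerpt does not say which divergence the authors use, so reverse KL(student || teacher) is an illustrative choice. The function names (`softmax`, `token_kl`, `opsd_loss`) are hypothetical. The logits are taken as given inputs: in the setup described above, the student logits would come from the model scoring its own sampled tokens, and the teacher logits from the frozen copy scoring the same tokens with the reference target in its context. Rollout sampling, the multimodal inputs, and gradient handling are not modeled.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def token_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single position.

    Assumption: reverse KL is used here for illustration only; the paper's
    actual divergence is not stated in the summary.
    """
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def opsd_loss(
    student_logits_seq: Sequence[Sequence[float]],
    teacher_logits_seq: Sequence[Sequence[float]],
) -> float:
    """Mean per-token divergence over one student rollout (hypothetical helper).

    student_logits_seq[t]: student logits at rollout position t.
    teacher_logits_seq[t]: frozen-teacher logits at the same position, where
    the teacher additionally conditions on the reference target (assumed to
    be computed upstream).
    """
    if len(student_logits_seq) != len(teacher_logits_seq):
        raise ValueError("student and teacher must score the same rollout tokens")
    kls = [token_kl(s, t) for s, t in zip(student_logits_seq, teacher_logits_seq)]
    return sum(kls) / len(kls)


if __name__ == "__main__":
    student = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
    teacher = [[3.0, 2.0, 1.0], [0.5, 0.5, 0.5]]
    print(opsd_loss(student, teacher))
```

Because every position in the rollout contributes its own term, the student receives a training signal at each token rather than a single sequence-level reward, which is what "dense" refers to. When the teacher agrees with the student at a position, that position contributes zero loss.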
Originally published on arXiv.