arXiv · Published April 24, 2026 at 4:00 AM
BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation
Publisher summary (verbatim):
arXiv:2604.16514v3 Announce Type: replace-cross Abstract: Autoregressive vision-language models (VLMs) deliver strong multimodal capability, but their token-by-token decoding imposes a fundamental inference bottleneck. Diffusion VLMs offer a more parallel decoding paradigm, yet directly converting a