arXiv
Published April 24, 2026 at 4:00 AM
NPU Design for Diffusion Language Model Inference
arXiv:2601.20706v2 Announce Type: replace-cross Abstract: Diffusion-based LLMs (dLLMs) fundamentally depart from traditional autoregressive (AR) LLM inference: they leverage bidirectional attention, block-wise KV cache refreshing, cross-step reuse, and a non-GEMM-centric sampling phase. These characteristics …
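The contrast with AR decoding can be made concrete with a schematic decoding loop. The sketch below is illustrative only, assuming a simplified denoising rule and stand-in KV entries (none of the names or constants come from the paper): each block of masked positions is iteratively unmasked under bidirectional attention, and the KV cache is refreshed once the block is finalized, since mid-block edits invalidate previously cached keys/values.

```python
# Illustrative sketch of block-wise diffusion decoding (not the paper's
# algorithm): per-block denoising steps followed by a KV-cache refresh.

MASK = -1  # sentinel for a still-masked position


def diffusion_decode(prompt, num_blocks, block_size, steps_per_block):
    """Return generated token ids and the number of KV-cache refreshes."""
    tokens = list(prompt)
    kv_cache = list(range(len(prompt)))  # stand-in for cached K/V positions
    refreshes = 0
    for _ in range(num_blocks):
        block = [MASK] * block_size
        for _ in range(steps_per_block):
            # Bidirectional attention: a masked position may condition on the
            # whole context (prompt + current block), unlike causal AR decoding.
            for i, tok in enumerate(block):
                if tok == MASK:
                    # Placeholder "denoising" rule: unmask one position per step
                    # with a deterministic dummy token id.
                    block[i] = len(tokens) + i
                    break
        tokens.extend(block)
        # Block-wise KV refresh: recompute cache entries for the finalized
        # block, because tokens changed across denoising steps.
        kv_cache = list(range(len(tokens)))
        refreshes += 1
    return tokens, refreshes
```

Note the contrast with AR inference: an AR decoder appends exactly one token (and one KV entry) per forward pass, whereas here a whole block is revised over several steps before its cache entries become stable, which is what motivates block-wise refreshing and cross-step reuse.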