Draft-OPD: On-Policy Distillation for Speculative Draft Models

Source

arxiv.org

Publisher summary (verbatim)

arXiv:2605.29343v2. Abstract: Speculative decoding accelerates large language model inference by pairing a target model with a lightweight draft model whose proposed tokens are verified in parallel. A common way to build draft models, like EAGLE3 or DFlash, is supervised fine-tu
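To make the draft-and-verify loop in the summary concrete, here is a minimal sketch of the greedy variant of speculative decoding. It is not the paper's method or any library's API: `toy_target`, `toy_draft`, `speculative_step`, and the other names are hypothetical, the two "models" are toy deterministic functions, and the verification loop runs sequentially where a real system would score all proposed tokens in a single parallel forward pass of the target model.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def toy_target(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for the target model's greedy next token.
    return (sum(ctx) + 1) % 7


def toy_draft(ctx: List[Token]) -> Token:
    # Hypothetical draft model: agrees with the target except when
    # the context length is a multiple of 3.
    t = toy_target(ctx)
    return t if len(ctx) % 3 else (t + 1) % 7


def speculative_step(prefix: List[Token], draft_next: NextTokenFn,
                     target_next: NextTokenFn, k: int) -> List[Token]:
    # 1) The draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    proposed: List[Token] = []
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2) The target model verifies the proposals. Sequential here for
    #    clarity; real systems do this in one parallel pass.
    ctx = list(prefix)
    accepted: List[Token] = []
    for tok in proposed:
        expected = target_next(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3) Every proposal matched, so the target adds one bonus token.
    accepted.append(target_next(ctx))
    return accepted


def speculative_decode(prefix: List[Token], draft_next: NextTokenFn,
                       target_next: NextTokenFn, k: int, n: int) -> List[Token]:
    out = list(prefix)
    while len(out) - len(prefix) < n:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + n]


def greedy_decode(prefix: List[Token], next_fn: NextTokenFn, n: int) -> List[Token]:
    out = list(prefix)
    for _ in range(n):
        out.append(next_fn(out))
    return out
```

Under greedy verification the output is identical to decoding with the target model alone; the draft model only changes how many target passes are needed. Each step yields between 1 and k + 1 tokens, so a draft model that agrees with the target more often produces longer accepted runs.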

