CriticalKV: Optimizing KV Cache Eviction from an Output Perturbation Perspective

Source

arxiv.org

Publisher summary (verbatim)

arXiv:2502.03805v2. Abstract: Large language models have revolutionized natural language processing but face significant challenges of high storage and runtime costs, due to the transformer architecture's reliance on self-attention, particularly the large KV cache for long-sequ
