OjaKV: Context-Aware Online Low-Rank KV Cache Compression

Source: arxiv.org

Publisher summary (verbatim)

arXiv:2509.21623v2

Abstract: The expanding long-context capabilities of large language models are constrained by a significant memory bottleneck: the key-value (KV) cache required for autoregressive generation. This bottleneck is substantial; for instance, a Llama-3.1-8B
