HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Publisher summary

arXiv:2601.14724v3. Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. However, extending these capabilities to streaming video inputs remains challenging, as existing models…
