arXiv, Apr 17
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
arXiv:2601.14724v3 Announce Type: replace-cross Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. However, extending these capabilities to streaming video inputs remains challenging, as existing models