arxiv
PublishedApril 17, 2026 at 4:00 AM
—neutral
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
Publisher summary· verbatim
arXiv:2601.14724v3 Announce Type: replace-cross Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. However, extending these capabilities to streaming video inputs, remains challenging, as existing models
Discussion
No replies yet. Be first.
Related coverage
More from ARXIV
arxivFrom Local to Cluster: A Unified Framework for Causal Discovery with Latent Variables10harxivConsequentialist Objectives and Catastrophe10harxivEgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms10harxivA general optimization solver based on OP-to-MaxSAT reduction10hOriginally published on arxiv ↗