arxiv
Published June 19, 2026 at 4:00 AM
NEST: Narrative Event Structures in Time for Long Video Understanding
Publisher summary · verbatim
arXiv:2606.19706v1 Announce Type: cross Abstract: Recent progress in vision-language models has enabled the processing of increasingly long video sequences, but the ability to handle extended token streams does not translate to understanding of narrative structure in long videos. Existing long video
Originally published on arXiv