From Segments to Scenes: Temporal Understanding for Agentic Autonomous Driving via Vision-Language Models

Source: arxiv.org

Publisher summary (verbatim)

arXiv:2512.05277v4. Abstract: Vision-Language Models (VLMs) are increasingly deployed as the perception and reasoning backbone of autonomous agents acting in the wild, with autonomous driving (AD) being one of the most safety-critical instances. Reliable temporal understa


