Published April 27, 2026 at 4:00 AM
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale
Publisher summary (verbatim)
arXiv:2604.00813v3 · Announce Type: replace-cross

Abstract: End-to-end autonomous driving has evolved from the conventional paradigm based on sparse perception into vision-language-action (VLA) models, which focus on learning language descriptions as an auxiliary task to facilitate planning. In this p