Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring
Abstract: Geometric Foundation Models (GFMs) have recently advanced monocular SLAM by providing robust, calibration-free 3D priors. However, deploying these models on dense video streams introduces significant computational redundancy. Current GFM-based SLAM systems typically rely on post hoc keyframe selection: they must perform expensive dense geometric decoding simply to determine whether a frame contains novel geometry, so redundant frames are rejected late and the computation spent on them is wasted. To mitigate this inefficiency, we propose LeanGate, a lightweight feed-forward frame-gating network. LeanGate predicts a geometric utility score that assesses a frame's mapping value before the heavy GFM feature extraction and matching stages. As a predictive, plug-and-play module, our approach bypasses over 90% of redundant frames. Evaluations on standard SLAM benchmarks show that LeanGate reduces tracking FLOPs by more than 85% and achieves a 5x end-to-end throughput speedup, while maintaining the tracking and mapping accuracy of dense baselines.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as: arXiv:2604.08718 [cs.CV] (or arXiv:2604.08718v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.08718 (arXiv-issued DOI via DataCite, pending registration)
Submission history: From Xinmiao Xiong. [v1] Thu, 9 Apr 2026 19:12:37 UTC (2,720 KB)
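The core idea in the abstract is control flow: score each frame cheaply first, and run the expensive GFM stages only when the predicted utility is high enough. The sketch below illustrates that gating loop only. It is not LeanGate: the abstract does not describe the network, its inputs, or how the threshold is chosen. `toy_utility` is a hypothetical stand-in that uses the mean absolute difference from the last accepted frame as a crude novelty proxy, where the paper uses a learned predictor. `heavy_gfm_step`, `gated_loop`, and `threshold=0.5` are likewise illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]  # hypothetical stand-in for an image tensor


@dataclass
class GateStats:
    processed: int = 0  # frames passed to the expensive GFM stages
    skipped: int = 0    # frames rejected by the cheap gate


def toy_utility(frame: Frame, reference: Optional[Frame]) -> float:
    """Hypothetical novelty proxy, NOT the paper's learned utility score.

    Returns the mean absolute difference from the last accepted frame,
    or 1.0 when no frame has been accepted yet.
    """
    if reference is None:
        return 1.0
    return sum(abs(a - b) for a, b in zip(frame, reference)) / len(frame)


def gated_loop(
    frames: Sequence[Frame],
    utility: Callable[[Frame, Optional[Frame]], float],
    heavy_gfm_step: Callable[[Frame], None],
    threshold: float = 0.5,  # assumed value, for illustration only
) -> Tuple[List[Frame], GateStats]:
    """Score each frame before the heavy stage; skip low-utility frames early."""
    stats = GateStats()
    keyframes: List[Frame] = []
    reference: Optional[Frame] = None
    for frame in frames:
        score = utility(frame, reference)  # cheap prediction runs first
        if score < threshold:
            stats.skipped += 1  # early rejection: no GFM cost incurred
            continue
        heavy_gfm_step(frame)  # expensive feature extraction/matching
        keyframes.append(frame)
        reference = frame
        stats.processed += 1
    return keyframes, stats


frames = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.1], [1.0, 1.0, 1.0], [1.0, 1.0, 1.05], [3.0, 3.0, 3.0]]
heavy_calls: List[Frame] = []
keyframes, stats = gated_loop(frames, toy_utility, heavy_calls.append)
print(stats)  # GateStats(processed=3, skipped=2)
```

The two near-duplicate frames never reach `heavy_gfm_step`. This is the early-rejection behavior the abstract contrasts with post hoc keyframe selection, where the heavy stage would have run on every frame before any were discarded.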