SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking
Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit a pervasive "overthinking" problem, generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure.
SAT formulates reasoning as a Finite-State Machine (FSM) with distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT achieves up to 40% reduction in reasoning tokens while generally maintaining or improving accuracy.
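The abstract's state-transition idea can be sketched as a minimal FSM: a scorer stands in for the Process Reward Model, and its score selects the thinking mode for the next step. Note this is an illustrative sketch only; the function names (`prm_score`, `next_mode`) and the threshold values are assumptions, not details from the paper.

```python
from enum import Enum


class Mode(Enum):
    """The four thinking modes named in the abstract."""
    SLOW = 3    # full, detailed reasoning for hard steps
    NORMAL = 2  # default reasoning depth
    FAST = 1    # compressed reasoning for easy steps
    SKIP = 0    # omit the step entirely


def prm_score(step: str) -> float:
    """Stand-in for the lightweight PRM: returns an 'easiness'
    score in [0, 1]. A toy heuristic, not the paper's model."""
    return min(1.0, len(step.split()) / 20)


def next_mode(score: float) -> Mode:
    """Difficulty-aware FSM transition: the easier the step looks,
    the more aggressively it is compressed. Thresholds are
    illustrative assumptions."""
    if score >= 0.9:
        return Mode.SKIP
    if score >= 0.7:
        return Mode.FAST
    if score >= 0.4:
        return Mode.NORMAL
    return Mode.SLOW
```

In use, the generator would call `next_mode(prm_score(step))` before emitting each reasoning step, so easy steps are skipped or compressed while hard steps keep full depth.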
Comments: Accepted to the ACL 2026 main conference
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2604.07922 [cs.AI] (or arXiv:2604.07922v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.07922 (arXiv-issued DOI via DataCite, pending registration)
Submission history: From Weiyang Huang, [v1] Thu, 9 Apr 2026 07:44:25 UTC (3,380 KB)