arxiv · April 11, 2026 at 4:00 AM · 2 min read

SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval

Abstract: Large language models (LLMs) are probabilistic in nature and perform more reliably when augmented with external information. Complex queries remain challenging because they often require multi-step reasoning over the retrieved information, with no clear or predetermined reasoning path. Recent approaches train models with reinforcement learning on the model's final outcome, showing promise in improving how models handle complex information. We introduce SubSearch, a specialized framework that shifts from outcome-only supervision to intermediate reward signals that incentivize planning high-quality reasoning. Previous work on process reward modeling focuses on training a separate reward model on trajectories annotated by either human annotators or large LLM judges. SubSearch instead directly optimizes the generator using intrinsic process rewards, which we define as internally derived rewards, eliminating the need for external supervision and moving towards autonomous information-intensive reasoning. Experiments on seven benchmarks show that rewarding intermediate reasoning steps with intrinsic rewards leads to more robust reasoning traces on both QA and multi-hop QA datasets than using only outcome rewards. SubSearch can help in building reasoning traces that allow agents to better integrate search engines for complex query answering, while offering a data-efficient alternative to supervised process modeling.

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2604.07415 [cs.IR] (or arXiv:2604.07415v1 [cs.IR] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.07415 (arXiv-issued DOI via DataCite, pending registration)
Submission history: From Roxana Petcu. [v1] Wed, 8 Apr 2026 13:09:47 UTC (375 KB)
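The abstract contrasts outcome-only supervision with rewards placed on intermediate reasoning steps, but it does not say how SubSearch computes or combines those rewards. As a rough illustration of the general idea only, the sketch below blends a final-answer reward with the mean of per-step scores. The function name `trajectory_reward`, the `step_scores` input, the averaging, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def trajectory_reward(
    outcome: float,
    step_scores: Sequence[float],
    weight: float = 0.5,
) -> float:
    """Blend an outcome reward with the mean of per-step rewards.

    Hypothetical scheme for illustration; not SubSearch's actual formula.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # With no intermediate steps, the process term contributes nothing.
    process = sum(step_scores) / len(step_scores) if step_scores else 0.0
    return (1.0 - weight) * outcome + weight * process


# Two trajectories that reach the same correct answer (outcome = 1.0)
# but differ in the assumed quality of their intermediate steps.
good = trajectory_reward(1.0, [1.0, 1.0, 1.0])
poor = trajectory_reward(1.0, [0.0, 0.0, 1.0])
print(good > poor)
```

Under outcome-only supervision (`weight=0.0`) the two trajectories would receive the same reward; any positive weight on the step scores separates them, which is the kind of signal the abstract attributes to intermediate rewards.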

