arXiv
Published April 24, 2026 at 4:00 AM
Process Supervision via Verbal Critique Improves Reasoning in Large Language Models
Publisher summary (verbatim)
arXiv:2604.21611v1 Announce Type: cross Abstract: Inference-time scaling for LLM reasoning has focused on three axes: chain depth, sample breadth, and learned step-scorers (PRMs). We introduce a fourth axis, granularity of external verbal supervision, via Verbal Process Supervision (VPS), a training
Originally published on arXiv