Learning to Solve, Forgetting to Retain: Correct-Set Turnover in RLVR

Source

arxiv.org


Publisher summary· verbatim

arXiv:2606.03087v1. Abstract: Reinforcement learning with verifiable rewards (RLVR) improves the ability of large language models, yet headline accuracy gains often conceal a hidden cost: previously solved problems quietly become unsolvable as training proceeds. We frame this phenom


