GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

arXiv:2601.09361v3

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a key paradigm for improving large-scale reasoning models. Unlike supervised fine-tuning (SFT), RLVR exhibits distinct optimization dynamics and is sensitive to the preservation of pre-

Originally published on arXiv.