arxiv
Published April 24, 2026 at 4:00 AM
GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR
Publisher summary (verbatim)
arXiv:2601.09361v3 Announce Type: replace-cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a key paradigm for improving large-scale reasoning models. Unlike supervised fine-tuning (SFT), RLVR exhibits distinct optimization dynamics and is sensitive to the preservation of pre-
Originally published on arXiv