Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning

Source

arxiv.org

Publisher summary (verbatim)

arXiv:2604.22229v1 Announce Type: cross Abstract: One-step offline RL actors are attractive because they avoid backpropagating through long iterative samplers and keep inference cheap, but they still have to improve under a critic without drifting away from actions that the dataset can support. In r
