arXiv, April 13, 2026 at 4:00 AM
Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling
arXiv:2604.09159v1 (Announce Type: new)
Abstract: Maximum entropy reinforcement learning (MaxEnt RL) has become a standard framework for sequential decision making, yet its standard Gaussian policy parameterization is inherently unimodal, limiting its ability to model complex multimodal action distrib