arxiv
Published May 13, 2026 at 4:00 AM
Sequential Off-Policy Learning with Logarithmic Smoothing
Publisher summary (verbatim)
arXiv:2506.10664v2 (announce type: replace-cross)
Abstract: Off-policy learning enables training policies from logged interaction data. Most prior work considers the batch setting, where a policy is learned from data generated by a single behavior policy. In real systems, however, policies are updated
Originally published on arxiv ↗