Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

Publisher summary · verbatim

arXiv:2603.03480v2. Abstract: We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach
