Adaptive Exploration for Latent-State Bandits

Source: arxiv.org

arXiv:2602.05139v3 (Announce Type: replace)

Abstract: We study bandits whose rewards depend on an unobserved Markov state that evolves independently of the learner's actions. The optimal arm can change even though the learner observes only past actions and rewards. We propose algorithms that feed LinU
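A small simulator helps make the setting in the abstract concrete: a hidden Markov state drives the arm means, the state moves on its own regardless of which arm is pulled, and the learner sees only (arm, reward) pairs. This is a minimal sketch under our own assumptions, not the paper's setup or code. The class name LatentStateBandit, the parameters transition, means, and noise_sd, the two-state chain, and the Gaussian reward noise are all hypothetical choices. It models only the environment, not the proposed algorithms (the abstract is truncated before describing them).

```python
import random


class LatentStateBandit:
    """Hypothetical bandit whose mean rewards depend on a hidden Markov state.

    The hidden state follows its own transition matrix, independent of the
    arm pulled. A learner would only ever see (arm, reward) pairs.
    """

    def __init__(self, transition, means, noise_sd=0.1, seed=0):
        # transition[s][t]: assumed probability of moving from state s to state t
        # means[s][a]: assumed mean reward of arm a while the hidden state is s
        self.transition = transition
        self.means = means
        self.noise_sd = noise_sd
        self.rng = random.Random(seed)
        self.state = 0  # hidden; a learner must not read this

    def optimal_arm(self):
        # Best arm under the *current* hidden state; it changes when the state does.
        row = self.means[self.state]
        return max(range(len(row)), key=lambda a: row[a])

    def pull(self, arm):
        # Reward: the current state's mean for this arm plus Gaussian noise.
        reward = self.rng.gauss(self.means[self.state][arm], self.noise_sd)
        # The hidden state then transitions, whichever arm was pulled.
        n_states = len(self.transition)
        self.state = self.rng.choices(
            range(n_states), weights=self.transition[self.state]
        )[0]
        return reward


# Usage: two sticky hidden states, each favouring a different arm.
env = LatentStateBandit(
    transition=[[0.95, 0.05], [0.05, 0.95]],
    means=[[1.0, 0.0], [0.0, 1.0]],
)
history = []
for t in range(20):
    arm = t % 2  # placeholder round-robin policy, not the paper's method
    history.append((arm, env.pull(arm)))
```

With a chain that always switches state, for example transition=[[0.0, 1.0], [1.0, 0.0]], optimal_arm() flips after every pull. That illustrates the abstract's point that the best arm can change while the learner never observes the state directly.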