arxiv
Published June 2, 2026 at 4:00 AM
Adaptive Exploration for Latent-State Bandits
Publisher summary (verbatim)
arXiv:2602.05139v3 (Announce Type: replace)
Abstract: We study bandits whose rewards depend on an unobserved Markov state that evolves independently of the learner's actions. The optimal arm can change even though the learner observes only past actions and rewards. We propose algorithms that feed LinU…
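To make the setting concrete, here is a minimal sketch of the environment the abstract describes, not of the authors' algorithms (the summary is cut off before it names them). It assumes a finite hidden Markov chain with a known transition matrix and Bernoulli rewards whose means depend on the current hidden state; `LatentStateBandit`, `transition`, `arm_means`, and `optimal_arm` are hypothetical names chosen for illustration.

```python
import random


class LatentStateBandit:
    """Hypothetical latent-state bandit: a hidden Markov chain drives arm rewards."""

    def __init__(self, transition, arm_means, seed=0):
        # transition[s][s2]: assumed probability of moving from hidden state s to s2
        self.transition = transition
        # arm_means[s][a]: assumed Bernoulli reward mean of arm a in hidden state s
        self.arm_means = arm_means
        self.rng = random.Random(seed)
        self.state = 0  # hidden: never returned to the learner

    def optimal_arm(self):
        # The best arm depends on the current hidden state, so it can change over time.
        means = self.arm_means[self.state]
        return max(range(len(means)), key=means.__getitem__)

    def pull(self, arm):
        # The reward depends on the hidden state and the chosen arm.
        reward = 1 if self.rng.random() < self.arm_means[self.state][arm] else 0
        # The state then moves according to the chain alone, whatever arm was pulled.
        probs = self.transition[self.state]
        self.state = self.rng.choices(range(len(probs)), weights=probs)[0]
        return reward


# Example: arm 0 is best in state 0, arm 1 is best in state 1.
env = LatentStateBandit(
    transition=[[0.95, 0.05], [0.10, 0.90]],
    arm_means=[[0.8, 0.2], [0.3, 0.7]],
)
history = []  # the learner only ever sees (action, reward) pairs
for t in range(5):
    action = t % 2
    history.append((action, env.pull(action)))
```

Because `pull` draws from the random stream in the same order regardless of the arm, two copies with the same seed follow identical hidden-state paths under different action sequences. This mirrors the abstract's assumption that the state evolves independently of the learner's actions.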
Originally published on arXiv.