arxiv
PublishedApril 27, 2026 at 4:00 AM
Concave Statistical Utility Maximization Bandits via Influence-Function Gradients
Publisher summary· verbatim
arXiv:2604.22140v1 Announce Type: cross Abstract: We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem reduces to
Discussion
No replies yet. Be first.
Related coverage
More from ARXIV
arxivFrom Local to Cluster: A Unified Framework for Causal Discovery with Latent Variables11harxivConsequentialist Objectives and Catastrophe11harxivEgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms11harxivReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation11hOriginally published on arxiv ↗