arxiv · Jun 2

Adaptive Exploration for Latent-State Bandits

arXiv:2602.05139v3 Announce Type: replace Abstract: We study bandits whose rewards depend on an unobserved Markov state that evolves independently of the learner's actions. The optimal arm can change even though the learner observes only past actions and rewards. We propose algorithms that feed LinU

LI1 model #bandits #machine-learning #markov-state

arxiv · May 21

Batched Single-Index Global Multi-Armed Bandits with Covariates

arXiv:2503.00565v3 Announce Type: replace-cross Abstract: The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as perso

#machine-learning #bandits #recommendation-systems

arxiv · Apr 3

Learning with Incomplete Context: Linear Contextual Bandits with Pretrained Imputation

arXiv:2510.09908v3 Announce Type: replace-cross Abstract: The rise of large-scale pretrained models has made it feasible to generate predictive or synthetic features at low cost, raising the question of how to incorporate such surrogate predictions into downstream decision-making. We study this prob

#machine-learning #bandits #pretrained-models

Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk
