Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

Source

arxiv.orgfull article ↗

Read on arxiv

Publisher summary· verbatim

arXiv:2604.22161v1 Announce Type: new Abstract: We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O}}(\sqrt{dT})$ regret bound rely heavily on conte

Discussion

No replies yet. Be first.

Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

Related coverage

Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

Related coverage