One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

Source

arxiv.org

Publisher summary · verbatim

arXiv:2605.23652v1. Abstract: On a 300-persona life-simulation benchmark, pcsp achieves compositional zero-shot persona identification up to 17x above chance, Spearman rho approx 0.73 semantic-behavioral alignment, and 22x faster inference than an LLM-as-policy baseline. Life simul
