Co-Evolution of Policy and Internal Reward for Language Agents - Databubble