arxiv
Published June 19, 2026 at 4:00 AM
Temporal Self-Imitation Learning
Publisher summary · verbatim
arXiv:2606.19752v1 Announce Type: cross Abstract: Long-horizon robot manipulation policies trained with reward shaping can still exploit dense rewards through inefficient interaction, while rare efficient behaviors may be forgotten during training. We argue that temporal efficiency itself provides a
Originally published on arxiv ↗