arxiv
Published May 28, 2026 at 4:00 AM
Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code
Publisher summary (verbatim)
arXiv:2603.24631v2 (announce type: replace-cross)
Abstract: Code agents resolve 65-70% of SWE-bench Verified issues, but Pass@1 cannot tell us why the rest fail, and, as we show, capable-model failures are systematically misdiagnosed without trajectory data. We introduce TRAJEVAL, a training-free deco
Originally published on arXiv