arxiv
Published May 28, 2026 at 4:00 AM
Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows
Publisher summary (verbatim)
arXiv:2605.27922v1 Announce Type: new Abstract: LLM agents are increasingly deployed as executable systems that use tools, modify workspaces, and produce concrete artifacts. In such workflows, performance depends not only on the base model, but also on the harness: the system layer that manages cont
Originally published on arXiv