LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Source: arxiv.org

Publisher summary (verbatim)

arXiv:2605.30434v1 (Announce Type: cross)

Abstract: Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long horizons untested. We introduce LongDS, a benchmark
