arxiv
PublishedJune 19, 2026 at 4:00 AM
Too long; didn't solve
Publisher summary· verbatim
arXiv:2604.07593v2 Announce Type: replace Abstract: Mathematical benchmarks consisting of a range of mathematics problems are widely used to evaluate the reasoning abilities of large language models, yet little is known about how their structural properties influence model behaviour. In this work, w
Stay posted· Newsletter
A 5-min weekly brief — top movers, price watch, story of the week.
Discussion
No replies yet. Be first.
Related coverage
More from ARXIV
arxivHuman-AI Agent Interaction in a Business Context2harxivMeasuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS20232harxivREVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk2harxivEmergent Alignment2hThe Bubble Brief
WEEKLYRead AI insights every Tuesday — top movers, new releases, story of the week.
Originally published on arxiv ↗