DataBubble
arxiv
Published June 11, 2026 at 4:00 AM

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

Publisher summary (verbatim)

arXiv:2606.12344v1 Announce Type: cross Abstract: General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and prediction con




Originally published on arxiv ↗