DataBubble
Tag

#software engineering

4 articles tagged #software engineering

arxiv · May 11

The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking

arXiv:2605.06707v1 Announce Type: cross Abstract: This paper presents an eight-week observational comparison of 68 single-file HTML generations collected across 17 public experiments in the "HTML AI Battle" project between December 10, 2025 and February 4, 2026. Four reasoning model families, GPT, G

4 models · +1 · #software engineering #artificial intelligence #benchmark
arxiv · May 1 · bearish

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

arXiv:2604.28139v1 Announce Type: cross Abstract: LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult

#benchmark #workflow #evaluation
arxiv · Apr 24 · bullish

DryRUN: On the Role of Public Tests in LLM-Driven Code Generation

arXiv:2604.21598v1 Announce Type: cross Abstract: Multi-agent frameworks are widely used in autonomous code generation and have applications in complex algorithmic problem-solving. Recent work has addressed the challenge of generating functionally correct code by incorporating simulation-driven plan

2 models · #autonomous code generation #software engineering #large language models
arxiv · Apr 16 · bullish

AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection

arXiv:2604.11950v1 Announce Type: cross Abstract: While recent LLM-based agents can identify many candidate bugs in source code, their reports remain static hypotheses that require manual validation, limiting the practicality of automated bug detection. We frame this challenge as a test generation t

2 models · #software engineering #bug detection #test generation