mit-tech-review
Published March 31, 2026 at 12:01 PM

AI benchmarks are broken. Here’s what we need instead.

Source: technologyreview.com
Publisher summary (verbatim):

For decades, artificial intelligence has been evaluated through the question of whether machines outperform humans. From chess to advanced math, from coding to essay writing, the performance of AI models and applications is tested against that of individual humans completing tasks. This framing is s

Originally published on mit-tech-review