arxiv
Published June 15, 2026 at 4:00 AM

MA-ProofBench: A Two-Tiered Evaluation of LLMs for Theorem Proving in Mathematical Analysis

Publisher summary (verbatim)

arXiv:2606.13782v1

Abstract: Large Language Models (LLMs) have made notable progress in automated theorem proving, yet existing formal benchmarks remain limited in both mathematical coverage and difficulty. Most are concentrated in areas that are easier to formalize, such as algeb


