arxiv Jul 11 bullish

Theoria: Rewrite-Acceptability Verification over Informal Reasoning States

arXiv:2607.01223v3 Announce Type: replace-cross Abstract: When should an AI system's answer be trusted? Formal proof assistants offer certainty but cannot reach most of the problem distribution; scalar LLM judges offer coverage but produce opaque scores that cannot be audited after the fact and are

1 model #verification #proof #auditability

arxiv Jul 3 bullish

Evergreen: Efficient Claim Verification for Semantic Aggregates

arXiv:2604.26180v2 Announce Type: replace-cross Abstract: With recent semantic query processing engines, semantic aggregation has become a primitive operator, enabling the reduction of a relation into a natural language aggregate using an LLM. However, the resulting semantic aggregate may contain cl

1 model #databases #optimization #verification

arxiv Jun 6

Zero knowledge verification for frontier AI training is possible

arXiv:2606.05433v1 Announce Type: new Abstract: Frontier AI governance frameworks increasingly use cumulative training compute as the primary criterion for designating high-impact models, but enforcement rests on self-reporting because no technical verification primitive for training exists. Any fut

#governance #verification #zero-knowledge

arxiv May 29 bullish

Formalizing Mathematics at Scale

arXiv:2605.29955v1 Announce Type: new Abstract: We present AutoformBot, a multi-agent system for building an Autoformalized Textbook Library At Scale (Atlas) in Lean 4. AutoformBot orchestrates thousands of LLM agents, equipped with formal verification tools, dependency-aware task scheduling, and co

1 model #autoformalization #mathematics #verification

arxiv May 25

Lipschitz Optimization for Formal Verification of Homographies

arXiv:2605.23203v1 Announce Type: cross Abstract: The adoption of vision neural networks in regulated industries requires formal robustness guarantees, especially in safety-critical domains such as healthcare, autonomous vehicles, and aerospace. However, current approaches are confined to incomplete

#computer-vision #safety #verification

arxiv May 16

Monitoring Data-aware Temporal Properties (Extended Version)

arXiv:2605.14666v1 Announce Type: new Abstract: Dynamic systems in AI are often complex and heterogeneous, so that an internal specification is not accessible and verification techniques such as model checking are not applicable. Monitoring is in such cases an attractive alternative, as it evaluates

#monitoring #verification #artificial-intelligence

arxiv May 11 bullish

TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples

arXiv:2605.07935v1 Announce Type: new Abstract: We present TraceFix, a verification-first pipeline for Large Language Model (LLM) multi-agent coordination. An agent synthesizes a protocol topology as a structured intermediate representation (IR) from a task description, generates PlusCal coordinatio

#verification #multiagent #coordination

techcrunch Apr 17 bullish

Sam Altman’s project World looks to scale its human verification empire. First stop: Tinder.

World, which has raised eyebrows (but also a lot of interest) with its Orb-centered anonymous verification project, is looking to expand its influence via a bevy of new partnerships.

#verification #identity #dating-apps

arxiv Apr 16

Variation in Verification: Understanding Verification Dynamics in Large Language Models

arXiv:2509.17995v2 Announce Type: replace-cross Abstract: Recent advances have shown that scaling test-time computation enables large language models (LLMs) to solve increasingly complex problems across diverse domains. One effective paradigm for test-time scaling (TTS) involves LLM generators produ

3 models #test-time-scaling #language-models #verification

arxiv Apr 10 bullish

The Art of Building Verifiers for Computer Use Agents

arXiv:2604.06240v1 Announce Type: cross Abstract: Verifying the success of computer use agent (CUA) trajectories is a critical challenge: without reliable verification, neither evaluation nor training signal can be trusted. In this paper, we present lessons learned from building a best-in-class veri

3 models #verification #evaluation #artificial-intelligence