DataBubble

Agent-ModernColBERT news

50 articles mentioning Agent-ModernColBERT

arxiv19h ago

Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

arXiv:2606.04555v1 Announce Type: cross Abstract: Long-horizon conversational agents need to interact with users through evolving events, tasks, and goals. Such histories are naturally temporal, yet many existing memory systems organize information primarily by topical similarity and may ignore the

arxiv19h ago

Provably Auditable and Safe LLM Agents from Human-Authored Ontologies

arXiv:2606.04903v1 Announce Type: cross Abstract: We introduce the LLM agent architecture Agentic Redux, intended for use with nontrivial problem domains that require linear auditability. Using the typed lambda calculus, we prove that, run on appropriate domains, Agentic Redux executions are semanti

arxiv19h ago

Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach

arXiv:2603.24747v2 Announce Type: replace Abstract: The emergence of large language model agents capable of invoking external tools has created urgent need for formal verification of agent protocols. Two paradigms dominate this space: Schema-Guided Dialogue (SGD), a research framework for zero-shot

arxiv19h ago

Proof-Carrying Agent Actions: Model-Agnostic Runtime Governance for Heterogeneous Agent Systems

arXiv:2606.04104v1 Announce Type: cross Abstract: Agent systems execute through runtimes with very different control points: local coding tools, framework SDKs, managed agent platforms, API gateways, and observer-only integrations. A high-risk action such as publishing data externally may therefore

arxiv19h ago

AgenticDiffusion: Agentic Diffusion-based Path Planning for Vision-Based UAV Navigation

arXiv:2606.04111v1 Announce Type: cross Abstract: Indoor UAV navigation requires efficient exploration, scene understanding, and reliable trajectory execution under limited field-of-view observations. Existing vision-based navigation frameworks typically rely on single-view observations, limiting th

arxiv19h ago

SaliMory: Orchestrating Cognitive Memory for Conversational Agents

arXiv:2606.04120v1 Announce Type: cross Abstract: Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions. However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents via standard reinfo

arxiv19h ago

Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

arXiv:2601.19921v3 Announce Type: replace-cross Abstract: Multi-agent debate (MAD) is widely used to improve large language model (LLM) performance through test-time scaling, yet recent work shows that vanilla MAD often underperforms simple majority vote despite higher computational cost. Studies sh

arxiv19h ago

memorywire: A Vendor-Neutral Wire Format for Agent Memory Operations

arXiv:2606.01138v2 Announce Type: replace-cross Abstract: Agent-memory frameworks -- mem0, Letta/MemGPT, Cognee, Zep/Graphiti, MemoryOS, MemTensor -- each ship their own SDK, storage layout, and operational vocabulary. There is no shared wire format: every integration is bespoke, every migration reb

arxiv19h ago

Strabo: Declarative Specification and Implementation of Agentic Interaction Protocols

arXiv:2606.05043v1 Announce Type: new Abstract: The last few years have witnessed major advances in the modeling and implementation of multiagent systems based on declarative interaction protocols. Our contribution, Strabo, establishes the relevance of these advances to ongoing industry efforts in A

arxiv19h ago

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

arXiv:2606.04990v1 Announce Type: cross Abstract: Large language model (LLM)-based agents increasingly solve complex tasks by interacting with external tools, retrieval systems, memory modules, environments, and other agents. These capabilities expand agent autonomy, but also make agent behavior har

arxiv19h ago

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

arXiv:2606.01961v2 Announce Type: replace Abstract: Autonomous agents are increasingly expected to support end-to-end medical-AI research workflows, moving beyond isolated prediction tasks or short-form clinical question answering. However, existing medical agent benchmarks primarily evaluate final

arxiv19h ago

From Segments to Scenes: Temporal Understanding for Agentic Autonomous Driving via Vision-Language Models

arXiv:2512.05277v4 Announce Type: replace-cross Abstract: Vision-Language Models (VLMs) are increasingly deployed as the perception and reasoning backbone of autonomous agents acting in the wild, with autonomous driving (AD) being one of the most safety-critical instances. Reliable temporal understa

arxiv19h ago

AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

arXiv:2603.19005v3 Announce Type: replace-cross Abstract: Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data sc

arxiv19h ago

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs

arXiv:2512.04668v4 Announce Type: replace-cross Abstract: Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a controlled evaluation framework for comparing topology-condit

arxiv19h ago

Toward Autonomous O-RAN: A Multi-Scale Agentic AI Framework for Real-Time Network Control and Management

arXiv:2602.14117v2 Announce Type: replace-cross Abstract: Open Radio Access Networks (O-RAN) promise flexible 6G network access through disaggregated, software-driven components and open interfaces, but this programmability also increases operational complexity. Multiple control loops coexist across

arxiv19h ago

Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification

arXiv:2606.04037v2 Announce Type: new Abstract: Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production deployment. Post-deployment monitoring, human-in-the-loop controls, and promp

arxiv19h ago

MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models

arXiv:2606.04627v1 Announce Type: new Abstract: Mobile agents are increasingly expected to operate everyday applications from screenshots and language goals, where reliable control requires reasoning over screen affordances, multi-step navigation, and future state changes. However, many agents exter

arxiv19h ago

Notarized Agents: Receiver-Attested Confidential Receipts for AI Agent Actions

arXiv:2606.04193v1 Announce Type: cross Abstract: Current AI agent observability is structurally compromised: the entity producing the activity log is the same entity whose activity is being logged. A compromised or buggy agent can omit, alter, or fabricate its own traces, and the operator running t

arxiv19h ago

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities

arXiv:2606.04460v1 Announce Type: cross Abstract: AI has the potential to transform cybersecurity by enabling systems that can autonomously detect, analyze, and remediate software vulnerabilities. However, existing cybersecurity evaluations of AI systems are limited in scale or scope, and fail to ca

arxiv19h ago

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

arXiv:2606.05557v1 Announce Type: new Abstract: A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-use agents answer the literal question and stop. AUR

arxiv19h ago

Archi: Agentic Operations at the CMS Experiment

arXiv:2606.04755v1 Announce Type: cross Abstract: We present Archi, an open-source, end-to-end framework for scientific collaborations that combines the systematic ingestion and organization of heterogeneous data sources with the deployment of configurable, private, and extensible agents that retrie

arxiv19h ago

QueryAgent-R1: Bridging Query Generation and Product Retrieval for E-Commerce Query Recommendation

arXiv:2606.05671v1 Announce Type: new Abstract: Query recommendation in e-commerce search aims to proactively suggest queries that match users' potential interests. However, existing methods mainly optimize query-level relevance, while neglecting whether the retrieved products align with users' down

arxiv19h ago

MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA

arXiv:2606.05749v1 Announce Type: new Abstract: Iterative retrieval-reasoning agents have recently shown promise for multimodal long-document question answering. However, most existing systems maintain a single growing context that mixes retrieval traces, observations, and intermediate reasoning. As

arxiv19h ago

VulnAgent-R2: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection

arXiv:2603.13384v3 Announce Type: replace-cross Abstract: Software vulnerabilities often depend on cross-file data flow, build options, framework conventions, and runtime guards, so isolated function classifiers produce fragile and poorly calibrated warnings. Repository-level LLM agents can gather r

arxiv19h ago

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

arXiv:2606.05985v1 Announce Type: new Abstract: Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on value alignment: how closely a single agent matches

arxiv19h ago

EGTR-Review: Efficient Evidence-Grounded Scientific Peer Review Generation via Multi-Agent Teacher Distillation

arXiv:2606.06025v1 Announce Type: new Abstract: Scientific peer review generation has attracted increasing attention for reducing reviewing burdens and providing timely feedback. However, existing Large Language Model (LLM)-based methods often produce generic comments with insufficient evidence supp

arxiv19h ago

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

arXiv:2606.05233v1 Announce Type: cross Abstract: Recent computer-using-agent (CUA) red-teaming papers report prompt-injection attack success rates (ASR) of 42-98%, but these headline numbers cluster on retired models and on the most-vulnerable model in each paper's panel. We ask whether those techn

arxiv19h ago

Agents' Last Exam

arXiv:2606.05405v1 Announce Type: cross Abstract: Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an evaluation problem: widel

arxiv19h ago

Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?

arXiv:2606.05647v1 Announce Type: cross Abstract: AI coding agents are increasingly embedded in real-world software development, collaborating with human developers while gaining broader access to codebases and tools. This creates a new attack surface: an agent can exploit human trust to sabotage de

arxiv19h ago

Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents

arXiv:2604.20572v2 Announce Type: replace Abstract: Online lifelong learning agents must decide not only how to act but also when to consult prior experience to continually improve on long-horizon tasks. Existing methods typically retrieve memories passively, such as at task initialization or after

arxiv19h ago

Autoregressive Diffusion World Models for Off-Policy Evaluation of LLM Agents

arXiv:2606.05558v1 Announce Type: new Abstract: Evaluating large language model (LLM) agents in multi-turn interactive environments is expensive and risky, as it requires online environment interaction. We propose ADWM (Autoregressive Diffusion World Model), an evaluation framework that estimates th

arxiv19h ago

GenAutoML: An Agentic Framework for Dynamic Architecture Generation and Optimization in Time-Series Analysis

arXiv:2606.05860v1 Announce Type: new Abstract: Designing neural architectures for time-series forecasting and anomaly detection remains a resource-intensive task that often requires substantial domain expertise. Traditional Automated Machine Learning (AutoML) systems typically rely on static, prede

arxiv19h ago

When Denser Credit Is Not Enough: Evidence-Calibrated Policy Optimization for Long-Horizon LLM Agent Training

arXiv:2606.05885v1 Announce Type: new Abstract: Long-horizon LLM agents require reinforcement learning methods that can assign credit to intermediate decisions under sparse and delayed rewards. Recent group-based methods such as GiGPO improve over GRPO by constructing step-level advantages at repeat

arxiv19h ago

Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving

arXiv:2606.05704v1 Announce Type: cross Abstract: Recent Large Language Models (LLMs) have shown impressive reasoning abilities; but they are still susceptible to hallucinations, intermediate reasoning mistakes, and unreliable reasoning results in complex mathematical reasoning problems. In this stu

arxiv19h ago

Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward

arXiv:2606.06227v1 Announce Type: cross Abstract: A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation project

arxiv19h ago

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

arXiv:2606.06493v1 Announce Type: cross Abstract: For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references t

arxiv19h ago

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

arXiv:2606.04202v1 Announce Type: new Abstract: As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions unde

arxiv19h ago

HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite

arXiv:2606.04126v1 Announce Type: cross Abstract: We introduce HighTide, an evolving AI-assisted benchmark suite. Specifically, the contributions are: (i) a diverse open-source suite spanning multiple design languages and technology nodes, (ii) Bazel-based incremental RTL-to-GDS compilation with rem

arxiv19h ago

Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams

arXiv:2606.01770v2 Announce Type: replace-cross Abstract: Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmar

arxiv19h ago

Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline

arXiv:2606.04315v1 Announce Type: new Abstract: LLM agents accumulate histories that outgrow their context windows, motivating a growing literature on memory systems. Yet most existing designs are tuned to a single scenario (multi-session chat or a single trajectory format), and there is little evid

arxiv19h ago

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

arXiv:2606.04455v1 Announce Type: new Abstract: Current AI benchmarks evaluate agents on task execution within human-designed workflows. These evaluations fundamentally fail to measure a critical next-level capability: whether models can autonomously develop agent systems. We introduce the Meta-Agen

arxiv19h ago

Plan First, Judge Later, Run Better: A DMAIC-Inspired Agentic System for Industrial Anomaly Detection

arXiv:2606.04599v1 Announce Type: new Abstract: Large language model (LLM) agents have shown promise in automating complex data-analysis workflows, but their reliable deployment remains challenging in high-stakes industrial scenarios. Industrial anomaly detection (IAD) is essential for manufacturing

arxiv19h ago

Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning

arXiv:2512.09706v2 Announce Type: replace Abstract: The paradigm of agentic AI is shifting from engineered complex workflows to post-training native models. However, existing agents are typically confined to static, predefined action spaces-such as exclusively using APIs, GUI events, or robotic comm

arxiv19h ago

Multi-Agent Lipschitz Bandits

arXiv:2602.16965v2 Announce Type: replace Abstract: We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward,

arxiv19h ago

Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents

arXiv:2606.04141v1 Announce Type: cross Abstract: LLM agents often place sensitive credentials in the same context window as untrusted retrieved content, creating a direct path for indirect prompt injection to induce credential exfiltration. We study this failure mode through three complementary def

arxiv19h ago

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

arXiv:2604.01489v2 Announce Type: replace Abstract: High-performance GPU kernels are critical to modern machine learning systems, yet developing them remains a manual, expert-driven process. Recent work has explored using LLMs to automate kernel generation, but generated kernels still fall short of

arxiv19h ago

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

arXiv:2606.04329v1 Announce Type: cross Abstract: Memory is a core component of AI agents, enabling them to accumulate knowledge across interactions and improve performance. However, persistent memory introduces the risk of memory poisoning, where a single adversarial memory write can exert long-ter

arxiv19h ago

OpenAgenet/OAN: Technical Architecture for Trust-Governed Agent Identity and Discovery

arXiv:2606.03163v2 Announce Type: replace-cross Abstract: This paper describes the technical architecture of OpenAgenet / OAN. OAN is a protocol-neutral trust layer for open Agent interconnection. It specifies the role architecture, identity objects, registration workflow, Root-governed lifecycle, R

arxiv19h ago

DAR: Deontic Reasoning with Agentic Harnesses

arXiv:2606.05009v1 Announce Type: cross Abstract: Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key technical challenge fo

arxiv19h ago

Self-Reflective APIs: Structure Beats Verbosity for AI Agent Recovery

arXiv:2606.05037v1 Announce Type: cross Abstract: When an AI agent calls an API and hits a validation error, it needs more than what went wrong -- it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient fo
