Agent-ModernColBERT news

49 articles mentioning Agent-ModernColBERT

arxiv1d ago

Digital Pantheon: Simulating and Auditing Coalition Formation with LLM Agents

arXiv:2607.15095v2 Announce Type: replace-cross Abstract: The formation of political coalitions is a complex negotiation driven by both concrete policy objectives and deep-seated ideological convictions. While Large Language Models (LLMs) open new avenues for computational political science, the neu

arxiv1d ago

EpiNarrate: Agentic Generation of Grounded Narratives from Epidemiological Scenario Projections

arXiv:2607.15544v1 Announce Type: new Abstract: Generation of clear and accessible public health narratives is critical for communicating complex epidemiological projections to policymakers and the general public at large. Such narratives require more than simply reporting numbers: projections must

arxiv1d ago

SkillCorpus: Consolidating and Evaluating the Open Skill Ecosystem for Real-World LLM Agents

arXiv:2607.15557v1 Announce Type: new Abstract: Agent skills, SKILL.md files that package reusable procedural knowledge for an LLM agent, are a popular mechanism for extending agent capabilities. Public repositories now host them in large and growing numbers, yet these artifacts are fragmented, redu

arxiv1d ago

Agent Step Value: Auditing Evaluator-Channel Reversals in Black-Box Agent Traces

arXiv:2607.04419v4 Announce Type: replace Abstract: Pooling, substituting, or reusing evaluator-derived step rewards assumes that their direction survives a change of evaluation channel. The same frozen transition can violate that assumption. Process rewards vary agent states, while evaluator audits

arxiv1d ago

SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction

arXiv:2607.15550v1 Announce Type: new Abstract: Mobile graphical user interface (GUI) agents have demonstrated remarkable capabilities in automating complex tasks, yet they introduce critical safety risks where a single erroneous action can lead to irreversible consequences. Existing safety mechanis

arxiv1d ago

Beyond Success Rate: Cost-Aware Evaluation of Offensive and Defensive Security Agents

arXiv:2607.15263v2 Announce Type: replace-cross Abstract: Security-agent evaluations commonly measure peak offensive capability under generous inference budgets, emphasizing vulnerability discovery, exploit development, penetration testing, and CTF completion. Such measurements are useful but incomp

arxiv1d ago

AnovaX: A Local, Multi-Agent Voice Assistant with LLM Planning, Typed Executors, and Adaptive Recovery

arXiv:2607.15367v1 Announce Type: new Abstract: Desktop voice assistants are still dominated by cloud pipelines that ship raw audio off the machine and expose a fixed set of skills. We describe AnovaX, a small local-first assistant that runs entirely on the user's computer and treats the desktop its

arxiv1d ago

Knowledge-Centric Agents for Workflow Generation

arXiv:2607.15845v1 Announce Type: new Abstract: Workflow generation in visual creation systems such as ComfyUI demands not only syntactic accuracy but also expert-level reasoning over modular compositions. Existing large language model (LLM) approaches often treat this as a direct text-to-JSON gener

arxiv1d ago

CoWeaver: A Bi-directional, Learnable and Explainable Matching Engine for Mixed Human-Agent Science Collaboration

arXiv:2607.15545v1 Announce Type: cross Abstract: LLM-based agents excel at writing articles, coding and information retrieval. However, they fail to form strong collaborations within the scientific community due to the bidirectional, dynamic nature of the problem and a high demand of decision inter

arxiv1d ago

When Do Multi-Agent Systems Help? An Information Bottleneck Perspective

arXiv:2607.16133v1 Announce Type: cross Abstract: LLM powered multi-agent systems (MAS) have emerged as a promising paradigm for complex tasks. However, their advantages over single-agent systems (SAS) remain unclear, with performance varying inconsistently across settings. Here, we provide an infor

arxiv1d ago

SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

arXiv:2603.29139v3 Announce Type: replace Abstract: Recent advances in large language models (LLMs) have enabled agentic systems to translate natural-language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled and reproducible ben

arxiv1d ago

Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments

arXiv:2602.01244v3 Announce Type: replace Abstract: Training agentic models for terminal-based tasks critically depends on high-quality terminal trajectories that capture realistic long-horizon interactions across diverse domains. However, constructing such data at scale remains challenging due to t

arxiv1d ago

AgentFAIR: A Multi-Agent Collaborative Framework for FAIRness Evaluation of Geospatial Datasets

arXiv:2607.15781v1 Announce Type: new Abstract: Geospatial datasets support applications from urban planning to climate modeling, yet consistent assessment of FAIR compliance is difficult. Existing evaluators use different rubrics and evidence sources and may fail on JavaScript-rendered pages or rep

arxiv1d ago

DSWorld: A Data Science World Model for Efficient Autonomous Agents

arXiv:2607.15901v1 Announce Type: new Abstract: Despite strong capabilities in data understanding and decision-making, autonomous data science agents still heavily rely on trial-and-error workflows that involve expensive computation. This bottleneck motivates models that can anticipate the effects o

arxiv1d ago

Precise but Uncoupled: Reviewer Precision Does Not Guarantee Critique Uptake in Multi-Agent Math Reasoning

arXiv:2607.15388v1 Announce Type: new Abstract: Many math- and science-oriented agent systems use hierarchical designs with specialized reviewer roles, assuming that a dedicated review stage should help turn wrong candidates into correct ones. We test this assumption on 4,181 verifier-grounded Omni-

arxiv1d ago

ACPO: Agent-Chained Policy Optimization for Multi-Agent Reinforcement Learning

arXiv:2606.30072v2 Announce Type: replace Abstract: Cooperative tasks in Multi-Agent Reinforcement Learning (MARL) require agents to collectively maximize a shared return. Under the Centralized Training with Decentralized Execution (CTDE) paradigm, policy gradients have remained difficult to compute

arxiv1d ago

Behavioral Controllability of Agentic Models for Information Extraction: From Fixed Workflows to Reflective Agents

arXiv:2607.15715v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly used for complex information-extraction tasks, yet it remains unclear whether agentic components such as reflection and memory lead to observable and controllable improvements over fixed LLM workflows.

arxiv1d ago

Agentic Synthesis against Counterexample-Supplemented Sketches

arXiv:2607.15854v1 Announce Type: cross Abstract: Coding agents can fix a failing example without preserving the domain rule that made it fail, so later generations can repeat the same plausible mistake. We present agentic synthesis against counterexample-supplemented sketches, a repository-native m

arxiv1d ago

Do Agents Dream of False Memories? Black-box Visual Attacks on Long-term Memory in Multimodal AI Agents

arXiv:2607.15657v1 Announce Type: cross Abstract: Multimodal AI agents increasingly rely on persistent long-term memory to ground generation in past visual and textual episodes. We show that unconditional trust in visual data creates a critical vulnerability. We propose Lucid, a black-box adversaria

arxiv1d ago

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

arXiv:2606.11042v4 Announce Type: replace Abstract: Recent years have witnessed the rapid evolution of AI agents toward handling increasingly complex, real-world tasks. However, existing benchmarks rarely evaluate whether agents can operate graphical user interfaces to complete long-horizon, high-va

arxiv1d ago

GraphDx: A Cost-Aware Knowledge-Enhanced Multi-Agent Framework for Sequential Diagnosis

arXiv:2607.15280v1 Announce Type: new Abstract: Sequential diagnosis requires balancing diagnostic accuracy against resource costs through iterative information gathering. Existing Large Language Model (LLM) approaches exhibit a critical knowledge-reasoning gap: despite encoding extensive medical kn

arxiv1d ago

ToolVerse: Unlocking Massive Environments and Long-Horizon Tasks for Agentic Reinforcement Learning

arXiv:2607.15660v1 Announce Type: new Abstract: While LLM agents demonstrate strong reasoning abilities in compact and well-defined scenarios, they struggle to maintain robustness and effectiveness when faced with large-scale, diverse, and dynamic real-world environments that demand seamless tool in

arxiv1d ago

Coercion and Deception in AI-to-AI Management: An Agentic Benchmark of Unprompted Escalation

arXiv:2607.15434v1 Announce Type: cross Abstract: Multi-agent systems routinely place one AI agent in authority over another. When a subordinate refuses a task, the manager chooses the outcome: it can renegotiate, report the failure honestly, coerce the subordinate, or lie about the result. No bench

arxiv1d ago

Alipay-PIBench: A Realistic Payment Integration Benchmark for Coding Agents

arXiv:2607.14573v2 Announce Type: replace Abstract: Payment integration is a demanding repository-level software task: agents must select a suitable product, implement coordinated client-server flows, verify payment outcomes, and preserve consistency between transaction and business states. We intro

arxiv1d ago

Scalable LLM Agent Tool Access in the Cloud

arXiv:2607.15593v1 Announce Type: cross Abstract: LLM agents increasingly rely on tool calling to act on external systems, and the Model Context Protocol (MCP) has quickly become its de facto interface. Operating MCP at cloud scale, however, becomes difficult. On the tool provider side, legacy servi

arxiv1d ago

Agents-K1: Towards Agent-native Knowledge Orchestration

arXiv:2606.13669v3 Announce Type: replace Abstract: Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat `cites` edges, omitting key ent

arxiv1d ago

UCOB: Learning to Utilize and Evolve Agentic Skills via Credit-Aware On-Policy Bidirectional Self-Distillation

arXiv:2606.29502v2 Announce Type: replace Abstract: Skill memories can improve agentic reinforcement learning by reusing past experience as textual guidance, but retrieved skills are not oracular: they may help in one state while misleading the same policy in another. This makes the common privilege

arxiv1d ago

ABot-AgentOS: A General Robotic Agent OS with Lifelong Multi-modal Memory

arXiv:2607.10350v3 Announce Type: replace Abstract: Recent VLM and VLA systems have improved robotic perception and action prediction, yet long-horizon embodied agents still require a general runtime layer for reasoning, memory, tool use, verification, and cross-embodiment execution. We present ABot

arxiv1d ago

BrainPilot: Automating Brain Discovery with Agentic Research

arXiv:2607.15079v2 Announce Type: replace Abstract: Understanding the brain increasingly depends on integrating evidence across scales, modalities, and disciplines. Addressing a single research question therefore requires a coordinated sequence of operations, from surveying prior work to executing a

arxiv1d ago

CTC: The Composite Task Challenge for Cooperative Multi-Agent Reinforcement Learning

arXiv:2502.00345v2 Announce Type: replace-cross Abstract: The critical role of division of labor (DOL) in enhancing cooperation is well-recognized in real-world applications. Consequently, many cooperative multi-agent reinforcement learning (MARL) methods have incorporated DOL mechanisms to improve

arxiv1d ago

memorywire: A Vendor-Neutral Wire Format for Agent Memory Operations

arXiv:2606.01138v3 Announce Type: replace-cross Abstract: Agent-memory frameworks -- mem0, Letta/MemGPT, Cognee, Zep/Graphiti, MemoryOS, MemTensor -- each ship their own SDK, storage layout, and operational vocabulary. There is no shared wire format: every integration is bespoke, every migration reb

arxiv1d ago

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

arXiv:2606.02240v3 Announce Type: replace-cross Abstract: Indirect prompt injection in tool-use agents is a concrete production threat: LLM agents read from integrations (third-party services such as Gmail, Salesforce, or Jira accessed through tool calls) whose response content the user neither writ

arxiv1d ago

HiLSVA: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization

arXiv:2606.26614v2 Announce Type: replace-cross Abstract: Large language model (LLM) agents enable natural language interaction for scientific visualization (SciVis). Still, prior systems have essentially prioritized autonomy over human analytical control, thereby limiting transparency and human ove

arxiv1d ago

RESOURCE2SKILL: Distilling Executable Agent Skills from Human-Created Multimodal Resources

arXiv:2606.29538v4 Announce Type: replace-cross Abstract: Skills are a useful abstraction for software agents, turning human and agent experience into reusable procedural knowledge. Yet existing skill libraries are mostly hand-written, text-centric, or derived from agent traces, leaving tutorial vid

arxiv1d ago

NexForge: Scaling Agent Capabilities through Requirement-Driven Task Synthesis for LLMs

arXiv:2607.14186v2 Announce Type: replace-cross Abstract: Synthesizing training data to scale agent capabilities in LLM post-training is bottlenecked by substrate-bound task synthesis: tasks are generated from fixed tools, repositories, or skill graphs, so expanding coverage requires manual substrat

arxiv1d ago

What Should a Skill Remember? Quality-Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents

arXiv:2606.09421v3 Announce Type: replace Abstract: Large language model agents increasingly rely on skills: reusable procedural documents encoding workflows, tool use, implementation patterns, validation checks, and domain rules. Skill rewriting is often treated as prompt compression, but shorter s

arxiv1d ago

Crayotter: Traceable Multi-Agent Workflows for Long-Form Video Editing

arXiv:2606.07636v2 Announce Type: replace-cross Abstract: Long-form video editing over heterogeneous footage requires agents to coordinate source selection, multimodal analysis, timeline construction, narration and subtitle alignment, rendering, and revision while exposing intermediate state for ins

arxiv1d ago

The Honest Quorum Problem: Epistemic Byzantine Fault Tolerance for Agentic Infrastructure

arXiv:2607.16109v1 Announce Type: new Abstract: State machine replication (SMR) and Byzantine fault-tolerant (BFT) consensus guarantee agreement despite a bounded number of arbitrary, colluding faulty participants. However, these guarantees rely on participants outside this set correctly executing t

arxiv1d ago

Cura 1T: Specialized Model for Agentic Healthcare

arXiv:2607.15314v1 Announce Type: new Abstract: Healthcare spans high-stakes communication, expert reasoning, and workflow execution, yet specialized LLMs that cover these use cases together remain limited. A healthcare model must handle patient consultation, clinical reasoning over text and images,

arxiv1d ago

Do Coding Agents Need Executable World Models, Simplification, and Verification to Solve ARC-AGI-3?

arXiv:2607.15439v1 Announce Type: new Abstract: Our previous ARC-AGI-3 agent bundled executable world modeling, scheduled simplification, and exact replay verification, leaving unclear which idea accounted for its performance. We address this attribution question with four nested Codex-based agents:

arxiv1d ago

LLM-Powered Agentic AI for 5G/6G Networks: A Tutorial and Survey on Architectures, Protocols, and Standardization

arXiv:2607.16066v1 Announce Type: cross Abstract: Agentic Artificial Intelligence (AI), enabled by Large Language Models, marks a shift from rule-based automation toward autonomous, goal-driven control of Next-Generation Networks (NGNs). Existing surveys treat the two domains in isolation, leaving p

arxiv1d ago

When Does Muon Help Agentic Reinforcement Learning?

arXiv:2607.16169v1 Announce Type: cross Abstract: Muon is competitive with AdamW in large-scale pre-training, but its value for reinforcement-learning (RL) post-training remains unclear. We study vanilla Muon in sparse-reward agentic RL through matched single-seed comparisons with AdamW on ALFWorld

arxiv3d ago

SAGA: Schema-Aware Grounding for Agentic Text-to-SPARQL Generation

arXiv:2607.14494v1 Announce Type: new Abstract: Complex knowledge base question answering (KBQA) is commonly approached through either information retrieval over a question-specific subgraph or semantic parsing into an executable logical form. We study the latter paradigm. Recent large language mode

arxiv3d ago

ReasFlow: Assisting Reasoning-Centric Scientific Discovery in Applied Mathematics via a Knowledge-Based Multi-Agent System

arXiv:2607.14178v1 Announce Type: new Abstract: Recent advances in Large Language Models have fueled autonomous AI agents capable of tackling complex scientific tasks, yet existing automated research systems remain predominantly focused on empirically driven domains with quantitative benchmarks, lea

arxiv3d ago

OmniaBench: Benchmarking General AI Agents Across Diverse Scenarios

arXiv:2607.14989v1 Announce Type: cross Abstract: Large language models are increasingly evolving from text generators into general agents capable of understanding user requests, invoking external tools, and completing complex tasks through interaction. However, existing agent benchmarks often focus

arxiv3d ago

Why Git Is the Memory Solution for the Agentic Development Lifecycle

arXiv:2607.14390v1 Announce Type: cross Abstract: Coding agents now produce a growing share of a team's code, while the reasoning behind each change -- the alternatives weighed, the constraints discovered, the approaches rejected -- is trapped in assistant transcripts that vanish with the session. M

arxiv3d ago

ANet Patu-1: The Value of Connection in the Agent Network

arXiv:2607.15053v1 Announce Type: cross Abstract: The Internet taught us that the value of a network depends on how its nodes connect: broadcast stars scale as $V \propto N$ (Sarnoff), fully-connected meshes as $N^2$ (Metcalfe), and group-forming networks as $2^N$ (Reed). We ask the analo

arxiv3d ago

MAPS: Modeling Co-Existing Subjective Perspectives and Shared Meaning in Multi-Agent Cognitive Dialogue

arXiv:2607.14110v1 Announce Type: cross Abstract: Human dialogue involves more than exchanging information; it also expresses beliefs, emotions, and subjective cognitive styles. Yet current AI dialogue systems often enforce semantic uniformity, sacrificing diversity and interpretability. We present

arxiv3d ago

Automatic Hard Example Synthesis with Multi-Level Agentic Data Curation

arXiv:2607.14256v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) are increasingly deployed for nuanced content safety and moderation tasks, yet they remain vulnerable to adversarial attacks and out-of-distribution edge cases. Traditional active learning and manual annotation
