arxiv1d ago
arXiv:2601.14295v4 Announce Type: replace Abstract: Large language models increasingly function as artificial reasoners: they evaluate arguments, assign credibility, and express confidence. Yet their belief-forming behavior is governed by implicit, uninspected epistemic policies. This paper argues f
arxiv3d ago
arXiv:2606.10327v1 Announce Type: new Abstract: Automated Essay Scoring (AES) systems must judge interdependent discourse elements (e.g., lead, claim, evidence, conclusion), yet most approaches treat these in isolation, harming coherence and generalization. We investigate task-aware fine-tuning of L
arxiv3d ago
arXiv:2510.04195v2 Announce Type: replace Abstract: Given a map description through global traversal navigation instructions, an LLM can often infer the implicit spatial layout and answer user queries by providing shortest paths. However, such context-dependent querying becomes incapable as environm
arxivJun 4
arXiv:2606.03110v2 Announce Type: replace Abstract: Aligning AI systems with diverse human values requires value specifications grounded in concrete examples, but generating such examples without extensive human supervision remains an open challenge. We investigate what makes these examples effectiv
arxivJun 3
arXiv:2606.02841v1 Announce Type: new Abstract: Deep neural networks learn representations where individual features often lack interpretable meaning; a single neuron may activate for scattered, unrelated inputs. We introduce coherence, a geometric property inspired by neural coding in the brain, wh
arxivJun 3
arXiv:2606.02655v1 Announce Type: cross Abstract: External regret certifies stability only against replacing one's behavior by a fixed alternative. In a quantum game, this misses a natural physical move: a player can apply a local completely positive trace-preserving (CPTP) map to the state it actua
arxivJun 2
arXiv:2606.00950v1 Announce Type: new Abstract: Unsupervised skill discovery (USD) aims to learn diverse behaviors without reward functions, but often results in task-irrelevant or hazardous behaviors due to uniform exploration. Guided skill discovery (GSD) addresses this issue by incorporating huma
arxivJun 2
arXiv:2606.02194v1 Announce Type: new Abstract: Distilling expert demonstration data into large generative models using behavioral cloning is a scalable approach to learning capable policies for robotic control, particularly for dexterous manipulation. Reinforcement learning (RL) can be used as a me
arxivJun 1
arXiv:2605.30668v1 Announce Type: cross Abstract: Dialogue topic segmentation is critical in many human-AI collaborative applications and requires identifying heterogeneous boundary cues, including lexical transitions near utterance edges and semantic discontinuities across utterances. Existing ut
arxivMay 29
arXiv:2601.13111v2 Announce Type: replace-cross Abstract: Realistic text-to-SQL workflows often require joining multiple tables. As a result, accurately retrieving the relevant set of tables becomes a key bottleneck for end-to-end performance. We study an open-book setting where queries must be answ
arxivMay 29
arXiv:2605.28832v1 Announce Type: cross Abstract: Topic modeling is a branch of Natural Language Processing (NLP) that aims to organize large collections of texts into coherent groups according to word co-occurrence patterns, with Latent Dirichlet Allocation (LDA) remaining one of the most widely us
arxivMay 29
arXiv:2605.30335v1 Announce Type: new Abstract: Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent. We formalise this locally coherent,
arxivMay 29
arXiv:2605.14373v3 Announce Type: replace-cross Abstract: Zeroth-Order (ZO) optimization is pivotal for scenarios where backpropagation is unavailable, such as memory-constrained on-device learning and black-box optimization. However, existing methods face a stark trade-off: they are either sample-i
arxivMay 28
arXiv:2603.24631v2 Announce Type: replace-cross Abstract: Code agents resolve 65-70% of SWE-bench Verified issues, but Pass@1 cannot tell us why the rest fail, and, as we show, capable-model failures are systematically misdiagnosed without trajectory data. We introduce TRAJEVAL, a training-free deco
arxivMay 28
arXiv:2605.27971v1 Announce Type: cross Abstract: When large language models are fine-tuned to generate persona- or tone-conditioned responses, their output diversity is severely limited--a failure we term Cross-Style Collapse. We trace this collapse to the cross-entropy objective, which under share
arxivMay 22
arXiv:2605.21731v1 Announce Type: new Abstract: Deep learning models are increasingly used in scientific prediction tasks where strong benchmark performance is often interpreted as evidence of scientifically meaningful behavior. This interpretation is fragile, as models may exploit shortcut features
arxivMay 21
arXiv:2605.15944v2 Announce Type: replace-cross Abstract: Visuomotor policies aim to learn complex manipulation tasks from expert demonstrations. However, generating smooth and coherent trajectories remains challenging, as it requires balancing proximal precision with distal foresight. Existing appr
arxivMay 19
arXiv:2603.20216v2 Announce Type: replace-cross Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models, offering sub-linear generation latency and bidirectional capabilities that are particularly appealing for code generation and editing. Ach
arxivMay 19
arXiv:2603.01092v2 Announce Type: replace Abstract: Scientific discovery is constrained not only by what is true, but by what is cognitively available to the researchers currently exploring a field. Many directions are coherent in light of the literature yet unlikely to be proposed because no existi
arxivMay 19
arXiv:2605.07263v2 Announce Type: replace-cross Abstract: Over-the-air federated learning (OTA-FL) reduces uplink latency by aggregating client updates directly over the wireless multiple-access channel. Coherent analog aggregation realizes this idea by aligning the phases and amplitudes of simultan
arxivMay 18
arXiv:2605.00934v2 Announce Type: replace Abstract: Coherent Point Drift (CPD) is a representative probabilistic framework for unsupervised non-rigid point set registration. Its standard non-rigid M-step, however, relies on a point-indexed Gaussian-kernel system whose size grows with the number of m
arxivMay 16
arXiv:2605.14534v1 Announce Type: cross Abstract: Evaluating object removal in images and videos remains challenging because the task is inherently one-to-many, yet existing metrics frequently disagree with human perception. Full-reference metrics reward copy-paste behaviors over genuine erasure; no
arxivMay 14
arXiv:2604.27389v2 Announce Type: replace-cross Abstract: In recent years, Multimodal Large Language Models (MLLMs) have achieved remarkable progress on a wide range of multimodal benchmarks. Despite these advances, most existing benchmarks mainly focus on single-image or multi-image comprehension.
arxivMay 13
arXiv:2509.02510v2 Announce Type: replace-cross Abstract: Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coh
arxivMay 12
arXiv:2605.10462v1 Announce Type: new Abstract: Formalisation is the process of writing system requirements in a formal language. These requirements mostly originate in Natural Language. In the field of Formal Methods, formalisation is often identified as one of the most delicate and complicated ste
arxivMay 12
arXiv:2605.10123v1 Announce Type: new Abstract: Complex-valued Transformers have largely inherited softmax attention from real-valued architectures. However, row-normalised token competition is not necessarily aligned with phase-preserving computation. In this paper, we introduce the Phase-Coherent
arxivMay 12
arXiv:2512.21587v2 Announce Type: replace-cross Abstract: Spatial photonic Ising machines offer a novel optical platform for optimization and spin-model simulation, but existing diffraction-based schemes rely on auxiliary spins or multiplexing to encode high-rank couplings and external fields, reduc
arxivMay 12
arXiv:2602.01015v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly embedded in AI-based tutoring systems. Can they faithfully model novice reasoning and metacognitive judgments? Existing evaluations emphasize problem-solving accuracy, overlooking the fragmented and imp
arxivMay 11
arXiv:2509.03736v2 Announce Type: replace Abstract: The impressive capabilities of Large Language Models (LLMs) raise the possibility that synthetic agents can serve as substitutes for real participants in human-subject research. To evaluate this claim, prior research has largely focused on whether
arxivMay 8
arXiv:2411.19182v2 Announce Type: replace-cross Abstract: Originating from the diffusion phenomenon in physics, which describes the random movement and collisions of particles, diffusion generative models simulate a random walk in the data space along the denoising trajectory. This allows informatio
arxivMay 6
arXiv:2605.00929v1 Announce Type: cross Abstract: Multivariate time series anomaly detection in ICS has attracted growing attention due to the increasing threat of cyber-physical attacks on critical infrastructure. State-of-the-art methods model inter-sensor relationships from raw time-domain amplit
arxivMay 6
arXiv:2605.02734v1 Announce Type: new Abstract: Learning to Defer (L2D) enables a model to predict autonomously or defer to an expert, but prior work largely assumes flat label spaces. We study the first L2D setting with hierarchical multi-label decisions, motivated by medical-imaging workflows in w
arxivMay 5
arXiv:2605.00960v1 Announce Type: cross Abstract: We introduce energy-based constraint networks -- a modality-agnostic architecture that learns structural coherence from contrastive pairs. The system processes frozen encoder embeddings through a state-space model with dual-head attention, producing
arxivMay 1
arXiv:2602.21361v3 Announce Type: replace-cross Abstract: Ptychographic imaging at synchrotron and XFEL sources requires dense overlapping scans, limiting throughput and increasing dose. Extending coherent diffractive imaging to overlap-free operation on extended samples remains an open problem. Her
arxivApr 30
arXiv:2604.26561v1 Announce Type: cross Abstract: Multi-agent deliberation systems using large language models (LLMs) are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same option regardless of their assigned value perspectiv
arxivApr 29
arXiv:2604.23325v1 Announce Type: cross Abstract: Emotional talking head video generation aims to generate expressive portrait videos with accurate lip synchronization and emotional facial expressions. Current methods rely on simple emotional labels, leading to insufficient semantic information. W
arxivApr 29
arXiv:2604.25482v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown strong potential for narrative generation, but their use in complex, multi-layered role-playing game (RPG) worlds is still limited by issues of coherence, controllability, and structural consistency. This paper e
arxivApr 29
arXiv:2604.25489v1 Announce Type: cross Abstract: Coherent transition radiation (CTR) spectroscopy is a critical diagnostic for characterizing the longitudinal structure of relativistic electron bunches in laser-plasma and conventional accelerators. In practice, recovering the bunch profile from a m
arxivApr 28
arXiv:2604.22981v1 Announce Type: new Abstract: Reward models in RLHF are trained to score only the final token of a response - a choice that discards rich signal from every intermediate position and produces models whose token-level outputs are noise. We argue this is a missed opportunity: a well-t
arxivApr 27
arXiv:2604.22139v1 Announce Type: cross Abstract: Reliable automated analysis of Optical Coherence Tomography (OCT) imaging is crucial for diagnosing retinal disorders but faces a critical barrier: the need for expensive, labor-intensive expert annotations. Supervised deep learning models struggle t
arxivApr 27
arXiv:2604.22560v1 Announce Type: cross Abstract: Graph Visual Question Answering (GVQA) for autonomous driving organizes reasoning into ordered stages, namely Perception, Prediction, and Planning, where planning decisions should remain consistent with the model's own perception. We present a compar
techcrunchApr 25
Canadian AI startup Cohere is taking over Germany-based Aleph Alpha with support from Schwarz Group. With the blessing of their governments, the companies intend to offer a sovereign alternative to enterprises in an AI landscape dominated by American players.