arxiv19h ago
arXiv:2606.05698v1 Announce Type: new Abstract: Parametric retrieval augmentation encodes document information into lightweight, document-specific modules such as LoRA adapters, reducing the need to include all evidence as input context. However, it remains unclear how this parameter-side memory int
arxiv19h ago
arXiv:2606.04171v1 Announce Type: cross Abstract: File-type classification underlies many workflows like malware triage, forensic carving, packet inspection, and storage indexing. Learned systems such as Google's Magika assume whole-file access at a known offset, so they break on the inputs many of
arxiv2d ago
arXiv:2511.17126v4 Announce Type: replace-cross Abstract: The emerging deep-learning-based lens library pre-training (LensLib-PT) pipeline offers a new avenue for blind lens aberration correction by training a universal neural network, demonstrating strong capability in handling diverse unknown optical
arxiv2d ago
arXiv:2606.03284v1 Announce Type: new Abstract: Frontier LLMs perform well in Western contexts, but remain poorly tested on underrepresented cultures such as those in Southeast Asia (SEA). Existing NLI benchmarks are largely Western-centric, translation-derived, or monolingual, limiting their abilit
arxiv2d ago
arXiv:2605.12925v3 Announce Type: replace-cross Abstract: Evaluation of software engineering (SWE) agents is dominated by a binary signal: whether the final patch passes the tests. This outcome-only view treats a principled solution and a chaotic trial-and-error process as equivalent. We show that t
arxiv2d ago
arXiv:2606.03728v1 Announce Type: new Abstract: Retrieval-augmented generation systems for legal question answering typically retrieve passages based on semantic similarity and provide them to a language model, which then generates cited answers. Prior work assumes that highly ranked passages are mo
arxiv2d ago
arXiv:2606.03279v1 Announce Type: new Abstract: In AI for Science, physics-informed losses are increasingly used to train learned compressors for scientific data, but their rate-distortion implications remain poorly understood. At fixed bitrate, these objectives often improve preservation of a targe
arxiv3d ago
arXiv:2411.05196v3 Announce Type: replace Abstract: This study presents DhondtXAI as a SHAP-independent, D'Hondt-based attribution framework for tabular XAI. Instead of model-native feature importance or SHAP values, DhondtXAI computes background-interventional removal effects, separates positive an
arxiv3d ago
arXiv:2606.00366v1 Announce Type: new Abstract: We consider the problem of generating a large collection of initial guesses for local minima of multimodal non-convex continuous optimization problems. The goal is for these initial guesses to be high-quality (i.e., a numerical solver converges quickly
arxiv3d ago
arXiv:2606.01671v1 Announce Type: new Abstract: In the contemporary epoch of multilingual education, learning idioms provides a fascinating gateway towards creativity, cultural values, historical context, and diverse perspectives inherent to various linguistic traditions. This paper showcases the na
arxiv3d ago
arXiv:2606.00123v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have shown strong performance on public medical benchmarks, yet existing evaluations often remain weak proxies for clinical use, relying on isolated inputs and simplified recognition-style tasks. We introduce
arxiv3d ago
arXiv:2505.24069v4 Announce Type: replace-cross Abstract: Large language models (LLMs) are deployed on increasingly complex tasks that require multi-step decision-making. Understanding their algorithmic reasoning abilities is therefore crucial. However, we lack a diagnostic benchmark for evaluating
arxiv3d ago
arXiv:2606.01033v1 Announce Type: new Abstract: When a language model hallucinates, the final answer is wrong, but the mistake is not necessarily invisible inside the model. Different internal pathways may remain uncertain, disagree in how quickly they sharpen, or commit to competing continuations b
arxiv3d ago
arXiv:2606.00288v1 Announce Type: new Abstract: Large language models are undergoing a transition from model technology to system technology. As developers use Codex, Claude Code, AutoGPT, and related agents to write code, manage projects, and execute multi-step tasks, recurring engineering problems
arxiv3d ago
arXiv:2606.00084v1 Announce Type: cross Abstract: Online travel platforms generate vast volumes of user-generated hotel reviews, offering rich opportunities to understand traveler experiences at scale. However, transforming unstructured textual feedback into structured, actionable insights remains a
arxiv4d ago
arXiv:2605.30960v1 Announce Type: new Abstract: Accurate Zeroth-Order (ZO) Hessian estimation is a cornerstone of derivative-free methods, essential for tasks such as bilevel optimization, Bayesian inference, and uncertainty quantification. However, obtaining a complete suite of low-variance estimat
arxiv4d ago
arXiv:2605.31581v1 Announce Type: new Abstract: The same arguments often need to be evaluated under different external regimes. An agent with influence over the regime has a strategic lever that standard formalisms do not directly capture. We introduce context-dependent argumentation frameworks (CDA
arxivMay 29
arXiv:2605.29007v1 Announce Type: new Abstract: Personalized tutoring, teacher training, and education research need access to targeted synthetic misconceptions, but privacy and IRB constraints make labelled corpora of real student errors scarce. LLMs could in principle generate synthetic err
arxivMay 29
arXiv:2605.30330v1 Announce Type: new Abstract: Diffusion models have excellent capacity to model complex distributions of natural data, which has made them a popular and effective choice for posterior sampling in imaging inverse problems. Existing methods can incorporate any measurement model at in
arxivMay 28
arXiv:2603.21165v2 Announce Type: replace Abstract: Bangla culture is richly expressed through region, dialect, history, food, politics, media, and everyday visual life, yet it remains underrepresented in multimodal evaluation. To address this gap, we introduce BanglaVerse, a culturally grounded ben
arxivMay 28
arXiv:2605.28073v1 Announce Type: cross Abstract: Story rewriting aims to adapt existing narratives to diverse reader preferences while preserving plot consistency and narrative coherence. Unlike conventional work on style transfer, we argue that effective story rewriting demands context-aware narra
arxivMay 28
arXiv:2506.08311v2 Announce Type: replace-cross Abstract: Automated Program Repair (APR) agents leverage Large Language Models (LLMs) to autonomously diagnose and fix software bugs through reasoning, planning, and tool use. Despite impressive leaderboard gains on benchmarks such as SWE-bench, little
arxivMay 27
arXiv:2605.26442v1 Announce Type: cross Abstract: Much of the alignment tuning literature is organized around optimization objectives, while the construction of alignment data is often treated implicitly. In this survey, we adopt a data-centric perspective and reframe alignment tuning as a pipeline
arxivMay 26
arXiv:2605.25312v1 Announce Type: new Abstract: We introduce P1SCO, a dataset of social media comments collected from three distinct platforms, annotated according to ten social dimensions to capture the diversity of social interactions and perceptions. The dataset is carefully disaggregated to allo
arxivMay 26
arXiv:2605.24545v1 Announce Type: cross Abstract: Federated learning (FL) increasingly needs machine unlearning to comply with privacy regulations. However, existing federated unlearning approaches may overlook the overlapping information between the unlearning and remaining data, leading to ineffec
arxivMay 25
arXiv:2506.04390v2 Announce Type: replace-cross Abstract: Retrieval-augmented generation (RAG) systems are vulnerable to attacks that inject poisoned passages into the retrieved context, even at low corruption rates. We show that existing attacks are not designed to be stealthy, allowing reliable de
arxivMay 25
arXiv:2602.12316v2 Announce Type: replace Abstract: Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks largely evaluate single agents, leaving multi-agent risks such as coordination failure and conflict poorly und
arxivMay 22
arXiv:2502.21194v3 Announce Type: replace-cross Abstract: We study estimation of a class prior for unlabeled target samples which possibly differs from that of the source population. Moreover, it is assumed that the source data is partially observable: only samples from the positive class and from the w
arxivMay 21
arXiv:2605.19578v1 Announce Type: cross Abstract: RGB camera-based surveillance systems enable human action recognition for public safety and healthcare, yet raise serious privacy concerns. Existing methods rely on post-capture algorithms, which fail to protect privacy during data acquisition. We pr
arxivMay 21
arXiv:2605.20191v1 Announce Type: new Abstract: Modern Large Language Models (LLMs) have recently attracted much attention for their ability to simulate human behavior and generate text that reflects personas and demographic groups. While these capabilities can open up a multitude of diverse applica
arxivMay 19
arXiv:2605.17364v1 Announce Type: new Abstract: Media bias detection has predominantly been framed as a classification task: assign a political label to an article or outlet. We argue this framing is too shallow: it identifies that bias exists but not where, how, or crucially, what is structurally o
arxivMay 19
arXiv:2603.03328v2 Announce Type: replace-cross Abstract: Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest their own internal structures as well. While interpretability rese
arxivMay 19
arXiv:2605.17610v1 Announce Type: cross Abstract: The rapid growth of online video platforms and AI-generated content has made reliable video guardrails a key challenge for safety and real-world deployment. While most videos can be screened through fast pattern recognition, a small subset requires d
arxivMay 19
arXiv:2512.11089v4 Announce Type: replace-cross Abstract: We introduce test prediction variance (TPV), the first-order sensitivity of a trained model's outputs to parameter perturbations, as a unifying framework for analyzing post-training robustness. TPV is a fully label-free object whose trace for
arxivMay 18
arXiv:2512.09673v3 Announce Type: replace-cross Abstract: Equivariant neural networks encode the intrinsic symmetry of data as an inductive bias, which has achieved impressive performance in a wide range of domains. However, the understanding of their expressive power remains premature. Focusing on 2-layer ReL
arxivMay 16
arXiv:2507.15774v3 Announce Type: replace-cross Abstract: While deep learning is facing a homogenization across modalities led by Transformers, they are still challenged by shallow linear models in the time series forecasting task. Our hypothesis is that models should learn a direct link from past
arxivMay 15
arXiv:2605.14249v1 Announce Type: new Abstract: We present EnergyLens, an end-to-end framework for energy-aware large language model (LLM) inference optimization. As LLMs scale, predicting and reducing their energy footprint has become critical for sustainability and datacenter operations, yet exist
arxivMay 14
arXiv:2605.12798v1 Announce Type: cross Abstract: Fine-tuning LLMs on narrow harmful datasets can induce Emergent Misalignment (EM), where models exhibit misaligned behavior far beyond the fine-tuning distribution. We argue that emergent misalignment can be better understood as a data-mediated trans
arxivMay 14
arXiv:2605.10556v2 Announce Type: replace-cross Abstract: As large language models span dense, mixture-of-experts, and state-space architectures and are deployed on heterogeneous accelerators under increasingly diverse multimodal workloads, optimising inference energy has become as critical as optim
arxivMay 13
arXiv:2605.09159v1 Announce Type: new Abstract: Recent work shows that large language models (LLMs) encode behavioural traits ("personas") as linear directions in activation space, often called "persona vectors". Prior work has used such directions as static handles for behavioural steering. Buildin
arxivMay 13
arXiv:2605.08386v1 Announce Type: new Abstract: Skill libraries have become a practical way for LLM agents to reuse procedural experience across tasks. However, existing systems typically treat skills as flat, single-resolution prompt blocks. This creates a tension between relevance and cost: inject
arxivMay 13
arXiv:2512.23025v2 Announce Type: replace-cross Abstract: Multimodal health sensing offers rich behavioral signals for assessing mental health, yet translating these numerical time-series measurements into natural language remains challenging. Current LLMs cannot natively ingest long-duration sensor
arxivMay 13
arXiv:2504.10766v2 Announce Type: replace-cross Abstract: As the post-training of large language models (LLMs) advances from instruction-following to complex reasoning tasks, understanding how different data affect finetuning dynamics remains largely unexplored. In this paper, we present a spectral
arxivMay 13
arXiv:2605.12258v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have achieved remarkable progress, yet object hallucination remains a critical challenge for reliable deployment. In this paper, we present an in-depth analysis of instruction token embeddings and reveal tha
arxivMay 13
arXiv:2502.11981v3 Announce Type: replace Abstract: Decades of research in machine learning have given us powerful tools for making accurate predictions. But when used in social settings and on human inputs, better accuracy does not immediately translate to better social outcomes. To effectively pro
arxivMay 13
arXiv:2605.06218v3 Announce Type: replace Abstract: Piecewise affine neural networks (PANNs) provide a principled geometric perspective on neural network expressivity by characterizing the input-output map as a continuous piecewise affine (CPA) function whose complexity is governed by the number, a
arxivMay 13
arXiv:2605.12134v1 Announce Type: cross Abstract: Recent text-to-image models produce high-quality images, yet text ambiguity hinders precise control when specific styles or objects are required. There have been a number of recent works dealing with learning and composing multiple objects and patter
arxivMay 12
arXiv:2605.09443v1 Announce Type: cross Abstract: The advancement of Multimodal Large Language Models (MLLMs) has expanded Role-Playing Agents (RPAs) into visually grounded environments. However, human vision is inherently subjective and identity-driven, whereas existing MLLMs extract objective, cha