arxiv4d ago

Incomplete Prompt Jailbreaks in Large Language Models

arXiv:2607.20473v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly released as open-weight models with safeguards against harmful requests. Nevertheless, sentence completion remains vulnerable to incomplete harmful prompts. In this work, we formalize this phenomenon as inc

#safety #open-source #language-models Read on arxiv →

arxiv4d ago

CultureTalk-ID: A Multi-Task Dialogue Benchmark for Cultural Commonsense in Indonesian Local Languages

arXiv:2607.21016v1 Announce Type: new Abstract: Culture is lived through conversation, yet existing Indonesian cultural commonsense benchmarks evaluate LLMs on short and isolated prompts, stripping away the dialogic context in which cultural nuances actually surface. We introduce CultureTalk-ID, the

1 model #benchmark #cultural-commonsense #language-models Read on arxiv →

arxivJul 21bullish

TRACE: Trajectory-Based Safety Patch Learning for LLM Post-Training Realignment

arXiv:2607.16242v1 Announce Type: cross Abstract: Fine-Tuning-as-a-Service (FTaaS) platforms let users train large language models (LLMs) on customized tasks, but this pipeline could erode models' safety alignment. In practice, service providers need to recover models' safety without re-running full

#safety #fine-tuning #language-models Read on arxiv →

arxivJul 21bullish

Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models

arXiv:2602.11961v3 Announce Type: replace Abstract: Open large language models (LLMs) have demonstrated improving multilingual capabilities in recent years. In this paper, we present a study of open LLMs for multilingual machine translation (MT) across a range of languages, and investigate the effec

7 models · +4 #multilingual #machine-translation #open-source Read on arxiv →

arxivJul 16

The Illusion of Robustness: Aggregate Accuracy Hides Prediction Flips under Task-Irrelevant Context

arXiv:2607.12963v2 Announce Type: new Abstract: As large language models (LLMs) grow more capable, they are increasingly deployed in context-rich settings where task inputs are often accompanied by long, partially irrelevant context. In a controlled setting, we find that state-of-the-art models ofte

#language-models #reliability #evaluation Read on arxiv →

arxivJul 16

A Shared Subcircuit Lets LLMs Count Down Across Tasks

arXiv:2607.12279v1 Announce Type: new Abstract: Writing a sentence of exactly twelve words; ending a DNA sequence at the right codon; formatting an ASCII table. These are all tasks that language models can do that require tracking how many tokens remain before a target. In this work, we identify in

1 model #language-models #research #neural-networks Read on arxiv →

arxivJul 15

Toward Localizing and Repairing Bias in Transformer Attention Heads

arXiv:2607.12863v1 Announce Type: cross Abstract: Transformer language models are increasingly used as software components, yet biased outputs remain difficult to localize and repair inside the model. Existing fairness testing and repair methods largely operate at the input-output or retraining leve

#fairness #bias #debugging Read on arxiv →

arxivJul 14bullish

Index SLM Technical Report

arXiv:2607.09885v1 Announce Type: new Abstract: We present Index-1.9B, a series of open small language models developed at Bilibili. The series comprises four models: Index-1.9B-Base, a foundation model with 1.9 billion non-embedding parameters pre-trained on 2.8 trillion predominantly Chinese and E

4 models · +1 #open-source #language-models #pre-training Read on arxiv →

arxivJul 14bullish

PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs

arXiv:2505.18610v2 Announce Type: replace Abstract: Recently, significant progress has been made in developing reasoning-capable Large Language Models (LLMs) through long Chain-of-Thought (CoT) techniques. However, this long-CoT reasoning process imposes substantial memory overhead due to the large

1 model #quantization #compression #language-models Read on arxiv →

arxivJul 11

Where do LLMs Fall Short in CBT-Guided Affective Reasoning?

arXiv:2607.02885v2 Announce Type: replace Abstract: Cognitive Behavioral Therapy (CBT) provides a structured framework for understanding a user's mental state by examining the interaction between cognitive and behavioral factors. However, out-of-the-box LLMs respond fluently and empathetically, yet

1 model #affective-computing #cognitive-behavioral-therapy #language-models Read on arxiv →

arxivJul 3bullish

DiPS: Dialogue Policy Selection for High-Stakes Persuasion Agents

arXiv:2607.01557v1 Announce Type: cross Abstract: Large Language Models (LLMs) often struggle with persuasion in high-stakes scenarios. People's individual personalities and concerns require tailored strategies rather than a one-size-fits-all approach. To address this challenge, we focus on a fire-r

2 models #persuasion #language-models #dialogue-systems Read on arxiv →

arxivJul 3bullish

Evergreen: Efficient Claim Verification for Semantic Aggregates

arXiv:2604.26180v2 Announce Type: replace-cross Abstract: With recent semantic query processing engines, semantic aggregation has become a primitive operator, enabling the reduction of a relation into a natural language aggregate using an LLM. However, the resulting semantic aggregate may contain cl

1 model #databases #optimization #verification Read on arxiv →

arxivJul 3

Parameter Golf: What Really Works?

arXiv:2607.01517v1 Announce Type: new Abstract: How far can a language model improve under a strict artifact budget? Parameter Golf posed this question as an open community challenge in which participants trained the best language model, with the complete artifact (training code + compressed weights

#optimization #language-models #benchmark Read on arxiv →

arxivJul 2

NeuroFilter: Activation-Based Guardrails for Privacy-Conscious LLM Agents

arXiv:2601.14660v2 Announce Type: replace-cross Abstract: Agentic Large Language Models (LLMs) are models able to reason, plan, and execute tools over unstructured data. These abilities are enabling transformative applications in domains spanning from personal assistant, financial, and legal domains

1 model #privacy #security #language-models Read on arxiv →

arxivJul 2bullish

Efficient Multilingual Reasoning Transfer via Progressive Code-Switching

arXiv:2607.00485v1 Announce Type: new Abstract: Large reasoning models (LRMs) have achieved strong reasoning capabilities in English, yet their performance degrades significantly when required to reason in other languages. A natural solution is to transfer the model's English reasoning ability to ta

#language-models #transfer-learning #reasoning Read on arxiv →

arxivJul 2bullish

AGE: Adaptive-masking for Graph Embedding in Graph Retrieval-Augmented Generation

arXiv:2607.00052v1 Announce Type: cross Abstract: GraphRAG is an extension of retrieval-augmented generation (RAG) that supports large language models (LLMs) by referring to graph-structured data as external knowledge. While this technique ideally captures intricate relationships, it often struggles

2 models #graph-structured #language-models #self-supervised-learning Read on arxiv →

arxivJul 1bullish

Embodied CAD: Solver-Grounded LLM Agents for Parametric B-Rep Assembly Modeling

arXiv:2606.31252v1 Announce Type: new Abstract: Large language models can write plausible CAD scripts, but reliable industrial CAD modeling requires more than syntactically valid code: every feature, placement, and assembly relation must be accepted by an exact geometric kernel while remaining edita

1 model #cad #parametric-modeling #assembly-modeling Read on arxiv →

arxivJun 30

Agentic Tool Use in Large Language Models

arXiv:2604.00835v2 Announce Type: replace Abstract: Large language models are increasingly being deployed as autonomous agents yet their real world effectiveness depends on reliable tools for information retrieval, computation and external action. Existing studies remain fragmented across tasks, too

#language-models #information-retrieval #computation Read on arxiv →

arxivJun 30

Situation Perception: A Necessary Primitive to Artificial Superintelligence

arXiv:2606.30481v1 Announce Type: cross Abstract: Current large language models are extraordinary statistical engines. They compress vast amounts of text into useful patterns and can explain science, write code, imitate reasoning, and participate in philosophical conversation. Yet pattern mastery is

#artificial-intelligence #language-models #superintelligence Read on arxiv →

arxivJun 29

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning

arXiv:2606.27709v1 Announce Type: cross Abstract: Recent work has shown that fine-tuning large language models (LLMs) for social warmth degrades factual reliability and increases sycophancy. We investigate a related but distinct failure mode: warmth fine-tuning also weakens adversarial safety, makin

#safety #fine-tuning #language-models Read on arxiv →

arxivJun 29

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

arXiv:2604.17633v2 Announce Type: replace Abstract: Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges-

1 model #multilingual #pretraining #language-models Read on arxiv →

arxivJun 27bullish

Context Recycling for Long-Horizon LLM Inference

arXiv:2606.26105v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce ContextForge, a system for co

#language-models #conversational-ai #efficiency Read on arxiv →

arxivJun 27bullish

Language-Based Digital Twins for Elderly Cognitive Assistance

arXiv:2606.27334v1 Announce Type: new Abstract: Digital twins have emerged as a promising paradigm for personalized healthcare, enabling modeling of individual behavior and health trajectories. In cognitive health, early detection of Mild Cognitive Impairment (MCI) remains challenging, where languag

2 models #healthcare #digital-twins #language-models Read on arxiv →

arxivJun 26

When are likely answers right? On Sequence Probability and Correctness in LLMs

arXiv:2606.27359v1 Announce Type: cross Abstract: Many decoding methods for large language models can be understood as shifting probability mass toward outputs that are more likely under the model, either locally at the token level or globally at the sequence level. Therefore, their success depends

#language-models #decoding-methods #machine-learning Read on arxiv →

arxivJun 25bullish

Representation Interventions Enable Lifelong Knowledge Memory Control in LLMs

arXiv:2511.20892v4 Announce Type: replace Abstract: Large language models (LLMs) often produce incorrect or outdated content after being employed. Efficient and accurate knowledge updates without costly retraining are a major challenge. This problem is particularly challenging in lifelong settings,

2 models #lifelong-learning #knowledge-control #language-models Read on arxiv →

arxivJun 25bullish

ExTra: Exploratory Trajectory Optimization for Language Model Reinforcement Learning

arXiv:2606.24994v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) for language-model reasoning can fail at both extremes of task difficulty: easy prompts often produce all-correct, low-diversity rollout groups with little gradient signal, while hard prompts can pr

1 model #reinforcement-learning #language-models #exploration Read on arxiv →

arxivJun 25

Graph-Based Phonetic Error Correction of Noisy ASR

arXiv:2606.24889v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproportionately affect semantically critical tokens such as named entities, negations, and sentiment-bearing words. These errors a

4 models · +1 #asr #speech-recognition #nlp Read on arxiv →

arxivJun 25

Reinforcement Learning Improves Traversal of Parametric Knowledge in LLMs

arXiv:2511.05933v2 Announce Type: replace Abstract: Reinforcement learning (RL) is often credited with improving language model reasoning at the expense of knowledge. We challenge this narrative by showing that reasoning models consistently outperform their instruction-tuned versions on pure knowled

#reinforcement-learning #language-models #knowledge-recall Read on arxiv →

arxivJun 25

Bias Fitting to Mitigate Length Bias of Reward Model in RLHF

arXiv:2505.12843v2 Announce Type: replace Abstract: Reinforcement Learning from Human Feedback (RLHF) relies on reward models to align large language models with human preferences. However, RLHF often suffers from reward hacking, wherein policy learning exploits flaws in the trained reward model to

3 models #reinforcement-learning #bias-mitigation #language-models Read on arxiv →

arxivJun 19bullish

VIMPO: Value-Implicit Policy Optimization for LLMs

arXiv:2606.20008v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment. Group-relative methods such as GRPO av

3 models #reinforcement-learning #language-models #optimization Read on arxiv →

arxivJun 18

Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA

arXiv:2606.19266v1 Announce Type: cross Abstract: The development of large language models (LLMs) has led to an increased focus on their adaptation to specialized domains and languages, yet the effectiveness of domain adaptation strategies remains unclear. We present a study of medical domain adapta

#domain-adaptation #language-models #medical-qa Read on arxiv →

arxivJun 18

TW-LegalBench: Measuring Taiwanese Legal Understanding

arXiv:2606.18699v1 Announce Type: cross Abstract: Large language models (LLMs) have shown impressive capabilities across diverse tasks, yet their performance on jurisdiction-specific legal reasoning remains underexplored. We present TW-LegalBench that utilizes Taiwanese legal system's rich official

1 model #legal-reasoning #language-models #evaluation Read on arxiv →

arxivJun 17

Combating Data Laundering in LLM Training

arXiv:2604.01904v3 Announce Type: replace-cross Abstract: Post-hoc unauthorized-training data detection for large language models (LLMs) typically assumes a query-with-originals regime: rights holders query a target LLM with raw proprietary data and assess whether the model assigns them stronger mem

3 models #data-laundering #detection #security Read on arxiv →

arxivJun 17bearish

In-Context Environments Induce Evaluation-Awareness in Language Models

arXiv:2603.03824v2 Announce Type: replace Abstract: Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent evaluation awareness. This raises concerns that models could strategic

3 models #adversarial-attacks #evaluation-awareness #sandbagging Read on arxiv →

arxivJun 17bullish

LLM-as-Judge in Education: A Curriculum-Grounded Marking Pipeline

arXiv:2606.17507v1 Announce Type: new Abstract: Generative AI and large language models (LLMs) are increasingly applied to question generation and automated assessment. However, deploying LLMs in preparation for high-stakes exams requires more than prompt engineering; it demands software pipelines t

1 model #education #assessment #language-models Read on arxiv →

arxivJun 17

Implicit vs. Explicit Prompting Strategies for LVLMs in Referential Communication

arXiv:2606.17372v1 Announce Type: cross Abstract: Two recent studies (Jones et al. (2026); Zeng et al. (2026)) reach apparently contradictory conclusions about whether LVLMs can coordinate on efficient referring expressions. We control for task differences between the studies while directly comparin

1 model #language-models #communication #efficiency Read on arxiv →

arxivJun 15

Multi-component Causal Tracing in Large Language Models

arXiv:2606.03085v2 Announce Type: replace-cross Abstract: Causal tracing systematically intervenes on a large language model's (LLM's) internal representations to uncover and quantify the causal pathways linking specific inputs or computations to specific metrics of interest, quantifying the LLM's b

#machine-learning #research #language-models Read on arxiv →

arxivJun 15bearish

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

arXiv:2512.02318v4 Announce Type: replace-cross Abstract: This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA. We identify the attack surface where an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluat

1 model #security #captcha #adversarial-attacks Read on arxiv →

arxivJun 12bullish

NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

arXiv:2606.11199v1 Announce Type: cross Abstract: We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track. Rather than targeting benchma

3 models #research #competition #language-models Read on arxiv →

arxivJun 12bearish

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

arXiv:2606.13385v1 Announce Type: cross Abstract: Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks

#security #benchmark #vulnerability Read on arxiv →

arxivJun 12bearish

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

arXiv:2606.10931v2 Announce Type: replace Abstract: Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale post-training to ensure fair and reliable behavior. In this work, we investigate how easily such guar

#bias #safety #language-models Read on arxiv →

arxivJun 11bullish

DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer

arXiv:2606.04694v2 Announce Type: replace Abstract: Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framewor

1 model #multilingual #distillation #language-models Read on arxiv →

arxivJun 10bullish

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

arXiv:2606.11119v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward contrast, aris

1 model #reinforcement-learning #language-models #optimization Read on arxiv →

arxivJun 10bullish

Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning

arXiv:2606.09866v1 Announce Type: cross Abstract: Fine-tuning safety aligned large language models (LLMs) on downstream data improves adaptation but may erode learned safety behavior. Existing methods use fixed safety examples, global constraints, or one-sided task filtering. Our diagnostics show ta

1 model #safety #fine-tuning #language-models Read on arxiv →

arxivJun 10

From Context-Aware to Conflict-Aware: Generalizing Contrastive Decoding for Knowledge Conflict in LLMs

arXiv:2606.10298v1 Announce Type: new Abstract: When large language models generate from retrieved or augmented contexts, conflicts between external context and parametric priors remain a central reliability bottleneck. Existing contrastive decoding methods follow a context-aware paradigm tha

#reliability #language-models #evaluation Read on arxiv →

arxivJun 10

Advancing the State-of-the-Art in Empirical Privacy Auditing

arXiv:2606.10481v1 Announce Type: cross Abstract: Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical privacy auditing (EPA) quantifies this risk by measuring realistic data leakage on membership inference (M

#privacy #language-models #auditing Read on arxiv →

arxivJun 10bullish

RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty

arXiv:2602.12424v2 Announce Type: replace-cross Abstract: Benchmarks establish a standardized evaluation framework to systematically assess the performance of large language models (LLMs), facilitating objective comparisons and driving advancements in the field. However, existing benchmarks fail to

#evaluation #benchmark #language-models Read on arxiv →

arxivJun 6

A Systematic Analysis of Biases in Large Language Models

arXiv:2512.15792v3 Announce Type: replace-cross Abstract: Large language models (LLMs) have rapidly become indispensable tools for acquiring information and supporting human decision-making. However, ensuring that these models uphold fairness across varied contexts is critical to their safe and resp

#fairness #bias #language-models Read on arxiv →

arxivJun 6bullish

FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG

arXiv:2606.05644v1 Announce Type: new Abstract: When retrieved evidence contradicts parametric memory, language models frequently ignore context and default to memorized priors -- a failure that undermines the core purpose of retrieval augmentation. Contrastive decoding amplifies the context-conditi

1 model #retrieval-augmentation #contrastive-decoding #language-models Read on arxiv →

arxivJun 6bearish

The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?

arXiv:2504.10020v4 Announce Type: replace-cross Abstract: Contrastive decoding strategies are widely used to reduce object hallucinations in multimodal large language models (MLLMs). These methods work by constructing contrastive samples to induce hallucinations and then suppressing them in the outp

#multimodal #hallucinations #language-models Read on arxiv →

arxivJun 5bearish

Personality Shapes Gender Bias in Persona-Conditioned LLM Narratives Across English and Hindi: An Empirical Investigation

arXiv:2604.23600v2 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly deployed in persona-driven applications such as education, customer service, and social platforms, where models are prompted to adopt specific personas when interacting with users. While persona conditi

1 model #bias #language-models #stereotypes Read on arxiv →

arxivJun 4bullish

DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer

arXiv:2606.04694v1 Announce Type: new Abstract: Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework th

1 model #multilingual #distillation #language-models Read on arxiv →

arxivJun 3bullish

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

arXiv:2606.02684v1 Announce Type: cross Abstract: On-Policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most info

1 model #on-policy #distillation #optimization Read on arxiv →

arxivJun 3bullish

Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning

arXiv:2606.03113v1 Announce Type: new Abstract: Large Language Models suffer from slow autoregressive inference. While self-speculative decoding accelerates this process, its efficiency is hampered by static configurations like fixed exit layers and speculation lengths. We reframe this optimization

2 models #optimization #reinforcement-learning #language-models Read on arxiv →

arxivJun 3bullish

Coherence Maximization Improves Pluralistic Alignment

arXiv:2606.03110v1 Announce Type: new Abstract: Aligning AI systems with diverse human values requires value specifications grounded in concrete examples, but generating such examples without extensive human supervision remains an open challenge. We investigate what makes these examples effective, u

1 model #value-alignment #unsupervised-learning #language-models Read on arxiv →

arxivJun 2bullish

HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation

arXiv:2606.01230v1 Announce Type: new Abstract: Large language model agents are moving beyond text-only interaction toward physical-world control, with smart homes as a representative domain. Real domestic interaction requires understanding ambiguous intents, operating in dynamic environments, and p

3 models #smart-home #language-models #benchmark Read on arxiv →

arxivJun 2bullish

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

arXiv:2605.12969v3 Announce Type: replace-cross Abstract: Group Relative Policy Optimization (GRPO) is one of the most widely adopted RLVR algorithms for post-training large language models on reasoning tasks. We first show that GRPO admits an equivalent discriminative reformulation, in which policy

2 models #reinforcement-learning #language-models #optimization Read on arxiv →

arxivJun 2bullish

Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization

arXiv:2510.05342v2 Announce Type: replace-cross Abstract: Direct Preference Optimization (DPO) has emerged as a simple and effective method for aligning large language models. However, its reliance on a fixed temperature parameter leads to suboptimal training on diverse preference data, causing over

4 models · +1 #machine-learning #optimization #language-models Read on arxiv →

arxivJun 1bullish

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

arXiv:2605.31183v1 Announce Type: cross Abstract: Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench - a model steering benchmark - was introduced in Wu et al. (2025), SAEs

3 models #language-models #benchmark #interpretability Read on arxiv →

arxivJun 1

LLM Anonymization Against Agentic Re-Identification

arXiv:2605.30848v1 Announce Type: cross Abstract: Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry downstream analytic value of the text. Existing defense

1 model #anonymization #privacy #security Read on arxiv →

arxivMay 29bullish

WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models

arXiv:2512.00837v2 Announce Type: replace Abstract: Watermarking acts as a critical safeguard in text generated by Large Language Models (LLMs). By embedding identifiable signals into model outputs, watermarking enables reliable attribution and enhances the security of machine-generated content. Exi

1 model #watermarking #language-models #security Read on arxiv →

arxivMay 29bullish

CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

arXiv:2605.28919v1 Announce Type: cross Abstract: Large language models have achieved strong reasoning capabilities, though often at the cost of massive parameter counts and expensive inference. In this work, we explore a different direction: adaptive reasoning depth in compact language models. We p

1 model #compact-models #reasoning #autoregressive Read on arxiv →

arxivMay 29bullish

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

arXiv:2605.28864v1 Announce Type: new Abstract: The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived from category theory and several inspirations from cognitive science. Under a matc

3 models #language-models #transformers #cognitive-science Read on arxiv →

arxivMay 29

Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation

arXiv:2605.29007v1 Announce Type: new Abstract: Personalized tutoring, teacher training, and education research need access to targeted synthetic misconceptions, but privacy and IRB constraints make labelled corpora of real student errors scarce. LLMs could in principle generate synthetic err

1 model #education #synthetic-data #language-models Read on arxiv →

arxivMay 29bullish

Unlocking the Working Memory of Large Language Models for Latent Reasoning

arXiv:2605.30343v1 Announce Type: cross Abstract: To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to autoregressive generation and thereby conflates intern

#reasoning #language-models #working-memory Read on arxiv →

arxivMay 29bullish

From Rubrics to Reliable Scores: Evidence-Grounded Text Evaluation with LLM Judges

arXiv:2601.08654v2 Announce Type: replace-cross Abstract: Rubric-based text evaluation increasingly uses large language models (LLMs) as scalable judges, but aligning frozen black-box models with human scoring standards remains challenging. We formulate this challenge as a criteria-transfer problem:

#evaluation #language-models #rubric-scoring Read on arxiv →

arxivMay 29

When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis

arXiv:2605.29025v1 Announce Type: new Abstract: Federal agencies are deploying large language models (LLMs) to categorize public comment corpora, where the model's organization of the record shapes what policymakers see and which arguments register. Standard evaluation, anchored on stance accuracy a

#evaluation #interpretability #language-models Read on arxiv →

arxivMay 25

Hidden Human-Like Nature of Machine-Generated Texts: Theory and Detection Enhancement

arXiv:2605.23190v1 Announce Type: new Abstract: Machine-generated texts (MGTs) produced by large language models (LLMs) are increasingly prevalent across various applications, while their potential misuse in fake news propagation and phishing has raised serious concerns, highlighting the need for MG

1 model #machine-generated-texts #detection #language-models Read on arxiv →

arxivMay 22bullish

Token-weighted Direct Preference Optimization with Attention

arXiv:2605.21883v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of individual tokens. Existi

1 model #optimization #language-models #reinforcement-learning Read on arxiv →

arxivMay 22

Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale

arXiv:2605.20744v1 Announce Type: cross Abstract: Aligning autonomous agents with human intent remains a central challenge in modern AI. A key manifestation of this challenge is reward hacking, whereby agents appear successful under the evaluation signal while violating the intended objective. Rewar

LA1 model #reward-hacking #evaluation #autonomous-agents Read on arxiv →

arxivMay 22bullish

FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation

arXiv:2605.20199v1 Announce Type: cross Abstract: We present FlowLM, a flow matching language model transformed from pre-trained diffusion language models via efficient fine-tuning. By re-aligning the curved sampling trajectories of diffusion models into straight-line flows, FlowLM enables high qual

FL1 model #language-models #diffusion #fine-tuning Read on arxiv →

arxivMay 19bullish

EmoMind: Decoding Affective Captions from Human Brain fMRI

arXiv:2605.16739v1 Announce Type: cross Abstract: Decoding visual experience from brain activity has advanced substantially, but current brain-to-text systems largely recover semantic content while discarding affect. Additionally, language models can generate emotional text when prompted with cate

EMOP2 models #neuroscience #affective-computing #brain-decoding Read on arxiv →

arxivMay 18

Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis

arXiv:2605.15440v1 Announce Type: new Abstract: Surprisal theory posits that the processing difficulty of a word is determined by its predictability in context, offering a potential link between human sentence processing and next-word predictions from language models. While language model (LM) surpr

RE1 model #language-models #sentence-processing #syntactic-ambiguity Read on arxiv →

arxivMay 16bearish

Quantifying and Mitigating Premature Closure in Frontier LLMs

arXiv:2605.15000v1 Announce Type: cross Abstract: Premature closure, or committing to a conclusion before sufficient information is available, is a recognized contributor to diagnostic error but remains underexamined in large language models (LLMs). We define LLM premature closure as inappropriate c

LL1 model #safety #evaluation #language-models Read on arxiv →

arxivMay 16

A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions

arXiv:2605.14857v1 Announce Type: new Abstract: Harmonized System (HS) tariff classification is a high-stakes, expert-level task in which a free-form product description must be mapped to a specific six- or eight-digit code under the General Interpretive Rules (GIR), section notes, chapter notes, an

QWQW2 models #tariff-classification #language-models #expert-systems Read on arxiv →

arxivMay 15

GradShield: Alignment Preserving Finetuning

arXiv:2605.14194v1 Announce Type: new Abstract: Large Language Models (LLMs) pose a significant risk of safety misalignment after finetuning, as models can be compromised by both explicitly and implicitly harmful data. Even some seemingly benign data can inadvertently steer a model towards misaligne

#safety #finetuning #language-models Read on arxiv →

arxivMay 15bullish

A Large Language Model Based Pipeline for Review of Systems Entity Recognition from Clinical Notes

arXiv:2506.11067v3 Announce Type: replace Abstract: Objective: Develop a cost-effective, large language model (LLM)-based pipeline for automatically extracting Review of Systems (ROS) entities from clinical notes. Materials and Methods: The pipeline extracts ROS section from the clinical note using

MEGEMI4 models · +1 #healthcare #language-models #open-source Read on arxiv →

arxivMay 11

How Value Induction Reshapes LLM Behaviour

arXiv:2605.07925v1 Announce Type: new Abstract: Conversational Large Language Models are post-trained on language that expresses specific behavioural traits, such as curiosity, open-mindedness, and empathy, and values, such as helpfulness, harmlessness, and honesty. This is done to increase utility,

#language-models #value-induction #safety Read on arxiv →

arxivMay 11

Searching for Privacy Risks in LLM Agents via Simulation

arXiv:2508.10880v3 Announce Type: replace-cross Abstract: The widespread deployment of LLM-based agents is likely to introduce a critical privacy threat: malicious agents that proactively engage others in multi-turn interactions to extract sensitive information. However, the evolving nature of such

LL1 model #privacy #security #language-models Read on arxiv →

arxivMay 11bullish

Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

arXiv:2605.06885v1 Announce Type: cross Abstract: Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregr

DIAU2 models #diffusion #language-models #representation-learning Read on arxiv →

arxivMay 8bullish

ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems in the Wild

arXiv:2512.06721v2 Announce Type: replace Abstract: Recent studies have begun to explore proactive large language model (LLM) agents that provide unobtrusive assistance by automatically leveraging contextual information, such as in code editing and in-app suggestions. However, most focus on short, t

PR1 model #proactive-assistance #language-models #human-computer-interaction Read on arxiv →

arxivMay 8

PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

arXiv:2605.06455v1 Announce Type: new Abstract: Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are bri

#language-models #monitoring #safety Read on arxiv →

arxivMay 8bullish

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

arXiv:2605.05927v1 Announce Type: new Abstract: Speech large language models (SLMs) are typically built from text large language model (TLM) checkpoints, yet they still suffer from a substantial modality gap. Prior work has mainly attempted to reduce this gap from the output side by making speech ge

TEWH2 models #speech-processing #language-models #modality-gap Read on arxiv →

arxivMay 7bullish

Path-Lock Expert: Separating Reasoning Mode in Hybrid Thinking via Architecture-Level Separation

arXiv:2604.27201v2 Announce Type: replace Abstract: Hybrid-thinking language models expose explicit think and no-think modes, but current designs do not separate them cleanly. Even in no-think mode, models often emit long and self-reflective responses, causing reasoning leakage. Existing work reduce

PAQW2 models #language-models #architecture #hybrid-thinking Read on arxiv →

arxivMay 7

Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning

arXiv:2605.00364v2 Announce Type: replace Abstract: Machine unlearning has emerged as a critical capability for addressing privacy, safety, and regulatory concerns in large language models (LLMs). Existing methods operate at the sequence level, applying uniform updates across all tokens despite only

LLTOWM3 models #machine-unlearning #language-models #privacy Read on arxiv →

arxivMay 6

E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems

arXiv:2605.00955v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) equips large language models (LLMs) with external evidence by retrieving documents at inference time, but it also turns the retrieval corpus into a sensitive asset. Under a black-box setting, an adversary given a c

RE1 model #security #language-models #inference Read on arxiv →

arxivMay 5

Compute Optimal Tokenization

arXiv:2605.01188v1 Announce Type: new Abstract: Scaling laws enable the optimal selection of data amount and language model size, yet the impact of the data unit, the token, on this relationship remains underexplored. In this work, we systematically investigate how the information granularity of tok

BL1 model #tokenization #language-models #scaling-laws Read on arxiv →

arxivMay 5bearish

Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

arXiv:2605.01224v1 Announce Type: new Abstract: This paper argues that contemporary multilingual NLP has converged on a fragile and misleading paradigm of incidental multilingualism. Today's LLMs appear multilingual largely because they are trained on massive, uneven web corpora, not because multili

LL1 model #nlp #multilingualism #language-models Read on arxiv →

arxivMay 5bullish

Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders

arXiv:2605.01372v1 Announce Type: new Abstract: Large language models (LLMs) have been widely explored for embedding generation. While recent studies show that in-context learning (ICL) effectively enhances the representational capability of LLMs by prepending a few task-related demonstrations, it c

#embedding #in-context-learning #language-models Read on arxiv →

arxivMay 1

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

arXiv:2604.27019v1 Announce Type: cross Abstract: Safety-aligned language models must refuse harmful requests without collapsing into broad over-refusal, but the training-time mechanisms behind this tradeoff remain unclear. Prior work characterizes refusal directions and jailbreak robustness, yet do

#safety #language-models #adversarial-training Read on arxiv →

arxivMay 1

Geometry-Calibrated Conformal Abstention for Language Models

arXiv:2604.27914v1 Announce Type: new Abstract: When language models lack relevant knowledge for a given query, they frequently generate plausible responses that can be hallucinations, rather than admitting being agnostic about the answer. Retraining models to reward admitting ignorance can lead to

#conformal-prediction #language-models #calibration Read on arxiv →

arxivMay 1

Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

arXiv:2604.27996v1 Announce Type: new Abstract: This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from natural-language instructions. We compare three primary interaction parad

#scientific-visualization #language-models #human-computer-interaction Read on arxiv →

arxivMay 1bullish

Proactive Dialogue Model with Intent Prediction

arXiv:2604.27379v1 Announce Type: new Abstract: Dialogue models are inherently reactive, responding to the current user turn without anticipating upcoming intents, which leads to redundant interactions in multi-intent settings. We address this limitation by introducing a lightweight intent-transitio

TE1 model #dialogue-systems #intent-recognition #language-models Read on arxiv →

arxivApr 30

Calibrated Surprise: An Information-Theoretic Account of Creative Quality

arXiv:2604.26269v1 Announce Type: cross Abstract: The essence of good creative writing is calibrated surprise: when constraints from all relevant dimensions act together, the feasible solution space collapses into a narrow region, and the surviving choices look least predictable from an unconstraine

LL1 model #creative-writing #language-models #evaluation Read on arxiv →

arxivApr 30

Beyond Screenshots: Evaluating VLMs' Understanding of UI Animations

arXiv:2604.26148v1 Announce Type: cross Abstract: AI agents operating on user interfaces must understand how interfaces communicate state and feedback to act reliably. As a core communicative modality, animations are increasingly used in modern interfaces, serving critical functional purposes beyond

VIAN2 models #ui-interpretation #animation #human-computer-interaction Read on arxiv →

arxivApr 30bullish

A Dual-Task Paradigm to Investigate Sentence Comprehension Strategies in Language Models

arXiv:2604.26351v1 Announce Type: new Abstract: Language models (LMs) behave more like humans when their cognitive resources are restricted, particularly in predicting sentence processing costs such as reading times. However, it remains unclear whether such constraints similarly affect sentence comp

GPO3O43 models #language-models #cognitive-resources #sentence-comprehension Read on arxiv →

arxivApr 30bullish

Test-Time Safety Alignment

arXiv:2604.26167v1 Announce Type: cross Abstract: Recent work has shown that a model's input word embeddings can serve as effective control variables for steering its behavior toward outputs that satisfy desired properties. However, this has only been demonstrated for pretrained text-completion mode

#safety #language-models #optimization Read on arxiv →

arxivApr 30

Differentially-Private Text Rewriting reshapes Linguistic Style

arXiv:2604.26656v1 Announce Type: new Abstract: Differential Privacy (DP) for text matured from disjointed word-level substitutions to contiguous sentence-level rewriting by leveraging the generative capacity of language models. While this form of text privatization is best suited for balancing form

#differential-privacy #language-models #text-rewriting Read on arxiv →

arxivApr 29bullish

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

arXiv:2604.24544v1 Announce Type: new Abstract: The increasing reliance on Large Language Models (LLMs) across diverse sectors highlights the need for robust domain-specific and language-specific evaluation datasets; however, the collection of such datasets is challenging due to privacy concerns, re

LATG2 models #benchmark #evaluation #language-models Read on arxiv →