arxiv · Jul 20

How Much Human Label Variation Does Formal Semantic Structure Explain?: Group-Level Effects and Item-Level Ceilings in NLI

arXiv:2607.15870v1 Announce Type: new Abstract: Human label variation in natural language inference is increasingly treated as signal rather than noise, but how much of it formal semantic structure explains has not been measured directly. We measure it on the 3,113 SNLI and MNLI items of ChaosNLI, u…

3 models · #natural-language-processing #semantics

arxiv · Jun 12

Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

arXiv:2606.12476v1 Announce Type: cross Abstract: Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucin…

#machine-learning #artificial-intelligence #natural-language-processing

arxiv · Jun 12

Multiagent Protocols with Aggregated Confidence Signals

arXiv:2606.13591v1 Announce Type: new Abstract: Confidence is used for reliability, oversight, and a range of downstream decision tasks in Natural Language Processing (NLP), yet no existing method produces or evaluates a confidence for the output of a multiagent system. Prior work uses confidence wi…

#multiagent-systems #natural-language-processing #confidence-estimation

arxiv · Jun 5 · bullish

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

arXiv:2606.05557v1 Announce Type: new Abstract: A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-use agents answer the literal question and stop. AUR…

2 models · #natural-language-processing #inference #benchmark

arxiv · May 29 · bullish

Rethinking Stepwise Model Routing: A Cost-Efficient Table Reasoning Perspective

arXiv:2605.29319v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) achieve strong performance on table reasoning tasks but incur substantial inference cost due to long reasoning traces. Stepwise model routing mitigates this issue by dynamically assigning reasoning steps to smaller or larg…

#table-reasoning #efficiency #routing

arxiv · May 22 · bullish

Token-weighted Direct Preference Optimization with Attention

arXiv:2605.21883v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of individual tokens. Existi…

1 model · #optimization #language-models #reinforcement-learning

arxiv · May 11 · bullish

End-to-end PDDL Planning with Hardcoded and Dynamic Agents

arXiv:2512.09629v2 Announce Type: replace Abstract: We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domain Definition Language) model, where the domain and problem…

5 models · #planning #natural-language-processing #large-language-models

arxiv · May 1 · bullish

Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

arXiv:2604.28028v1 Announce Type: cross Abstract: Large language models (LLMs) have revolutionized Text-to-SQL generation, allowing users to query structured data using natural language with growing ease. Yet, real-world deployment remains challenging, especially in complex or unseen schemas, due to…

1 model · #text-to-sql #natural-language-processing #database

arxiv · Apr 30

Benchmarking PyCaret AutoML Against BiLSTM for Fine-Grained Emotion Classification: A Comparative Study on 20-Class Emotion Detection

arXiv:2604.26310v1 Announce Type: new Abstract: Fine-grained emotion classification, which identifies specific emotional states such as happiness, anger, sadness, and fear, remains a challenging task in natural language processing. This study benchmarks classical machine learning and deep learning a…

6 models · #emotion-classification #natural-language-processing #deep-learning

arxiv · Apr 24 · bullish

GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion

arXiv:2604.21649v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a critical challenge. While recent quantization-based appro…

#knowledge-graph #natural-language-processing #quantization

arxiv · Apr 24

UKP_Psycontrol at SemEval-2026 Task 2: Modeling Valence and Arousal Dynamics from Text

arXiv:2604.21534v1 Announce Type: new Abstract: This paper presents our system developed for SemEval-2026 Task 2. The task requires modeling both current affect and short-term affective change in chronologically ordered user-generated texts. We explore three complementary approaches: (1) LLM prompti…

1 model · #semeval #affective-computing #natural-language-processing

arxiv · Apr 21 · bullish

ThreadSumm: Summarization of Nested Discourse Threads Using Tree of Thoughts

arXiv:2604.17648v1 Announce Type: new Abstract: Summarizing deeply nested discussion threads requires handling interleaved replies, quotes, and overlapping topics, which standard LLM summarizers struggle to capture reliably. We introduce ThreadSumm, a multi-stage LLM framework that treats thread sum…

2 models · #summarization #llm #discussion-threads

arxiv · Apr 17

Rhetorical Questions in LLM Representations: A Linear Probing Study

arXiv:2604.14128v1 Announce Type: cross Abstract: Rhetorical questions are asked not to seek information but to persuade or signal stance. How large language models internally represent them remains unclear. We analyze rhetorical questions in LLM representations using linear probes on two social-med…

1 model · #language-models #rhetorical-questions #natural-language-processing

arxiv · Apr 16 · bullish

Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport

arXiv:2604.12663v1 Announce Type: new Abstract: Existing topic modeling methods, from LDA to recent neural and LLM-based approaches, which focus mainly on statistical coherence, often produce redundant or off-target topics that miss the user's underlying intent. We introduce Human-centric Topic Mode…

2 models · #topic-modeling #natural-language-processing #artificial-intelligence

arxiv · Apr 10 · bullish

A Graph-Enhanced Defense Framework for Explainable Fake News Detection with LLM

arXiv:2604.06666v1 Announce Type: cross Abstract: Explainable fake news detection aims to assess the veracity of news claims while providing human-friendly explanations. Existing methods incorporating investigative journalism are often inefficient and struggle with breaking news. Recent advances in…

3 models · #explainability #fake-news-detection #natural-language-processing

arxiv · Apr 8

Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system

arXiv:2604.05536v1 Announce Type: cross Abstract: Natural language is a complex system that exhibits robust statistical regularities. Here, we represent text as a trajectory in a high-dimensional embedding space generated by transformer-based language models, and quantify scale-dependent fluctuation…

1 model · #language-models #natural-language-processing #complex-systems

arxiv · Apr 4

Detecting Toxic Language: Ontology and BERT-based Approaches for Bulgarian Text

arXiv:2604.01745v1 Announce Type: new Abstract: Toxic content detection in online communication remains a significant challenge, with current solutions often inadvertently blocking valuable information, including medical terms and text related to minority groups. This paper presents a more nuanced…

1 model · #toxicity-detection #natural-language-processing #content-moderation