Tag: #nlp

17 articles tagged #nlp

arxiv · 1d ago

Evaluation design conditions the expert-vs-auto MeSH gap: a controlled comparison of bag-of-words and BiomedBERT on the Cohen benchmark

arXiv:2607.21685v1 Announce Type: new Abstract: A systematic review begins with someone reading thousands of abstracts to identify the few that are relevant, and classifiers are used to prioritise that reading. Their inputs are often augmented with Medical Subject Headings (MeSH), assigned either by

2 models #benchmark #evaluation #classification

arxiv · 5d ago · bullish

Sentence Splitter: Uncovering Latent Factual Structure for Self-Supervised Learning

arXiv:2607.19845v1 Announce Type: cross Abstract: This paper introduces Sentence Splitter, a self-supervised framework built upon a T5-based encoder-decoder architecture for uncovering the latent factual structure of natural language sentences. The proposed method identifies the semantic boundary b

1 model #nlp #self-supervised #knowledge-graph

arxiv · Jul 16

The Capacity of Thought: Benchmarking Llama 3.2 in Semantic fMRI Neural Language Decoding and Improving the Huth Encoding-Model Baseline

arXiv:2607.12079v1 Announce Type: new Abstract: Decoding continuous language from fMRI signals remains a core challenge in non-invasive brain-computer interface research. We present two complementary investigations. First, we improve the Huth et al. ridge regression encoding pipeline through expande

3 models #nlp #brain-computer-interfaces #neural-decoding

arxiv · Jun 25

Graph-Based Phonetic Error Correction of Noisy ASR

arXiv:2606.24889v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproportionately affect semantically critical tokens such as named entities, negations, and sentiment-bearing words. These errors a

4 models · +1 #asr #speech-recognition #nlp

arxiv · Jun 25

Overview of HIPE-2026: Person-Place Relation Extraction from Multilingual Historical Texts

arXiv:2606.25935v1 Announce Type: new Abstract: Was this person ever at that place, and if so, when? Answering such questions from noisy, multilingual historical documents is the central challenge of HIPE-2026, the third edition of the HIPE evaluation series. Moving from named entity recognition and

#historical-documents #relation-extraction #evaluation

arxiv · Jun 19

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

arXiv:2606.20369v1 Announce Type: new Abstract: Online hate speech and misinformation frequently overlap, yet NLP research has mainly treated them in isolation. While LLMs represent a scalable solution for assisting humans in the generation of counterspeech for both threats, zero-shot models frequen

2 models #nlp #counterspeech #misinformation

arxiv · May 29

Slide Deck Q&A Quality Assurance App: A Multi-Stage Pipeline for Pedagogical Question Generation

arXiv:2605.26428v2 Announce Type: replace Abstract: Generating high-quality, pedagogically useful questions from lecture slide decks is difficult because important instructional content is distributed across both text and visual elements, and because useful questions must be scaffolded across the fl

#education #nlp #question-generation

arxiv · May 26

Exploring Profiles of Cognitive Distortions Associated with Mental Health Disorders

arXiv:2605.24996v1 Announce Type: new Abstract: Cognitive distortions, distorted patterns of thinking, have been increasingly studied in computational mental health research. Although they are related to many, if not all, mental health disorders, most existing studies focus primarily on depression.

1 model #mental-health #research #nlp

arxiv · May 21

Assessing socio-economic climate impacts from text data

arXiv:2605.20793v1 Announce Type: new Abstract: Recent advances in natural language processing (NLP) and large language models (LLMs) have enabled the systematic use of large-scale textual data from news, social media, and reports to create datasets with socio-economic impacts of climate hazards suc

#nlp #climate #disaster-risk

arxiv · May 5

ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts

arXiv:2605.00116v1 Announce Type: cross Abstract: In this article, we introduce ViLegalNLI, the first large-scale Vietnamese Natural Language Inference (NLI) dataset specifically constructed for the legal domain. The dataset consists of 42,012 premise-hypothesis pairs derived from official statutory

#nlp #dataset #legal

arxiv · May 5 · bearish

Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

arXiv:2605.01224v1 Announce Type: new Abstract: This paper argues that contemporary multilingual NLP has converged on a fragile and misleading paradigm of incidental multilingualism. Today's LLMs appear multilingual largely because they are trained on massive, uneven web corpora, not because multili

1 model #nlp #multilingualism #language-models

arxiv · May 1

Supercharging Agenda Setting Research: The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification

arXiv:2602.16516v2 Announce Type: replace Abstract: This paper introduces ParlaCAP, a large-scale dataset for analyzing parliamentary agenda setting across Europe, and proposes a cost-effective method for building domain-specific policy topic classifiers. Applying the Comparative Agendas Project (CA

2 models #dataset #nlp #policy

arxiv · Apr 28 · bullish

ComplianceNLP: Knowledge-Graph-Augmented RAG for Multi-Framework Regulatory Gap Detection

arXiv:2604.23585v1 Announce Type: new Abstract: Financial institutions must track over 60,000 regulatory events annually, overwhelming manual compliance teams; the industry has paid over USD 300 billion in fines and settlements since the 2008 financial crisis. We present ComplianceNLP, an end-to-end

4 models · +1 #compliance #regulatory #nlp

arxiv · Apr 23

Structured Disagreement in Health-Literacy Annotation: Epistemic Stability, Conceptual Difficulty, and Agreement-Stratified Inference

arXiv:2604.19943v1 Announce Type: new Abstract: Annotation pipelines in Natural Language Processing (NLP) commonly assume a single latent ground truth per instance and resolve disagreement through label aggregation. Perspectivist approaches challenge this view by treating disagreement as potentially

#nlp #annotation #health-literacy

arxiv · Apr 22 · bullish

Model-Agnostic Meta Learning for Class Imbalance Adaptation

arXiv:2604.18759v1 Announce Type: new Abstract: Class imbalance is a widespread challenge in NLP tasks, significantly hindering robust performance across diverse domains and applications. We introduce Hardness-Aware Meta-Resample (HAMR), a unified framework that adaptively addresses both class imbal

1 model #nlp #class-imbalance #resampling

arxiv · Apr 21

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

arXiv:2604.16042v2 Announce Type: cross Abstract: While Large Language Models (LLMs) have achieved strong performance across many NLP tasks, their opaque internal mechanisms hinder trustworthiness and safe deployment. Existing surveys in explainable AI largely focus on post-hoc explanation methods t

#explainability #nlp #research

arxiv · Apr 4

A Dynamic Atlas of Persian Poetic Symbolism: Families, Fields, and the Historical Rewiring of Meaning

arXiv:2604.01467v1 Announce Type: new Abstract: Persian poetry is often remembered through recurrent symbols before it is remembered through plot. Wine vessels, gardens, flames, sacred titles, bodily beauty, and courtly names return across centuries, yet computational work still tends to flatten thi

#poetry #nlp #corpus-analysis