arxiv · 1d ago · bullish

What Matters When Building Universal Multilingual Named Entity Recognition Models?

arXiv:2601.06347v2 Announce Type: replace Abstract: Recent progress in universal multilingual named entity recognition (NER) has been driven by multilingual transformer models, task-specific architectures, custom loss functions, and large-scale training datasets. However, despite substantial prior …

1 model · #multilingual #ner #transformer

arxiv · 4d ago

CultureTalk-ID: A Multi-Task Dialogue Benchmark for Cultural Commonsense in Indonesian Local Languages

arXiv:2607.21016v1 Announce Type: new Abstract: Culture is lived through conversation, yet existing Indonesian cultural commonsense benchmarks evaluate LLMs on short and isolated prompts, stripping away the dialogic context in which cultural nuances actually surface. We introduce CultureTalk-ID, the …

1 model · #benchmark #cultural-commonsense #language-models

arxiv · Jul 21 · bullish

Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models

arXiv:2602.11961v3 Announce Type: replace Abstract: Open large language models (LLMs) have demonstrated improving multilingual capabilities in recent years. In this paper, we present a study of open LLMs for multilingual machine translation (MT) across a range of languages, and investigate the …

7 models · #multilingual #machine-translation #open-source

arxiv · Jul 17 · bearish

MSQA: A Natively Sourced Multilingual and Multicultural SimpleQA Benchmark

arXiv:2607.00724v3 Announce Type: replace Abstract: Multilingual fluency often invites a stronger assumption: a model that can speak a user's language must also understand the culture encoded by that language. We call this the Illusion of Cultural Alignment. To test this assumption directly, we …

1 model · #multilingual #benchmark #cultural-alignment

arxiv · Jun 29

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

arXiv:2604.17633v2 Announce Type: replace Abstract: Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges …

1 model · #multilingual #pretraining #language-models

arxiv · Jun 17

A Recipe for Long-Context Reasoning in Large Language Models via On-Policy Optimization and Distillation

arXiv:2605.12227v2 Announce Type: replace Abstract: Existing approaches to post-train models for long-context tasks face complementary limitations: (i) supervised fine-tuning (SFT) provides stable supervision but suffers from exposure bias; (ii) reinforcement learning methods such as Group Relative …

2 models · #long-context #reinforcement learning #distillation

arxiv · Jun 11

Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models

arXiv:2606.11639v1 Announce Type: new Abstract: The popularization of automatic speech recognition (ASR) systems has increased exploration of the demographic biases related to race, age, gender, and accent, often formed from imbalanced training data. Most of these studies focused on standard …

2 models · #speech recognition #bias #multilingual

arxiv · Jun 11 · bullish

DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer

arXiv:2606.04694v2 Announce Type: replace Abstract: Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation …

1 model · #multilingual #distillation #language-models

arxiv · Jun 4 · bullish

DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer

arXiv:2606.04694v1 Announce Type: new Abstract: Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework …

1 model · #multilingual #distillation #language-models

arxiv · May 16

PolitNuggets: Benchmarking Agentic Discovery of Long-Tail Political Facts

arXiv:2605.14002v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) embedded in agentic frameworks have transformed information retrieval from static, long context question answering into open-ended exploration. Yet real world use requires models to discover and synthesize "long-tail" …

#benchmark #information-retrieval #multilingual

arxiv · May 5

ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts

arXiv:2605.00116v1 Announce Type: cross Abstract: In this article, we introduce ViLegalNLI, the first large-scale Vietnamese Natural Language Inference (NLI) dataset specifically constructed for the legal domain. The dataset consists of 42,012 premise-hypothesis pairs derived from official statutory …

#nlp #dataset #legal

arxiv · May 1

Supercharging Agenda Setting Research: The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification

arXiv:2602.16516v2 Announce Type: replace Abstract: This paper introduces ParlaCAP, a large-scale dataset for analyzing parliamentary agenda setting across Europe, and proposes a cost-effective method for building domain-specific policy topic classifiers. Applying the Comparative Agendas Project (CA…

2 models · #dataset #nlp #policy

arxiv · Apr 27 · bullish

EuropeMedQA Study Protocol: A Multilingual, Multimodal Medical Examination Dataset for Language Model Evaluation

arXiv:2604.14306v2 Announce Type: replace-cross Abstract: While Large Language Models (LLMs) have demonstrated high proficiency on English-centric medical examinations, their performance often declines when faced with non-English languages and multimodal diagnostic tasks. This study protocol …

1 model · #multilingual #medical-ai #benchmark

arxiv · Apr 21 · bullish

Multilingual Training and Evaluation Resources for Vision-Language Models

arXiv:2604.18347v1 Announce Type: new Abstract: Vision Language Models (VLMs) achieved rapid progress in the recent years. However, despite their growth, VLMs development is heavily grounded on English, leading to two main limitations: (i) the lack of multilingual and multimodal datasets for …

3 models · #multilingual #multimodal #benchmark

arxiv · Apr 13 · bullish

Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models

arXiv:2604.08970v1 Announce Type: cross Abstract: We study predictive multilingual evaluation: estimating how well a model will perform on a task in a target language when direct benchmark results are missing. This problem is common in multilingual deployment, where evaluation coverage is sparse and …

1 model · #multilingual #evaluation #benchmark

arxiv · Apr 6

LLM-based Atomic Propositions help weak extractors: Evaluation of a Propositioner for triplet extraction

arXiv:2604.02866v1 Announce Type: new Abstract: Knowledge Graph construction from natural language requires extracting structured triplets from complex, information-dense sentences. In this paper, we investigate if the decomposition of text into atomic propositions (minimal, semantically autonomous …

5 models · #knowledge graph #natural language #multilingual