Model Detail
multilingual-e5-large
multilingual-e5-large is a multilingual text embedding model with 280M parameters released by intfloat. It is registered under the feature-extraction pipeline tag on Hugging Face and distributed under the permissive MIT license.
The MIT license allows commercial deployment and derivative work without per-seat fees, though the attribution requirement (retaining the copyright and license notice) still applies.
multilingual-e5-large is best suited to semantic search, retrieval, and clustering pipelines. As an embedding model listed under the feature-extraction tag, it produces vectors rather than generated text, so it is not a fit for general-purpose chat or instruction-following workloads. Treat this as a starting matrix rather than a benchmark verdict; the right deployment usually depends on an evaluation suite that mirrors your workload.
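As a rough illustration of the retrieval use case, the sketch below ranks passages against a query by cosine similarity. It assumes the "query: " and "passage: " input prefixes described on the E5 model card. The helper names (`add_e5_prefixes`, `rank_passages`, `embed_with_e5`) are hypothetical and not part of any library. The ranking logic is pure Python and runs on any vectors; `embed_with_e5` is an optional path that assumes the `sentence-transformers` package is installed and the weights can be downloaded.

```python
import math
from typing import List, Sequence, Tuple


def add_e5_prefixes(query: str, passages: Sequence[str]) -> Tuple[str, List[str]]:
    """Apply the "query: " / "passage: " prefixes (assumed from the E5 model card)."""
    return f"query: {query}", [f"passage: {p}" for p in passages]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_passages(
    query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, most similar first."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def embed_with_e5(texts: Sequence[str]) -> List[List[float]]:
    """Hypothetical helper: encode already-prefixed texts with sentence-transformers.

    Assumes `pip install sentence-transformers` and network access to fetch
    the weights. It is not called by the toy example below.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/multilingual-e5-large")
    return model.encode(list(texts), normalize_embeddings=True).tolist()


if __name__ == "__main__":
    # Toy 2-dimensional vectors stand in for real embeddings.
    query_vec = [1.0, 0.0]
    passage_vecs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    for index, score in rank_passages(query_vec, passage_vecs):
        print(index, round(score, 3))
```

With real embeddings, the prefixed query and passages would go through `embed_with_e5` first, and the resulting vectors would be passed to `rank_passages`. For larger collections a vector index would normally replace the linear scan shown here.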
Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026
arXiv:2606.04730v1 Announce Type: new Abstract: With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflected in IWSLT's Instruc
LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment
arXiv:2606.04382v1 Announce Type: cross Abstract: Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton cat
Effective vocabulary expansion of multilingual language models for extremely low-resource languages
arXiv:2602.09388v2 Announce Type: replace Abstract: Multilingual pre-trained language models (mPLMs) offer significant benefits for many low-resource languages. To further expand the range of languages these models can support, many works focus on continued pre-training of these models. However, few
Language Bias under Conflicting Information in Multilingual LLMs
arXiv:2604.07123v2 Announce Type: replace Abstract: Large Language Models (LLMs) have been shown to contain biases in the process of integrating conflicting information when answering questions. Here we ask whether such biases also exist with respect to which language is used for each conflicting pi
Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models
arXiv:2606.03793v1 Announce Type: new Abstract: Multimodal Large Language Models integrate visual perception into language reasoning, introducing a continuous attack surface susceptible to adversarial attacks. Prior work on MLLM robustness has focused largely on English-centric tasks, leaving multil
Multilingual Unlearning in LLMs: Transfer, Dynamics, and Reversibility
arXiv:2606.03291v1 Announce Type: new Abstract: Large language models (LLMs) can memorize sensitive facts, motivating unlearning methods that remove targeted knowledge without costly retraining. However, unlearning research remains heavily English-centric. We study multilingual unlearning by extendi