DataBubble

Model Detail

multilingual-e5-large

Provider: intfloat · Category: llm · Pipeline: feature-extraction

DB Score

37.0

Downloads

5.0M

Likes

Day

+0.0%

Week

+0.0%

Month

+0.0%

Overview

multilingual-e5-large is a multilingual text embedding model with roughly 560M parameters released by intfloat. The model is registered under the feature-extraction pipeline tag on Hugging Face and is distributed under the permissive MIT license.

Technical

multilingual-e5-large ships with roughly 560M parameters. The MIT license is permissive, allowing commercial deployment and derivative work without per-seat fees, provided the copyright and license notice is retained in copies.
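
A parameter count gives a first-order estimate of the memory needed just to hold the weights. The sketch below is illustrative only: `weight_memory_gib` is a hypothetical helper, the bytes-per-parameter values assume fp32 (4 bytes) or fp16 (2 bytes) storage, and the result ignores activations, tokenizer data, and framework overhead.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 4) -> float:
    """Return the approximate size of the model weights in GiB.

    num_params      -- number of parameters reported for the model
    bytes_per_param -- 4 for fp32, 2 for fp16/bf16 (assumed storage formats)
    """
    if num_params < 0 or bytes_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bytes_per_param > 0")
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Pass in the parameter count listed for the model you plan to deploy.
    for params in (100_000_000, 500_000_000):
        print(params, round(weight_memory_gib(params, 4), 2), "GiB in fp32")
```

Halving `bytes_per_param` halves the estimate, which is why half-precision loading is a common first step when the weights do not fit in memory.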

Use Cases

multilingual-e5-large is best suited to semantic search, retrieval, and clustering pipelines. As an embedding model it outputs vectors rather than generated text, so it is not a fit for chat or instruction-following workloads. Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.
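
For the semantic search and retrieval use case, a minimal sketch of the ranking step might look like the following. It assumes the E5 convention of prefixing inputs with `query: ` or `passage: `, and ranks passages by cosine similarity. The helper names (`add_e5_prefix`, `rank_passages`, `embed_with_e5`) are hypothetical, and `embed_with_e5` additionally assumes the `sentence-transformers` package is installed and the weights can be downloaded from Hugging Face.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def add_e5_prefix(texts: Sequence[str], kind: str) -> List[str]:
    """Prepend the "query: " or "passage: " prefix used with E5 models."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec: Vector,
                  passage_vecs: Sequence[Vector]) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, vec))
              for i, vec in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def embed_with_e5(texts: Sequence[str]) -> List[List[float]]:
    """Embed already-prefixed texts (assumes sentence-transformers is installed)."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer("intfloat/multilingual-e5-large")
    return model.encode(list(texts), normalize_embeddings=True).tolist()
```

In use, the query would go through `add_e5_prefix(..., "query")` and the documents through `add_e5_prefix(..., "passage")` before embedding, and `rank_passages` would then order the documents for that query. Running the same ranking over a labeled sample of your own data is one way to build the workload-specific evaluation mentioned above.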

Research Paper

arXiv: 2402.05672

Model Info

License: MIT

Citations: 468 (38 influential)

Recent news

MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization

arXiv:2604.21370v2. Abstract: We present a systematic study of multilingual polarization detection across 22 languages for SemEval-2026 Task 9 (Subtask 1), contrasting multilingual generalists with language-specific specialists and hybrid ensembles. While a standard generalist…

arxiv · 9h ago

Multilingual Sentence Embeddings for Linguistic-Integrated Reliability Audit

arXiv:2607.17466v1. Abstract: Multilingual assessment systems commonly rely on translation for scoring and quality-control processes. We evaluate whether multilingual sentence embeddings can replace translated English input for Linguistic-Integrated Reliability Auditing (LiRA) ac…

arxiv · 9h ago

Overview of the NLPCC 2026 Shared Task 1: Difficulty-Aware Multilingual and Multimodal Medical Instructional Video Understanding Evaluation

arXiv:2607.06618v2. Abstract: Following the CMIVQA, MMI-VQA, and M4IVQA challenges in NLPCC 2023-2025, we introduce the Difficulty-Aware Medical Instructional Video Question Answering (DA-MIVQA) shared task for NLPCC 2026. DA-MIVQA extends previous multilingual and multi…

arxiv · 9h ago

BLAD: A Historically Contextualized, Multilingual Dataset of Bangladeshi Legal Acts (1799 to 2025)

arXiv:2607.17111v1. Abstract: We present the Bangladesh Legal Acts Dataset (BLAD), a curated collection of 1,484 legislative acts enacted between 1799 and 2025. Each act is represented with its full text, structured sections and footnotes, repeal status, and metadata linking it t…

arxiv · 9h ago

Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models

arXiv:2602.11961v3. Abstract: Open large language models (LLMs) have demonstrated improving multilingual capabilities in recent years. In this paper, we present a study of open LLMs for multilingual machine translation (MT) across a range of languages, and investigate the effec…

arxiv · 9h ago

Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

arXiv:2605.01224v2. Abstract: This paper argues that contemporary multilingual NLP has converged on a fragile and misleading paradigm of incidental multilingualism. Today's LLMs appear multilingual largely because they are trained on massive, uneven web corpora, not because mul…

Related Models

multilingual-e5-small

intfloat · 10.0M downloads

bert-base-uncased

google-bert · 69.6M downloads

paraphrase-multilingual-MiniLM-L12-v2

SBERT · 48.6M downloads