·
DataBubble
  • Home
  • Models
  • News
  • Compare
  • Boards
  • Pricing
  • About
  • Newsletter
  • Methodology
  • Contact
Latest
Startup Battlefield 200 applications officially close in 3 days1h◆Google will pay SpaceX $920M per month for compute3h◆The most interesting startups right now want to get you off your phone4h◆This is your laptop… on AI5h◆New York lawmakers pass one-year ban on new data centers6h◆The token bill comes due: Inside the industry scramble to manage AI’s runaway costs7h◆The latest AI news we announced in May 20267h◆The ‘together tech’ wave might be the most intriguing startup bet of 20267h◆This AI startup says it can tell if a script will make a hit film8h◆AirTrunk commits $30B to build 5GW of AI data centers in India8h◆The Meta hack shows there’s more to AI security than Mythos12h◆Mira Murati steps back into the spotlight, carefully16h◆SFMambaNet: Spectral-Frequency Enhanced Selective State Space Model for Correspondence Pruning17h◆Optical-Guided Neural Collapse for SAR Few-Shot Class Incremental Learning17h◆Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models17h◆Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents17h◆Why Muon Outperforms Adam: A Curvature Perspective17h◆Vision Hopfield Memory Networks17h◆Provably Auditable and Safe LLM Agents from Human-Authored Ontologies17h◆FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment17h◆Startup Battlefield 200 applications officially close in 3 days1h◆Google will pay SpaceX $920M per month for compute3h◆The most interesting startups right now want to get you off your phone4h◆This is your laptop… on AI5h◆New York lawmakers pass one-year ban on new data centers6h◆The token bill comes due: Inside the industry scramble to manage AI’s runaway costs7h◆The latest AI news we announced in May 20267h◆The ‘together tech’ wave might be the most intriguing startup bet of 20267h◆This AI startup says it can tell if a script will make a hit film8h◆AirTrunk commits $30B to build 5GW of AI data centers in India8h◆The Meta hack shows there’s more to AI security than Mythos12h◆Mira Murati steps back into the spotlight, carefully16h◆SFMambaNet: Spectral-Frequency Enhanced Selective State Space Model for Correspondence Pruning17h◆Optical-Guided Neural Collapse for SAR Few-Shot Class Incremental Learning17h◆Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models17h◆Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents17h◆Why Muon Outperforms Adam: A Curvature Perspective17h◆Vision Hopfield Memory Networks17h◆Provably Auditable and Safe LLM Agents from Human-Authored Ontologies17h◆FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment17h◆
DataBubble·

Model Detail

intfloat logo

multilingual-e5-large

—
Provider: intfloatCategory: llmPipeline: feature-extraction
DB Score
0.6
Downloads
5.0M
Likes
1K
Day
+0.0%
Week
+0.0%
Month
+0.0%
Overview

multilingual-e5-large is a large language model with 280M parameters released by intfloat. The model is registered under the feature-extraction pipeline tag on Hugging Face, distributed under the permissive mit license.

Technical

multilingual-e5-large ships with 280M parameters. The mit license is permissive, allowing commercial deployment and derivative work without per-seat fees, though attribution requirements still apply.

Use Cases

multilingual-e5-large is best fit for general-purpose chat and instruction-following workloads, and semantic search, retrieval, and clustering pipelines. Treat this as a starting matrix rather than a benchmark verdict — the right deployment usually depends on the specific evaluation suite that mirrors your workload.

Download History
Research Paper
arXiv: 2402.05672→
Model Info
Licensemit
Citations434 (32 influential)
Recent newsView all news →
Related News
arxiv1d ago

Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

arXiv:2606.04730v1 Announce Type: new Abstract: With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflected in IWSLT's Instruc

arxiv1d ago

LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

arXiv:2606.04382v1 Announce Type: cross Abstract: Automated subject cataloging assigns controlledvocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton cat

arxiv1d ago

Effective vocabulary expansion of multilingual language models for extremely low-resource languages

arXiv:2602.09388v2 Announce Type: replace Abstract: Multilingual pre-trained language models(mPLMs) offer significant benefits for many low-resource languages. To further expand the range of languages these models can support, many works focus on continued pre-training of these models. However, few

arxivneutral2d ago

Language Bias under Conflicting Information in Multilingual LLMs

arXiv:2604.07123v2 Announce Type: replace Abstract: Large Language Models (LLMs) have been shown to contain biases in the process of integrating conflicting information when answering questions. Here we ask whether such biases also exist with respect to which language is used for each conflicting pi

arxivneutral2d ago

Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models

arXiv:2606.03793v1 Announce Type: new Abstract: Multimodal Large Language Models integrate visual perception into language reasoning, introducing a continuous attack surface susceptible to adversarial attacks. Prior work on MLLM robustness has focused largely on English-centric tasks, leaving multil

arxiv2d ago

Multilingual Unlearning in LLMs: Transfer, Dynamics, and Reversibility

arXiv:2606.03291v1 Announce Type: new Abstract: Large language models (LLMs) can memorize sensitive facts, motivating unlearning methods that remove targeted knowledge without costly retraining. However, unlearning research remains heavily English-centric. We study multilingual unlearning by extendi

Related Models
google-bert logo
bert-base-uncased
google-bert · 69.6M downloads
sentence-transformers logo
paraphrase-multilingual-MiniLM-L12-v2
SBERT · 50.3M downloads
HomeModelsNews