arXiv:2509.26036v3 Announce Type: replace-cross Abstract: While Contrastive Language-Image Pretraining (CLIP) excels at zero-shot tasks by aligning image and text embeddings, its performance in few-shot classification is hindered by a critical limitation: intra-modal misalignment. This issue, caused
arXiv:2510.05261v2 Announce Type: replace Abstract: The Lipschitz constant is a key measure for certifying the robustness of neural networks to input perturbations. However, computing the exact constant is NP-hard, and standard approaches to estimate the Lipschitz constant involve solving a large ma
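A common cheap alternative to the intractable exact computation mentioned in this abstract is the classical upper bound for feed-forward networks with 1-Lipschitz activations: the product of the layers' spectral norms. A minimal NumPy sketch of that bound (the layer shapes below are illustrative, not from the paper):

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Upper bound on the Lipschitz constant of an MLP with 1-Lipschitz
    activations (e.g. ReLU): the product of the spectral norms of the
    weight matrices. Loose, but only costs one SVD per layer instead of
    solving the NP-hard exact problem."""
    bound = 1.0
    for W in weights:
        # ord=2 gives the largest singular value, i.e. the spectral norm
        bound *= np.linalg.norm(W, ord=2)
    return bound

rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 32)), rng.standard_normal((32, 10))]
print(lipschitz_upper_bound(layers))
```

This product bound is exactly the kind of baseline that tighter estimators (such as the large matrix-based formulations the abstract alludes to) aim to improve on.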
arXiv:2407.14971v3 Announce Type: replace-cross Abstract: Vision-Language Models (VLMs) rely heavily on pretrained vision encoders to support downstream tasks such as image captioning, visual question answering, and zero-shot classification. Despite their strong performance, these encoders remain hi
arXiv:2604.05971v1 Announce Type: cross Abstract: Recent research has shown that contrastive vision-language models such as CLIP often lack fine-grained understanding of visual content. While a growing body of work has sought to address this limitation, we identify a distinct failure mode in the CLI
arXiv:2507.22264v2 Announce Type: replace-cross Abstract: Contrastive Language-Image Pre-training (CLIP) (Radford et al., 2021) has emerged as a pivotal model in computer vision and multimodal learning, achieving state-of-the-art performance in aligning visual and textual representations throug
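The contrastive alignment that CLIP performs can be sketched as a symmetric InfoNCE loss over a batch of paired image/text embeddings: matched pairs lie on the diagonal of the similarity matrix and are pulled together, all other pairs are pushed apart. A minimal NumPy sketch (function name and temperature value are illustrative):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired image/text embeddings, in the
    style of CLIP. Row i of img_emb is assumed to match row i of txt_emb."""
    # L2-normalise so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (B, B) similarity matrix
    labels = np.arange(len(logits))           # i-th image matches i-th text

    def xent(l):
        # numerically stable log-softmax, then pick the diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

The loss is near zero when each image embedding is far closer to its own caption than to any other, which is the alignment property the abstract refers to.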
arXiv:2508.07629v4 Announce Type: replace-cross Abstract: We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. Although there are already many excellent works r