Model Detail
clip-vit-base-patch32
clip-vit-base-patch32 is an AI model released by OpenAI. The model is registered under the zero-shot-image-classification pipeline tag on Hugging Face.
clip-vit-base-patch32 is published on Hugging Face but our pipeline has not yet captured architecture, license, or parameter-count metadata for this entry. The data is refreshed daily, so these fields typically populate within 24–48 hours of release.
Downloads of clip-vit-base-patch32 have moved +4.5% over the trailing thirty days. The positive sign makes that a slight uptrend, not a decline. These numbers are a signal, not a guarantee: download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
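As a rough sketch of how a trailing thirty-day figure like this can be computed and labeled, the snippet below compares two consecutive thirty-day windows. The download totals, the function names, and the 1% "flat" band are illustrative assumptions, not values or logic taken from this page's pipeline.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from `previous` to `current`, in percent."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


def describe_trend(change_pct: float, flat_band: float = 1.0) -> str:
    """Label a change; anything within +/- flat_band percent counts as flat."""
    if change_pct > flat_band:
        return "uptrend"
    if change_pct < -flat_band:
        return "downtrend"
    return "flat"


# Hypothetical download totals for two consecutive thirty-day windows.
previous_window = 1_000_000
current_window = 1_045_000

change = percent_change(previous_window, current_window)
print(f"{change:+.1f}% -> {describe_trend(change)}")
```

With these assumed totals the change is positive, so the label is an uptrend. A negative result of the same size would be labeled a downtrend.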
clip-vit-base-patch32 is best suited to workloads that match the zero-shot-image-classification pipeline tag. Treat this as a starting point rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your own workload.
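For a quick check against your own images, a minimal sketch using the Hugging Face `transformers` pipeline for this tag might look like the following. It assumes the model is hosted under the repository id `openai/clip-vit-base-patch32` and that `transformers`, a backend such as PyTorch, and Pillow are installed. The helper names `top_label` and `classify` are hypothetical.

```python
from typing import Dict, List


def top_label(results: List[Dict]) -> str:
    """Return the label with the highest score from pipeline-style output."""
    return max(results, key=lambda r: r["score"])["label"]


def classify(image_path: str, candidate_labels: List[str]) -> List[Dict]:
    """Score one image against free-text labels (downloads weights on first use)."""
    # Imported lazily so top_label() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        task="zero-shot-image-classification",
        model="openai/clip-vit-base-patch32",  # assumed repository id
    )
    # Returns a list of {"score": float, "label": str} dictionaries.
    return classifier(image_path, candidate_labels=candidate_labels)
```

Calling `classify` with a local image path and a few candidate labels, then passing the result to `top_label`, gives the best-scoring label. Running this over a small labeled sample from your own workload is one way to build the evaluation mentioned above.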
Brain-CLIPLM: Semantic Compression for EEG-to-Text Decoding
arXiv:2604.16370v2 (announce type: replace). Abstract: Decoding natural language from non-invasive electroencephalography (EEG) remains constrained by low signal-to-noise ratio and limited information bandwidth. This raises a central question: can sentence-level language be reliably recovered from such
DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum
arXiv:2606.05435v1 (announce type: new). Abstract: Differentially private stochastic gradient descent (DP-SGD) has become the standard framework for privacy-preserving machine learning, yet its reliance on a fixed gradient clipping threshold to limit sensitivity remains a significant practical limitati
Tight Long-Term Tail Decay of (Clipped) SGD in Non-Convex Optimization
arXiv:2602.05657v2 (announce type: replace). Abstract: The study of tail behaviour of SGD-induced processes has been attracting a lot of interest, due to offering strong guarantees with respect to individual runs of an algorithm. While many works provide high-probability guarantees, quantifying the err
CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large Vision-Language Models
arXiv:2605.13178v2 (announce type: replace-cross). Abstract: In large vision-language models, visual tokens typically constitute the majority of input tokens, leading to substantial computational overhead. To address this, recent studies have explored pruning redundant or less informative visual tokens
Jailbreaking Multimodal Large Language Models using Multi-Clip Video
arXiv:2606.02111v1 (announce type: cross). Abstract: As multimodal large language models (MLLMs) have advanced to process video inputs, concerns have emerged about their potential for malicious misuse. Prior jailbreak studies have shown that safety alignment in MLLMs can be bypassed through visual inpu
Calibrating Uncertainty for Zero-Shot Adversarial CLIP
arXiv:2512.12997v2 (announce type: replace-cross). Abstract: CLIP delivers strong zero-shot classification but remains highly vulnerable to adversarial attacks. Prior adversarial fine-tuning work primarily matches predicted logits between clean and adversarial examples, which overlooks uncertainty cali