DataBubble·

Model Detail

whisper-large-v3

—

Provider: OpenAICategory: audioPipeline: automatic-speech-recognition

DB Score

2.4

Downloads

5.4M

Likes

Day

+0.0%

Week

+0.0%

Month

+1.1%

Overview

whisper-large-v3 is an audio model with 772M parameters released by OpenAI. The model is registered under the automatic-speech-recognition pipeline tag on Hugging Face, distributed under the permissive apache-2.0 license.

Technical

whisper-large-v3 ships with 772M parameters. Total weight footprint is approximately 1.5 GB, which is the relevant figure when planning local-inference VRAM. The apache-2.0 license is permissive, allowing commercial deployment and derivative work without per-seat fees, though attribution requirements still apply.

Trending Signal

Downloads of whisper-large-v3 have moved +1.1% over the trailing thirty days. That is a slight downtrend, consistent with normal cooling as newer models compete for the same workloads. These numbers are signal, not guarantee — week-over-week download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.

Read about databubble_score →

Use Cases

whisper-large-v3 is best fit for speech recognition, transcription, or speech synthesis depending on the task head. Treat this as a starting matrix rather than a benchmark verdict — the right deployment usually depends on the specific evaluation suite that mirrors your workload.

Download History

Research Paper

arXiv: 2212.04356→

Model Info

Licenseapache-2.0

Citations7,066 (966 influential)

Recent newsView all news →

CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning

arXiv:2606.02998v1 Announce Type: new Abstract: Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a consumer smartphone.

arxiv2d ago

BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language

arXiv:2606.03504v1 Announce Type: cross Abstract: We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nasta

arxiv3d ago

ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition

arXiv:2601.19919v2 Announce Type: replace-cross Abstract: Knowledge distillation (KD) is one of the most effective paradigms for compressing large-scale foundation models into deployable architectures. In the context of Automatic Speech Recognition (ASR), previous studies have predominantly focused

arxivneutral14d ago

Quantizing Whisper-small: How design choices affect ASR performance

arXiv:2511.08093v2 Announce Type: replace-cross Abstract: Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training quantiza

arxivneutral15d ago

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

arXiv:2601.22569v2 Announce Type: replace-cross Abstract: Large language model (LLM) based agents are increasingly used to automate financial transactions, yet their reliance on contextual reasoning exposes payment systems to prompt-driven manipulation. The Agent Payments Protocol (AP2) aims to secu

arxivneutral17d ago

Whispers in the Noise: Surrogate-Guided Concept Awakening via a Multi-Agent Framework

arXiv:2605.18150v1 Announce Type: new Abstract: Diffusion models (DMs) are widely used for text-to-image generation, but their strong generative capabilities also raise concerns about unsafe or undesirable content. Concept erasure aims to mitigate these risks by removing specific concepts from pretr

Related Models

clip-vit-large-patch14

OpenAI · 33.1M downloads

clip-vit-base-patch32

OpenAI · 21.4M downloads

Kokoro-82M

hexgrad · 13.8M downloads

speaker-diarization-3.1

pyannote · 9.5M downloads