Model Detail
whisper-large-v3
—whisper-large-v3 is an audio model with 772M parameters released by OpenAI. The model is registered under the automatic-speech-recognition pipeline tag on Hugging Face, distributed under the permissive apache-2.0 license.
whisper-large-v3 ships with 772M parameters. Total weight footprint is approximately 1.5 GB, which is the relevant figure when planning local-inference VRAM. The apache-2.0 license is permissive, allowing commercial deployment and derivative work without per-seat fees, though attribution requirements still apply.
Downloads of whisper-large-v3 have moved +1.1% over the trailing thirty days. That is a slight downtrend, consistent with normal cooling as newer models compete for the same workloads. These numbers are signal, not guarantee — week-over-week download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
whisper-large-v3 is best fit for speech recognition, transcription, or speech synthesis depending on the task head. Treat this as a starting matrix rather than a benchmark verdict — the right deployment usually depends on the specific evaluation suite that mirrors your workload.
CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning
arXiv:2606.02998v1 Announce Type: new Abstract: Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a consumer smartphone.
BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language
arXiv:2606.03504v1 Announce Type: cross Abstract: We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nasta
ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition
arXiv:2601.19919v2 Announce Type: replace-cross Abstract: Knowledge distillation (KD) is one of the most effective paradigms for compressing large-scale foundation models into deployable architectures. In the context of Automatic Speech Recognition (ASR), previous studies have predominantly focused
Quantizing Whisper-small: How design choices affect ASR performance
arXiv:2511.08093v2 Announce Type: replace-cross Abstract: Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training quantiza
Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection
arXiv:2601.22569v2 Announce Type: replace-cross Abstract: Large language model (LLM) based agents are increasingly used to automate financial transactions, yet their reliance on contextual reasoning exposes payment systems to prompt-driven manipulation. The Agent Payments Protocol (AP2) aims to secu
Whispers in the Noise: Surrogate-Guided Concept Awakening via a Multi-Agent Framework
arXiv:2605.18150v1 Announce Type: new Abstract: Diffusion models (DMs) are widely used for text-to-image generation, but their strong generative capabilities also raise concerns about unsafe or undesirable content. Concept erasure aims to mitigate these risks by removing specific concepts from pretr