Model Detail
whisper-large-v3-turbo
whisper-large-v3-turbo is an audio model with 404M parameters released by OpenAI. It is registered under the automatic-speech-recognition pipeline tag on Hugging Face and distributed under the permissive MIT license.
The MIT license allows commercial deployment and derivative work without per-seat fees, provided the copyright and license notice is retained in copies of the software.
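The automatic-speech-recognition pipeline tag mentioned above is also the task name accepted by the `transformers` `pipeline` helper, so loading the model might look like the minimal sketch below. The repository ID `openai/whisper-large-v3-turbo` and the helper name `build_transcriber` are assumptions for illustration and are not stated on this page.

```python
MODEL_ID = "openai/whisper-large-v3-turbo"  # assumed Hub repository ID
PIPELINE_TAG = "automatic-speech-recognition"


def build_transcriber(model_id: str = MODEL_ID):
    """Return a speech-recognition pipeline for `model_id` (hypothetical helper)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    # The first call downloads the model weights from the Hub.
    return pipeline(PIPELINE_TAG, model=model_id)
```

Calling `build_transcriber()("sample.wav")["text"]` would then return the transcript of a local audio file; `sample.wav` is a placeholder name, and decoding audio from a file path typically requires ffmpeg to be installed.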
Downloads of whisper-large-v3-turbo have moved +1.1% over the past 24 hours, +6.9% over the trailing seven days, and +11.3% over the trailing thirty days. The trend is mildly positive, consistent with a model that is being picked up incrementally rather than going viral. These numbers are a signal, not a guarantee: download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
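To compare the three windows on a common scale, each figure can be converted to an average daily rate. The sketch below assumes that every percentage is a compounded change over its window, which this page does not confirm; the function name `daily_rate` is hypothetical.

```python
def daily_rate(window_change_pct: float, days: int) -> float:
    """Average compounded daily growth, in percent, implied by a change over `days` days."""
    growth_factor = 1.0 + window_change_pct / 100.0
    return (growth_factor ** (1.0 / days) - 1.0) * 100.0


# Figures quoted above: label -> (percent change, window length in days).
windows = {"24h": (1.1, 1), "7d": (6.9, 7), "30d": (11.3, 30)}
rates = {label: daily_rate(pct, days) for label, (pct, days) in windows.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f}% per day")
```

Under that assumption the implied daily rate is roughly 1.1% for the 24-hour window, 0.96% for the seven-day window, and 0.36% for the thirty-day window, so the most recent days are growing faster than the monthly average.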
whisper-large-v3-turbo is best suited to speech recognition and transcription. Treat this as a starting point rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.
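One common way to score transcripts against a workload-specific test set is word error rate (WER). The sketch below is a plain-Python illustration and not a method described on this page; the function name and both sample sentences are hypothetical, and real evaluations usually normalise punctuation and numerals before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between ref[:i - 1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # matching ref[:i] against an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical reference and model output: one substituted word and one dropped word.
wer = word_error_rate("turn the lights off in the kitchen",
                      "turn the light off in kitchen")
print(f"WER: {wer:.3f}")
```

Averaging this score over recordings that resemble production audio gives a more relevant comparison than a generic leaderboard figure.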
CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning
arXiv:2606.02998v1 (announce type: new). Abstract: Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a consumer smartphone.
BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language
arXiv:2606.03504v1 (announce type: cross). Abstract: We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nasta
ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition
arXiv:2601.19919v2 (announce type: replace-cross). Abstract: Knowledge distillation (KD) is one of the most effective paradigms for compressing large-scale foundation models into deployable architectures. In the context of Automatic Speech Recognition (ASR), previous studies have predominantly focused
Quantizing Whisper-small: How design choices affect ASR performance
arXiv:2511.08093v2 (announce type: replace-cross). Abstract: Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training quantiza
Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection
arXiv:2601.22569v2 (announce type: replace-cross). Abstract: Large language model (LLM) based agents are increasingly used to automate financial transactions, yet their reliance on contextual reasoning exposes payment systems to prompt-driven manipulation. The Agent Payments Protocol (AP2) aims to secu
Whispers in the Noise: Surrogate-Guided Concept Awakening via a Multi-Agent Framework
arXiv:2605.18150v1 (announce type: new). Abstract: Diffusion models (DMs) are widely used for text-to-image generation, but their strong generative capabilities also raise concerns about unsafe or undesirable content. Concept erasure aims to mitigate these risks by removing specific concepts from pretr