Model Detail
whisper-large-v3
Fine-tuning Whisper for Pashto ASR: strategies and scale
arXiv:2604.06507v1 Announce Type: new Abstract: Pashto is absent from Whisper's pre-training corpus despite being one of CommonVoice's largest language collections, leaving off-the-shelf models unusable: all Whisper sizes output Arabic, Dari, or Urdu script on Pashto audio, achieving word error rate …
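The truncated abstract does not give the paper's recipe, but a minimal sketch of the standard setup such work builds on, fine-tuning a Whisper checkpoint on Common Voice Pashto with Hugging Face transformers, might look like the following. The dataset ID, the Urdu language token as a stand-in for unsupported Pashto, and all hyperparameters are illustrative assumptions, not the paper's choices.

    # Hedged sketch: standard Whisper fine-tuning on Common Voice Pashto.
    # Dataset ID, language-token choice, and hyperparameters are assumptions.
    import torch
    from datasets import Audio, load_dataset
    from transformers import (Seq2SeqTrainer, Seq2SeqTrainingArguments,
                              WhisperForConditionalGeneration, WhisperProcessor)

    processor = WhisperProcessor.from_pretrained("openai/whisper-small")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
    # Whisper has no Pashto token; forcing a related language's token (here Urdu)
    # is one plausible workaround -- the paper's actual strategy may differ.
    model.generation_config.language = "urdu"
    model.generation_config.task = "transcribe"

    ds = load_dataset("mozilla-foundation/common_voice_17_0", "ps", split="train")
    ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

    def prepare(batch):
        audio = batch["audio"]
        batch["input_features"] = processor(
            audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
        batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
        return batch

    ds = ds.map(prepare, remove_columns=ds.column_names)

    def collate(features):
        inputs = torch.tensor([f["input_features"] for f in features])
        labels = processor.tokenizer.pad(
            [{"input_ids": f["labels"]} for f in features], return_tensors="pt")
        # Mask padding positions so they are ignored by the loss.
        ids = labels["input_ids"].masked_fill(labels["attention_mask"] == 0, -100)
        return {"input_features": inputs, "labels": ids}

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments("whisper-ps", per_device_train_batch_size=8,
                                      learning_rate=1e-5, max_steps=1000),
        train_dataset=ds,
        data_collator=collate,
    )
    trainer.train()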
Languages in Whisper-Style Speech Encoders Align Both Phonetically and Semantically
arXiv:2505.19606v2 Announce Type: replace Abstract: Cross-lingual alignment in pretrained language models enables knowledge transfer across languages. Similar alignment has been reported in Whisper-style speech encoders, based on spoken translation retrieval using representational similarity. However, …
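For context, the kind of representational-similarity probe the abstract alludes to can be sketched as follows: mean-pool Whisper encoder states for parallel utterances in two languages, then rank translation candidates by cosine similarity. The pooling choice, checkpoint, and metric are assumptions; the paper's exact retrieval protocol is not given here.

    # Hedged sketch of a spoken-translation-retrieval probe: mean-pooled Whisper
    # encoder states compared by cosine similarity.
    import torch
    import torch.nn.functional as F
    from transformers import WhisperModel, WhisperProcessor

    processor = WhisperProcessor.from_pretrained("openai/whisper-small")
    encoder = WhisperModel.from_pretrained("openai/whisper-small").encoder.eval()

    @torch.no_grad()
    def embed(waveform, sr=16_000):
        feats = processor(waveform, sampling_rate=sr,
                          return_tensors="pt").input_features
        hidden = encoder(feats).last_hidden_state   # (1, frames, d_model)
        return hidden.mean(dim=1).squeeze(0)        # mean-pool over time

    def retrieve(src_waves, tgt_waves):
        """Index of the nearest target utterance for each source utterance."""
        src = F.normalize(torch.stack([embed(w) for w in src_waves]), dim=-1)
        tgt = F.normalize(torch.stack([embed(w) for w in tgt_waves]), dim=-1)
        return (src @ tgt.T).argmax(dim=-1)         # cosine-similarity retrieval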
WhisperRT -- Turning Whisper into a Causal Streaming Model
arXiv:2508.12301v2 Announce Type: replace-cross Abstract: Automatic Speech Recognition (ASR) has seen remarkable progress, with models like OpenAI Whisper and NVIDIA Canary achieving state-of-the-art (SOTA) performance in offline transcription. However, these models are not designed for streaming (o…
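The abstract is cut off before describing WhisperRT's mechanism, so the sketch below is not the paper's method. It shows only the naive sliding-window baseline that causal streaming models aim to improve on: re-decoding the whole audio buffer on every incoming chunk. All names and parameters are illustrative.

    # NOT WhisperRT's method -- a naive sliding-window baseline for comparison:
    # re-decode the whole buffer on every chunk, paying latency and repeated
    # compute that a truly causal streaming model avoids.
    import numpy as np
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

    def stream_transcribe(chunks, sr=16_000, window_s=30.0):
        """chunks: iterable of 1-D float32 numpy arrays of raw audio."""
        buffer = np.zeros(0, dtype=np.float32)
        for chunk in chunks:
            # Whisper's receptive field is 30 s; keep the buffer inside it.
            buffer = np.concatenate([buffer, chunk])[-int(window_s * sr):]
            # Full re-decode per chunk: unstable partials and growing latency.
            yield asr({"raw": buffer, "sampling_rate": sr})["text"]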
On the Role of Encoder Depth: Pruning Whisper and LoRA Fine-Tuning in SLAM-ASR
arXiv:2603.27981v1 Announce Type: new Abstract: Automatic speech recognition (ASR) has advanced rapidly in recent years, driven by large-scale pretrained models and end-to-end architectures such as SLAM-ASR. A key component of SLAM-ASR systems is the Whisper speech encoder, which provides robust acoustic …
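The two knobs the abstract names, encoder depth and parameter-efficient fine-tuning, can be sketched with peft as below. The layer count, LoRA rank, and target modules are illustrative assumptions, not the paper's settings.

    # Hedged sketch: truncate Whisper encoder depth, then attach LoRA adapters.
    # k, the rank, and the target modules are assumptions.
    from peft import LoraConfig, get_peft_model
    from transformers import WhisperModel

    model = WhisperModel.from_pretrained("openai/whisper-large-v3")

    # Prune: keep only the bottom k of whisper-large-v3's 32 encoder layers.
    k = 16
    model.encoder.layers = model.encoder.layers[:k]
    model.config.encoder_layers = k

    # LoRA: adapt the attention projections instead of full fine-tuning.
    lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()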